[https://issues.apache.org/jira/browse/FLINK-19595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Flink Jira Bot updated FLINK-19595:
-----------------------------------
Labels: auto-deprioritized-major stale-minor (was:
auto-deprioritized-major)
I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help
the community manage its development. I see this issue has been marked as
Minor but is unassigned and neither itself nor its Sub-Tasks have been updated
for 180 days. I have gone ahead and marked it "stale-minor". If this ticket is
still Minor, please either assign yourself or give an update. Afterwards,
please remove the label or in 7 days the issue will be deprioritized.
> Flink SQL support S3 select
> ---------------------------
>
> Key: FLINK-19595
> URL: https://issues.apache.org/jira/browse/FLINK-19595
> Project: Flink
> Issue Type: Improvement
> Components: FileSystems, Table SQL / Ecosystem
> Reporter: liuxiaolong
> Priority: Minor
> Labels: auto-deprioritized-major, stale-minor
> Attachments: image-2020-11-02-18-08-11-461.png,
> image-2020-11-02-18-18-14-961.png
>
>
> h4. Summary
> Flink relies on S3AInputStream.java to read data stored in Tencent COS, and
> S3AInputStream calls the getObject function of AmazonS3Client.java.
> Tencent COS already supports pushdown for the CSV and Parquet file formats.
> For these formats, reading data through getObject wastes a lot of bandwidth,
> because getObject returns the raw file contents: every column is transferred
> even when the query needs only some of them.
> So I think Flink SQL should support S3 Select, to reduce this wasted
> bandwidth.
>
> h4. Design
> 1. In HiveMapredSplitReader.java, we use int[] selectedFields to construct the
> S3 SELECT SQL statement. We also created a new class, S3SelectCsvReader, which
> uses the AmazonS3Client.selectObjectContent function to read the CSV file line
> by line.
> !image-2020-11-02-18-08-11-461.png|width=535,height=967!
>
> !image-2020-11-02-18-18-14-961.png|width=629,height=284!
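The reading half of design item 1 can be sketched with the JDK alone. This is a hypothetical stand-in, not the S3SelectCsvReader from the patch: the class name S3SelectCsvLineReader and its readRecord method are invented for illustration. It assumes the S3 Select output serialization is comma-delimited CSV without quoted fields, and that a column projected as null comes back as an empty field (an assumption; the ticket does not say how nulls are rendered). With the AWS SDK for Java (v1), the InputStream passed in would typically come from the result of AmazonS3Client.selectObjectContent via getPayload().getRecordsInputStream().

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical, JDK-only sketch of reading S3 Select CSV output record by record.
 * Assumptions: comma delimiter, no quoted fields, no embedded commas or newlines,
 * and null projections rendered as empty fields.
 */
final class S3SelectCsvLineReader implements Closeable {
    private final BufferedReader reader;

    S3SelectCsvLineReader(InputStream records) {
        // In the real reader this stream would be the S3 Select records stream.
        this.reader = new BufferedReader(new InputStreamReader(records, StandardCharsets.UTF_8));
    }

    /** Returns the fields of the next record, or null at end of stream. */
    String[] readRecord() throws IOException {
        String line = reader.readLine();
        if (line == null) {
            return null;
        }
        // A limit of -1 keeps trailing empty fields, so "Alice," yields two
        // positions and stays aligned with the projected columns.
        return line.split(",", -1);
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}
```

Splitting with a limit of -1 is the key choice here: String.split drops trailing empty strings by default, which would turn a record like "Alice," into a single field and break its alignment with the selected columns.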
>
> 2. Flink demo table:
> 1) Table schema
> Flink SQL> desc cos.test_s3a;
> root
> |-- name: STRING (col1)
> |-- age: INT (col2)
> |-- dt: STRING (col3, a partition column)
>
> 2) Conversion relationship (Flink SQL converted to S3 SELECT SQL)
> ||Flink SQL||S3 SELECT SQL||
> |select name from cos.test_s3a;|SELECT s._1, null FROM S3Object s|
> |select age from cos.test_s3a;|SELECT null, s._2 FROM S3Object s|
> |select dt, name, age from cos.test_s3a;|SELECT s._1, s._2 FROM S3Object s|
> |select dt from cos.test_s3a;|SELECT null, null FROM S3Object s|
> |select * from cos.test_s3a;|SELECT s._1, s._2 FROM S3Object s|
> |select name from cos.test_s3a where dt='2020-07-15';|SELECT s._1, null FROM S3Object s|
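The conversion relationship above follows a positional rule: each column stored in the CSV file becomes s._N (1-based) if it is selected and null otherwise, while the partition column dt never appears because it is not stored in the data file. The WHERE clause on dt is not translated either; presumably it is handled by partition pruning before any object is read. A minimal JDK-only sketch of that rule follows. The class S3SelectSqlBuilder, its build method, and the index layout (data columns first, partition columns after them, selectedFields as 0-based indices into the full schema) are assumptions for illustration, not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: turn Flink's selectedFields into an S3 SELECT statement.
 * Assumptions: selectedFields holds 0-based indices into the full table schema,
 * the first numDataColumns fields are stored in the CSV file in order, and any
 * index >= numDataColumns is a partition column that is not in the file.
 */
final class S3SelectSqlBuilder {
    private S3SelectSqlBuilder() {}

    static String build(int numDataColumns, int[] selectedFields) {
        boolean[] projected = new boolean[numDataColumns];
        for (int field : selectedFields) {
            // Partition columns come from the directory path, not the file, so skip them.
            if (field >= 0 && field < numDataColumns) {
                projected[field] = true;
            }
        }
        List<String> columns = new ArrayList<>();
        for (int i = 0; i < numDataColumns; i++) {
            // S3 Select addresses header-less CSV columns positionally as _1, _2, ...;
            // unselected columns become null so every row keeps the file's width.
            columns.add(projected[i] ? "s._" + (i + 1) : "null");
        }
        return "SELECT " + String.join(", ", columns) + " FROM S3Object s";
    }
}
```

For the demo table (data columns name and age, partition column dt at index 2), build(2, new int[]{1}) returns SELECT null, s._2 FROM S3Object s, and build(2, new int[]{2}) returns SELECT null, null FROM S3Object s, matching the corresponding rows. Keeping null placeholders rather than dropping unselected columns means every returned record has the same width as the file, so field N of a record still maps to column N of the schema.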
>
> 3) Patch Commit
> https://github.com/Coderlxl/flink/commit/b211f4830a7301bf9283a6d37209000b176913ad
--
This message was sent by Atlassian Jira
(v8.3.4#803005)