[ 
https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108158#comment-17108158
 ] 

WangSheng edited comment on IMPALA-9741 at 5/15/20, 10:53 AM:
--------------------------------------------------------------

Hi [~boroknagyz] [~tarmstrong], I have been thinking recently about how to 
implement querying Iceberg tables from Impala, and here is my initial design. I 
will write a class named IcebergScanNode.java in the frontend, and this class 
will mainly do the following:
* Transform Impala conjuncts into Iceberg expressions, so that we can push down 
some predicates to Iceberg;
* Use these expressions to get the specific data files from Iceberg, which are 
stored in HDFS;
* Use these data files to construct the related thrift structs, such as 
THdfsFileSplit/TScanRangeSpec;
* The backend will then use these thrift structs to construct a "SCAN HDFS" node 
to scan the data, so that we can reuse the existing backend code.
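
The steps above can be sketched roughly as follows. This is only a simplified, 
self-contained model of the idea and not actual Impala or Iceberg code: the 
classes Conjunct, DataFile and FileSplit and the method planSplits are 
hypothetical stand-ins for an Impala conjunct, an Iceberg data file entry with 
min/max column statistics, and the information a THdfsFileSplit would carry.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of the planning flow; not Impala or Iceberg code. */
final class IcebergScanSketch {

  /** Stand-in for an Impala conjunct of the form "column OP literal". */
  static final class Conjunct {
    final String column;
    final String op;
    final long value;
    Conjunct(String column, String op, long value) {
      this.column = column; this.op = op; this.value = value;
    }
  }

  /** Stand-in for a data file entry with min/max statistics for one column. */
  static final class DataFile {
    final String path;
    final long length;
    final String statsColumn;
    final long min;
    final long max;
    DataFile(String path, long length, String statsColumn, long min, long max) {
      this.path = path; this.length = length;
      this.statsColumn = statsColumn; this.min = min; this.max = max;
    }
  }

  /** Stand-in for the fields a file split would carry (assumed: path, offset, length). */
  static final class FileSplit {
    final String path;
    final long offset;
    final long length;
    FileSplit(String path, long offset, long length) {
      this.path = path; this.offset = offset; this.length = length;
    }
  }

  /** Returns false only if the min/max statistics prove that no row can match. */
  static boolean mayMatch(DataFile file, Conjunct c) {
    if (!file.statsColumn.equals(c.column)) return true; // no stats: cannot prune
    switch (c.op) {
      case "=":  return file.min <= c.value && c.value <= file.max;
      case "<":  return file.min < c.value;
      case "<=": return file.min <= c.value;
      case ">":  return file.max > c.value;
      case ">=": return file.max >= c.value;
      default:   return true; // unsupported predicate: keep the file
    }
  }

  /** Keeps the files that may match all conjuncts and emits one split per file. */
  static List<FileSplit> planSplits(List<DataFile> files, List<Conjunct> conjuncts) {
    List<FileSplit> splits = new ArrayList<>();
    for (DataFile file : files) {
      boolean keep = true;
      for (Conjunct c : conjuncts) {
        if (!mayMatch(file, c)) { keep = false; break; }
      }
      if (keep) splits.add(new FileSplit(file.path, 0L, file.length));
    }
    return splits;
  }
}
```

In this sketch, pruning is conservative: a file is dropped only when the 
statistics rule it out, so the backend would still need to evaluate the 
predicates on the rows it scans.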

I have also uploaded a very simple design diagram as an attachment, but some 
questions still need to be considered:
# If Iceberg returns data files in different formats, such as Parquet/ORC, can 
the backend handle these files?
# If not, we may need to fix the table's data format at table creation time, 
perhaps via tblproperties, like this: 'iceberg_table_format'='parquet'. In that 
case, we could not query an Iceberg table that contains data files in different 
formats.
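
To illustrate the second option, a check like the one below could reject a 
table whose data files do not all match the declared format. This is only a 
sketch under assumptions: the class and method names are hypothetical, the 
property name is the one proposed above, and the per-file format is passed in 
as a plain map from file path to format name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical format check; not part of Impala or Iceberg. */
final class IcebergFormatCheckSketch {

  /** Property name proposed in this comment; not an existing Impala property. */
  static final String FORMAT_PROPERTY = "iceberg_table_format";

  /**
   * Returns the paths of data files whose format differs from the format
   * declared in the table properties (compared case-insensitively).
   */
  static List<String> findMismatchedFiles(
      Map<String, String> tblProperties, Map<String, String> fileFormats) {
    String declared = tblProperties.get(FORMAT_PROPERTY);
    if (declared == null) {
      throw new IllegalArgumentException("Missing table property: " + FORMAT_PROPERTY);
    }
    List<String> mismatched = new ArrayList<>();
    for (Map.Entry<String, String> entry : fileFormats.entrySet()) {
      if (!declared.equalsIgnoreCase(entry.getValue())) {
        mismatched.add(entry.getKey());
      }
    }
    return mismatched;
  }
}
```

The planner could fail the query with a clear error if the returned list is 
not empty, instead of handing unsupported files to the backend.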




> Support query iceberg table by impala
> -------------------------------------
>
>                 Key: IMPALA-9741
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9741
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: WangSheng
>            Assignee: WangSheng
>            Priority: Major
>         Attachments: select-iceberg.jpg
>
>
> Since we have submitted a patch to support creating Iceberg tables from Impala 
> in IMPALA-9688, we are preparing to implement querying Iceberg tables from 
> Impala. But we need to read the Impala and Iceberg code in depth to determine 
> how to do this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
