[ https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108158#comment-17108158 ]
WangSheng edited comment on IMPALA-9741 at 5/15/20, 10:53 AM:
--------------------------------------------------------------

Hi [~boroknagyz][~tarmstrong], I have recently been thinking about how to implement querying Iceberg tables from Impala, and here is my initial design. I will write a class named IcebergScanNode.java in the frontend, which mainly does the following:
* Transform Impala conjuncts into Iceberg expressions, so that we can push some predicates down to Iceberg;
* Use these expressions to get the specific data files from Iceberg, which are stored in HDFS;
* Use these data files to construct the related thrift structs, such as THdfsFileSplit/TScanRangeSpec;
* The backend then uses these thrift structs to construct a "SCAN HDFS" node to scan the data, so we can reuse the existing backend code.
I have uploaded a very simple design diagram as an attachment, but some questions still need to be considered:
# If Iceberg returns files in different formats, such as Parquet/ORC, can the backend handle these files?
# If not, we may fix the table's data format when the table is created, perhaps via tblproperties, like this: 'iceberg_table_format'='parquet'. In that case, we cannot query an Iceberg table whose data files are in different formats.

was (Author: skyyws):
Hi [~boroknagyz][~tarmstrong], I have recently been thinking about how to implement querying Iceberg tables from Impala, and here is my initial design. I will write a class named IcebergScanNode.java in the frontend, which mainly does the following:
* Transform Impala conjuncts into Iceberg expressions, so that we can push some predicates down to Iceberg;
* Use these expressions to get the specific data files from Iceberg, which are stored in HDFS;
* Use these data files to construct the related thrift structs, such as THdfsFileSplit/TScanRangeSpec;
* The backend then uses these thrift structs to construct a "SCAN HDFS" node to scan the data, so we can reuse the existing backend code.
I have uploaded a very simple design diagram as an attachment, but some questions still need to be considered:
# If Iceberg returns files in different formats, such as Parquet/ORC, can the backend handle these files?
# If not, we may fix the table's data format when the table is created, perhaps via tblproperties, like this: 'iceberg_table'='parquet'

> Support query iceberg table by impala
> -------------------------------------
>
>                 Key: IMPALA-9741
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9741
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: WangSheng
>            Assignee: WangSheng
>            Priority: Major
>         Attachments: select-iceberg.jpg
>
>
> Since we have submitted a patch to support creating Iceberg tables from Impala in
> IMPALA-9688, we are preparing to implement querying Iceberg tables from Impala. But
> we first need to read the Impala and Iceberg code closely to determine how to do
> this.
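
To make the flow proposed in the comment concrete, here is a minimal, self-contained Java sketch of its three frontend steps: fold conjuncts into a pushed-down filter, use that filter to select data files, and turn the selected files into splits. Every type and method name below (Conjunct, DataFile, FileSplit, toRange, planFiles, toSplits, allInFormat) is hypothetical and invented for illustration; none of them is an Impala or Iceberg class, and a real implementation would presumably delegate file selection to Iceberg's own scan planning rather than compare min/max statistics by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model; these are NOT Impala or Iceberg classes.
final class IcebergScanSketch {

    enum Op { GE, LE }

    // One conjunct of the form "<column> <op> <literal>".
    static final class Conjunct {
        final String column; final Op op; final long value;
        Conjunct(String column, Op op, long value) {
            this.column = column; this.op = op; this.value = value;
        }
    }

    // A data file with the min/max statistics of a single column ("id").
    static final class DataFile {
        final String path; final String format; final long length;
        final long minId; final long maxId;
        DataFile(String path, String format, long length, long minId, long maxId) {
            this.path = path; this.format = format; this.length = length;
            this.minId = minId; this.maxId = maxId;
        }
    }

    // Stand-in for the split descriptor handed to the backend scanner.
    static final class FileSplit {
        final String path; final long offset; final long length;
        FileSplit(String path, long offset, long length) {
            this.path = path; this.offset = offset; this.length = length;
        }
    }

    // Step 1: fold the conjuncts on `column` into a closed range [lo, hi].
    // Conjuncts on other columns are skipped here and stay with the scanner.
    static long[] toRange(List<Conjunct> conjuncts, String column) {
        long lo = Long.MIN_VALUE, hi = Long.MAX_VALUE;
        for (Conjunct c : conjuncts) {
            if (!c.column.equals(column)) continue;
            if (c.op == Op.GE) lo = Math.max(lo, c.value);
            else hi = Math.min(hi, c.value);
        }
        return new long[] {lo, hi};
    }

    // Step 2: keep only files whose [minId, maxId] overlaps the range.
    static List<DataFile> planFiles(List<DataFile> files, long[] range) {
        List<DataFile> selected = new ArrayList<>();
        for (DataFile f : files) {
            if (f.maxId >= range[0] && f.minId <= range[1]) selected.add(f);
        }
        return selected;
    }

    // Step 3: emit one split per selected file, covering the whole file.
    static List<FileSplit> toSplits(List<DataFile> files) {
        List<FileSplit> splits = new ArrayList<>();
        for (DataFile f : files) splits.add(new FileSplit(f.path, 0L, f.length));
        return splits;
    }

    // Open question 1: do all selected files share the expected format?
    static boolean allInFormat(List<DataFile> files, String format) {
        for (DataFile f : files) {
            if (!f.format.equals(format)) return false;
        }
        return true;
    }

    static List<DataFile> sampleFiles() {
        return Arrays.asList(
            new DataFile("/warehouse/t/a.parquet", "parquet", 1000L, 0L, 99L),
            new DataFile("/warehouse/t/b.parquet", "parquet", 2000L, 100L, 199L),
            new DataFile("/warehouse/t/c.orc", "orc", 3000L, 150L, 300L));
    }

    public static void main(String[] args) {
        // WHERE id >= 100 AND id <= 150
        List<Conjunct> conjuncts = Arrays.asList(
            new Conjunct("id", Op.GE, 100L), new Conjunct("id", Op.LE, 150L));
        List<DataFile> selected = planFiles(sampleFiles(), toRange(conjuncts, "id"));
        for (FileSplit s : toSplits(selected)) {
            System.out.println(s.path + " [" + s.offset + ", " + s.length + ")");
        }
        System.out.println("all parquet: " + allInFormat(selected, "parquet"));
    }
}
```

With the sample data, the first file is pruned because its id range ends below 100, and the two remaining files are in different formats (one Parquet, one ORC). That is the situation the first open question asks about: either the backend scans each split according to its own file format, or the table is restricted to a single format up front.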