[ 
https://issues.apache.org/jira/browse/IMPALA-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-11339:
---------------------------------------
    Description: 
Currently Impala doesn't support LOAD DATA statements for Iceberg tables.

Some user workflows still use this statement, so it would be nice to implement 
it in some way.

The parameter to LOAD DATA can be a directory or a single file.

A possible solution would be to
 # Create an external table
 ## If the parameter is a single file, then we can use IMPALA-10934 to define 
an external table on this single file
 ## If the parameter is a directory, then we need to create an external table 
using the directory as table location. To get the table schema we could use 
CREATE TABLE LIKE PARQUET/ORC
 # run an {{insert into iceberg_table select * from tmp_table}}
 # drop the tmp table (not sure if we want to keep or remove the original files)

This approach copies the data, but it is probably the safest solution.
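
The directory case of the steps above could be sketched roughly as follows. This is only an illustration of the statement sequence, not the actual frontend implementation. The helper name {{plan_load_from_directory}}, the {{sample_file}} parameter (one Parquet file in the directory, used to derive the schema) and the example paths are hypothetical. The single-file case is left out because it depends on IMPALA-10934.

```python
from typing import List


def plan_load_from_directory(directory: str, sample_file: str,
                             iceberg_table: str,
                             tmp_table: str = "tmp_table") -> List[str]:
    """Return the SQL statements for the directory case, in execution order.

    All names are placeholders; quoting/escaping of identifiers and paths
    is deliberately omitted in this sketch.
    """
    return [
        # Step 1: external table over the directory; the schema is derived
        # from one Parquet file via CREATE TABLE LIKE PARQUET.
        f"CREATE EXTERNAL TABLE {tmp_table} LIKE PARQUET '{sample_file}' "
        f"STORED AS PARQUET LOCATION '{directory}'",
        # Step 2: copy the rows into the Iceberg table.
        f"INSERT INTO {iceberg_table} SELECT * FROM {tmp_table}",
        # Step 3: drop the temporary table. Dropping an external table
        # normally leaves the source files in place, so removing them
        # (if wanted) would be a separate step.
        f"DROP TABLE {tmp_table}",
    ]


if __name__ == "__main__":
    for stmt in plan_load_from_directory(
            "/staging/dir", "/staging/dir/part-0.parq", "iceberg_table"):
        print(stmt + ";")
```

The sketch keeps the three steps as separate statements so that a failure in the INSERT leaves the original files untouched.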

Users might specify the partition columns in the [PARTITION (partcol1=val1, 
partcol2=val2 ...)] clause. In this case the data files don't necessarily 
contain the partition values, i.e. we need to create the tmp table with proper 
partitioning.

  was:
Currently Impala doesn't support LOAD DATA statements for Iceberg tables.

Some user workflows still use this statement, so it would be nice to implement 
it in some way.

A possible solution would be to
 # create a temp table on those sets of files with the right schema
 # run a {{insert into iceberg table select * from tmp table}}
 # drop the tmp table and delete the files in the staging directory

It does some copying, but probably this would be the safest solution.

Users might specify the partition columns in the [PARTITION (partcol1=val1, 
partcol2=val2 ...)] clause. In this case the data files don't necessarily 
contain the partition values, i.e. we need to create the tmp table with proper 
partitioning.


> Implement LOAD DATA INPATH for Iceberg tables
> ---------------------------------------------
>
>                 Key: IMPALA-11339
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11339
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: LiPenglin
>            Priority: Major
>              Labels: impala-iceberg
>
> Currently Impala doesn't support LOAD DATA statements for Iceberg tables.
> Some user workflows still use this statement, so it would be nice to 
> implement it in some way.
> The parameter to LOAD DATA can be a directory or a single file.
> A possible solution would be to
>  # Create an external table
>  ## If the parameter is a single file, then we can use IMPALA-10934 to define 
> an external table on this single file
>  ## If the parameter is a directory, then we need to create an external table 
> using the directory as table location. To get the table schema we could use 
> CREATE TABLE LIKE PARQUET/ORC
>  # run an {{insert into iceberg_table select * from tmp_table}}
>  # drop the tmp table (not sure if we want to keep or remove the original 
> files)
> This approach copies the data, but it is probably the safest solution.
> Users might specify the partition columns in the [PARTITION (partcol1=val1, 
> partcol2=val2 ...)] clause. In this case the data files don't necessarily 
> contain the partition values, i.e. we need to create the tmp table with 
> proper partitioning.



