[GitHub] [hive] kasakrisz opened a new pull request, #3362: HIVE-26319: Iceberg integration: Perform update split early

GitBox Mon, 13 Jun 2022 04:46:59 -0700


kasakrisz opened a new pull request, #3362:
URL: https://github.com/apache/hive/pull/3362


   ### What changes were proposed in this pull request?
   Rewrite update statements on Iceberg tables into a multi-insert statement, as is already done for native ACID tables.
   
   When generating the rewritten statement:
   * Get the virtual columns from the table's storage handler for non-native ACID tables.
   * Include the old values in the select clause of the delete branch of the multi-insert statement.
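
To illustrate the shape of such a rewrite, here is a minimal sketch in Java. It is not Hive's actual rewriter: the class `UpdateSplitSketch`, the method `rewriteUpdate`, and the exact layout of the generated text are assumptions made for illustration, and the virtual column names used in `main` are placeholders for whatever the table's storage handler reports.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: splits an UPDATE into an insert branch and a delete branch.
public class UpdateSplitSketch {

    // Builds the text of a multi-insert statement for
    // "UPDATE table SET <setClauses> WHERE <whereClause>".
    // virtualColumns stands in for the list obtained from the storage handler.
    public static String rewriteUpdate(String table,
                                       List<String> virtualColumns,
                                       List<String> columns,
                                       Map<String, String> setClauses,
                                       String whereClause) {
        // Insert branch: every column, with the SET expression where one was given.
        List<String> insertSelect = new ArrayList<>();
        for (String column : columns) {
            insertSelect.add(setClauses.getOrDefault(column, column));
        }

        // Delete branch: the virtual columns that identify the row,
        // followed by the old (unmodified) column values.
        List<String> deleteSelect = new ArrayList<>(virtualColumns);
        deleteSelect.addAll(columns);

        String where = (whereClause == null) ? "" : " WHERE " + whereClause;
        return "FROM " + table
            + " INSERT INTO " + table + " SELECT " + String.join(", ", insertSelect) + where
            + " INSERT INTO " + table + " SELECT " + String.join(", ", deleteSelect) + where;
    }

    public static void main(String[] args) {
        Map<String, String> setClauses = new LinkedHashMap<>();
        setClauses.put("b", "b + 1");
        // Placeholder virtual column names; the real ones come from the storage handler.
        List<String> virtualColumns = List.of("FILE__PATH", "ROW__POSITION");
        System.out.println(
            rewriteUpdate("tbl", virtualColumns, List.of("a", "b"), setClauses, "a = 1"));
    }
}
```

The sketch only shows why the delete branch needs both the virtual columns and the old values; details such as sorting or how each branch is marked as an insert or a delete are left out.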
   
   When executing the multi-insert:
   * Two Iceberg writers are used, which produce a data delta file and a delete delta file. The results of these writers should be merged into one `FilesForCommit` if both writers run in the same task.
   * For more complex statements (e.g. on partitioned and/or bucketed tables), more than one Tez task produces commit info, so this patch enables storing all of it.
   * Every `FileSinkOperator` creates its own `jobConf` instance, because the Iceberg write operation is stored in it and differs between the two instances.
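
The merging described above can be sketched as follows. This is a simplified stand-in, not the real `FilesForCommit` class: `CommitMergeSketch`, `FilesForCommitSketch`, `merge`, and `mergeAll` are hypothetical names, and files are represented as plain path strings purely for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of combining the outputs of a data writer and a delete writer.
public class CommitMergeSketch {

    // Simplified stand-in for a commit result; NOT the real Iceberg/Hive class.
    public static final class FilesForCommitSketch {
        private final List<String> dataFiles;
        private final List<String> deleteFiles;

        public FilesForCommitSketch(List<String> dataFiles, List<String> deleteFiles) {
            this.dataFiles = Collections.unmodifiableList(new ArrayList<>(dataFiles));
            this.deleteFiles = Collections.unmodifiableList(new ArrayList<>(deleteFiles));
        }

        public List<String> dataFiles() {
            return dataFiles;
        }

        public List<String> deleteFiles() {
            return deleteFiles;
        }

        // Returns a new result holding the files of both inputs.
        public FilesForCommitSketch merge(FilesForCommitSketch other) {
            List<String> data = new ArrayList<>(dataFiles);
            data.addAll(other.dataFiles);
            List<String> deletes = new ArrayList<>(deleteFiles);
            deletes.addAll(other.deleteFiles);
            return new FilesForCommitSketch(data, deletes);
        }
    }

    // Folds the results of several writers (or several tasks) into one.
    public static FilesForCommitSketch mergeAll(List<FilesForCommitSketch> results) {
        FilesForCommitSketch merged = new FilesForCommitSketch(List.of(), List.of());
        for (FilesForCommitSketch result : results) {
            merged = merged.merge(result);
        }
        return merged;
    }

    public static void main(String[] args) {
        // One writer produced a data file, the other a delete file, in the same task.
        FilesForCommitSketch fromDataWriter =
            new FilesForCommitSketch(List.of("data-00001.orc"), List.of());
        FilesForCommitSketch fromDeleteWriter =
            new FilesForCommitSketch(List.of(), List.of("delete-00001.orc"));
        FilesForCommitSketch merged = mergeAll(List.of(fromDataWriter, fromDeleteWriter));
        System.out.println(merged.dataFiles() + " " + merged.deleteFiles());
    }
}
```

The same fold could be applied to commit info collected from several Tez tasks, assuming each task's result is kept rather than overwritten.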
   
   
   ### Why are the changes needed?
   See #2855. This change is also preparation for the Iceberg Merge implementation.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests \
     -Dtest=TestIcebergLlapLocalCliDriver -Dqfile=update_iceberg_partitioned_orc2.q \
     -pl itests/qtest-iceberg -Piceberg -Pitests
   ```



