[
https://issues.apache.org/jira/browse/HIVE-26319?focusedWorklogId=782016&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782016
]
ASF GitHub Bot logged work on HIVE-26319:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 16/Jun/22 11:11
Start Date: 16/Jun/22 11:11
Worklog Time Spent: 10m
Work Description: kasakrisz commented on code in PR #3362:
URL: https://github.com/apache/hive/pull/3362#discussion_r898969956
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##########
@@ -127,14 +130,23 @@ public void commitTask(TaskAttemptContext
originalContext) throws IOException {
.run(output -> {
Table table =
HiveIcebergStorageHandler.table(context.getJobConf(), output);
if (table != null) {
- HiveIcebergWriter writer = writers.get(output);
+ Collection<DataFile> dataFiles = Lists.newArrayList();
+ Collection<DeleteFile> deleteFiles = Lists.newArrayList();
String fileForCommitLocation =
generateFileForCommitLocation(table.location(), jobConf,
- attemptID.getJobID(), attemptID.getTaskID().getId());
- if (writer != null) {
- createFileForCommit(writer.files(), fileForCommitLocation,
table.io());
- } else {
+ attemptID.getJobID(), attemptID.getTaskID().getId());
+ if (writers.get(output) != null) {
+ for (HiveIcebergWriter writer : writers.get(output)) {
+ if (writer != null) {
+ dataFiles.addAll(writer.files().dataFiles());
Review Comment:
I found this usage:
https://github.com/apache/hive/blob/67c2d4910ff17c694653eb8bd9c9ed2405cec38b/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/writer/HiveIcebergWriterBase.java#L59
`HiveIcebergWriter` does not have `dataFiles()`, `deleteFiles()` methods and
it can be a `HiveIcebergRecordWriter`, `HiveIcebergDeleteWriter` etc which
treats data and delete files a different way.
If we want to avoid creating the `FilesForCommit` object creation to replace
`HiveIcebergWriter.files()`
* create a method like `HiveIcebergWriter.collectFiles(List<> dataFiles,
List<> deleteFiles)`
or
* create dataFiles(), deleteFiles() methods. I prefer wrapping the returned
lists into unmodifiableList which is also a new object creation.
Which do you prefer?
On the other hand I don't think creating the `FilesForCommit` objects is
critical. These are created only when a result of a statement should be
committed/aborted not per record basis
Issue Time Tracking
-------------------
Worklog Id: (was: 782016)
Time Spent: 1h 10m (was: 1h)
> Iceberg integration: Perform update split early
> -----------------------------------------------
>
> Key: HIVE-26319
> URL: https://issues.apache.org/jira/browse/HIVE-26319
> Project: Hive
> Issue Type: Improvement
> Components: File Formats
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Extend update split early to iceberg tables like in HIVE-21160 for native
> acid tables
--
This message was sent by Atlassian Jira
(v8.20.7#820007)