kasakrisz commented on code in PR #3362:
URL: https://github.com/apache/hive/pull/3362#discussion_r898969956


##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##########
@@ -127,14 +130,23 @@ public void commitTask(TaskAttemptContext 
originalContext) throws IOException {
           .run(output -> {
             Table table = 
HiveIcebergStorageHandler.table(context.getJobConf(), output);
             if (table != null) {
-              HiveIcebergWriter writer = writers.get(output);
+              Collection<DataFile> dataFiles = Lists.newArrayList();
+              Collection<DeleteFile> deleteFiles = Lists.newArrayList();
               String fileForCommitLocation = 
generateFileForCommitLocation(table.location(), jobConf,
-                  attemptID.getJobID(), attemptID.getTaskID().getId());
-              if (writer != null) {
-                createFileForCommit(writer.files(), fileForCommitLocation, 
table.io());
-              } else {
+                      attemptID.getJobID(), attemptID.getTaskID().getId());
+              if (writers.get(output) != null) {
+                for (HiveIcebergWriter writer : writers.get(output)) {
+                  if (writer != null) {
+                    dataFiles.addAll(writer.files().dataFiles());

Review Comment:
   I found this usage:
   
https://github.com/apache/hive/blob/67c2d4910ff17c694653eb8bd9c9ed2405cec38b/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/writer/HiveIcebergWriterBase.java#L59
   
   `HiveIcebergWriter` does not have `dataFiles()`, `deleteFiles()` methods and 
it can be a `HiveIcebergRecordWriter`, `HiveIcebergDeleteWriter` etc which 
treats data and delete files a different way.
   
   If we want to avoid creating the `FilesForCommit` object creation to replace 
`HiveIcebergWriter.files()`
   * create a method like `HiveIcebergWriter.collectFiles(List<> dataFiles, 
List<> deleteFiles)`
   or 
   * create dataFiles(), deleteFiles() methods. I prefer wrapping the returned 
lists into unmodifiableList which is also a new object creation.
   
   Which do you prefer? 
   
   On the other hand I don't think creating the `FilesForCommit` objects is 
critical. These are created only when a result of a statement should be 
committed/aborted not per record basis



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to