kasakrisz commented on code in PR #3362:
URL: https://github.com/apache/hive/pull/3362#discussion_r898969956
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##########
@@ -127,14 +130,23 @@ public void commitTask(TaskAttemptContext
originalContext) throws IOException {
.run(output -> {
Table table =
HiveIcebergStorageHandler.table(context.getJobConf(), output);
if (table != null) {
- HiveIcebergWriter writer = writers.get(output);
+ Collection<DataFile> dataFiles = Lists.newArrayList();
+ Collection<DeleteFile> deleteFiles = Lists.newArrayList();
String fileForCommitLocation =
generateFileForCommitLocation(table.location(), jobConf,
- attemptID.getJobID(), attemptID.getTaskID().getId());
- if (writer != null) {
- createFileForCommit(writer.files(), fileForCommitLocation,
table.io());
- } else {
+ attemptID.getJobID(), attemptID.getTaskID().getId());
+ if (writers.get(output) != null) {
+ for (HiveIcebergWriter writer : writers.get(output)) {
+ if (writer != null) {
+ dataFiles.addAll(writer.files().dataFiles());
Review Comment:
I found this usage:
https://github.com/apache/hive/blob/67c2d4910ff17c694653eb8bd9c9ed2405cec38b/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/writer/HiveIcebergWriterBase.java#L59
`HiveIcebergWriter` does not have `dataFiles()`, `deleteFiles()` methods and
it can be a `HiveIcebergRecordWriter`, `HiveIcebergDeleteWriter` etc which
treats data and delete files a different way.
If we want to avoid creating the `FilesForCommit` object creation to replace
`HiveIcebergWriter.files()`
* create a method like `HiveIcebergWriter.collectFiles(List<> dataFiles,
List<> deleteFiles)`
or
* create dataFiles(), deleteFiles() methods. I prefer wrapping the returned
lists into unmodifiableList which is also a new object creation.
Which do you prefer?
On the other hand I don't think creating the `FilesForCommit` objects is
critical. These are created only when a result of a statement should be
committed/aborted not per record basis
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]