[ https://issues.apache.org/jira/browse/HIVE-25006?focusedWorklogId=585741&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585741 ]
ASF GitHub Bot logged work on HIVE-25006:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 20/Apr/21 11:52
Start Date: 20/Apr/21 11:52
Worklog Time Spent: 10m
Work Description:

marton-bod commented on a change in pull request #2161:
URL: https://github.com/apache/hive/pull/2161#discussion_r616608877

##########
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java
##########
@@ -105,13 +105,18 @@ public void commitTask(TaskAttemptContext originalContext) throws IOException {
           .executeWith(tableExecutor)
           .run(output -> {
             Table table = HiveIcebergStorageHandler.table(context.getJobConf(), output);
-            HiveIcebergRecordWriter writer = writers.get(output);
-            DataFile[] closedFiles = writer != null ? writer.dataFiles() : new DataFile[0];
-            String fileForCommitLocation = generateFileForCommitLocation(table.location(), jobConf,
-                attemptID.getJobID(), attemptID.getTaskID().getId());
-
-            // Creating the file containing the data files generated by this task for this table
-            createFileForCommit(closedFiles, fileForCommitLocation, table.io());
+            if (table != null) {

Review comment:
This happens during task commit, so before the commitInsert hook is called. The essential problem is that `OUTPUT_TABLES` contains all the target tables, but only the tables relevant to a given task are serialized into that task's job config. So the loop iterates over tables 1...N (based on `OUTPUT_TABLES`) but only has access to the serialized Table 1 (hence the if). I think the whole parallel commit logic for multi-table inserts is broken on both the task commit and the job commit side whenever more than one vertex writes to the target tables. The tests currently pass only because a single writer vertex is created, and it has both tables serialized into its config.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 585741)
Time Spent: 1h 50m (was: 1h 40m)

> Commit Iceberg writes in HiveMetaHook instead of TezAM
> ------------------------------------------------------
>
> Key: HIVE-25006
> URL: https://issues.apache.org/jira/browse/HIVE-25006
> Project: Hive
> Issue Type: Task
> Reporter: Marton Bod
> Assignee: Marton Bod
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> Trigger the write commits in HiveIcebergStorageHandler#commitInsertTable.
> This will enable us to implement insert overwrites for Iceberg tables.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
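The failure mode described in the review comment can be illustrated with a small, self-contained Java model. This is a hypothetical sketch, not Hive's `HiveIcebergOutputCommitter`: the class `TaskCommitSketch`, the method `commitFilesForTask(...)` and its parameters are illustrative names, tables and data files are plain strings, and "writing a commit file" is modelled as a map entry. It assumes, as the comment implies with "hence the if", that a table missing from a task's serialized config resolves to null and has to be skipped.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified model of the per-table loop in a task commit.
 * Not Hive code: tables and data files are strings, and the "commit file"
 * for a table is represented by an entry in the returned map.
 */
class TaskCommitSketch {

  /**
   * Returns, per table, the data files this task would list in its commit file.
   *
   * @param outputTables     every target table of the query (models OUTPUT_TABLES)
   * @param serializedTables tables actually serialized into this task's job config
   * @param writers          data files written by this task, keyed by table name
   */
  static Map<String, List<String>> commitFilesForTask(List<String> outputTables,
                                                      Set<String> serializedTables,
                                                      Map<String, List<String>> writers) {
    Map<String, List<String>> commitFiles = new LinkedHashMap<>();
    for (String output : outputTables) {
      // Models the `if (table != null)` guard from the diff: a table that was
      // not serialized into this task's config is skipped instead of being
      // dereferenced.
      if (!serializedTables.contains(output)) {
        continue;
      }
      // Mirrors `writer != null ? writer.dataFiles() : new DataFile[0]`:
      // a visible table that received no rows still gets an empty commit file.
      List<String> closedFiles = writers.getOrDefault(output, List.of());
      commitFiles.put(output, closedFiles);
    }
    return commitFiles;
  }

  public static void main(String[] args) {
    List<String> outputTables = List.of("table1", "table2");

    // Single writer vertex (the case the tests cover): both tables are visible.
    System.out.println(commitFilesForTask(outputTables, Set.of("table1", "table2"),
        Map.of("table1", List.of("f1.parquet"))));
    // prints {table1=[f1.parquet], table2=[]}

    // A vertex that only writes table1: table2 is skipped, so this task
    // produces no commit file for it.
    System.out.println(commitFilesForTask(outputTables, Set.of("table1"),
        Map.of("table1", List.of("f1.parquet"))));
    // prints {table1=[f1.parquet]}
  }
}
```

The sketch covers only the task side of the problem. The review comment notes that the job commit side is affected in the same way when more than one vertex writes to the target tables.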