marton-bod opened a new pull request #2163: URL: https://github.com/apache/iceberg/pull/2163
To enable Hive writes using the Tez engine, we have to make a few modifications to the OutputCommitter because of the inner workings of Tez. The main reasons for the changes:

1. Tez: arbitrary inclusion/exclusion of the vertexId in TaskAttemptID. Tez is inconsistent in how it constructs TaskAttemptIDs: in some places the vertexId is included, in others it is not. The resulting IDs differ, which causes issues when retrieving the Writer from the cache based on this ID.
2. Tez: taskType (reducer/mapper) is part of TaskAttemptID's equals/hashCode. This prevents, for example, a reducer from retrieving a Writer object that was previously cached by a mapper.

Enabling the unit tests to run on Tez will be done in a future PR. That work requires releasing new versions of Hive and Tez containing the necessary patches (mainly HIVE-24629 and TEZ-4264) and updating the dependencies here.
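To illustrate the two cache problems described above, here is a minimal, self-contained sketch. It is not the code in this PR: `WriterCacheSketch`, `AttemptId`, and `cacheKey` are hypothetical names, and `AttemptId` is a simplified stand-in for the real TaskAttemptID, assuming only that its equals/hashCode cover the vertexId and taskType as described.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: none of these types are the real Hadoop, Tez, or Iceberg classes.
final class WriterCacheSketch {

  // Simplified stand-in for a task attempt ID whose equality covers every field.
  static final class AttemptId {
    final String jobId;
    final Integer vertexId;  // null when the ID was built without a vertex
    final String taskType;   // "MAP" or "REDUCE"
    final int taskId;
    final int attemptNumber;

    AttemptId(String jobId, Integer vertexId, String taskType, int taskId, int attemptNumber) {
      this.jobId = jobId;
      this.vertexId = vertexId;
      this.taskType = taskType;
      this.taskId = taskId;
      this.attemptNumber = attemptNumber;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof AttemptId)) {
        return false;
      }
      AttemptId other = (AttemptId) o;
      return taskId == other.taskId
          && attemptNumber == other.attemptNumber
          && Objects.equals(jobId, other.jobId)
          && Objects.equals(vertexId, other.vertexId)
          && Objects.equals(taskType, other.taskType);
    }

    @Override
    public int hashCode() {
      return Objects.hash(jobId, vertexId, taskType, taskId, attemptNumber);
    }
  }

  // Assumed normalization: build the cache key only from fields that do not
  // vary between call sites, leaving out vertexId and taskType.
  static String cacheKey(AttemptId id) {
    return id.jobId + "_" + id.taskId + "_" + id.attemptNumber;
  }

  public static void main(String[] args) {
    AttemptId storedWith = new AttemptId("job_1", 2, "MAP", 7, 0);
    AttemptId lookedUpWith = new AttemptId("job_1", null, "REDUCE", 7, 0);

    // Keyed by the raw ID: the lookup misses because vertexId and taskType differ.
    Map<AttemptId, String> byRawId = new HashMap<>();
    byRawId.put(storedWith, "writer-7");
    System.out.println(byRawId.get(lookedUpWith)); // prints "null"

    // Keyed by the normalized string: both IDs map to the same entry.
    Map<String, String> byKey = new HashMap<>();
    byKey.put(cacheKey(storedWith), "writer-7");
    System.out.println(byKey.get(cacheKey(lookedUpWith))); // prints "writer-7"
  }
}
```

Whether dropping those fields is safe in a real job depends on the remaining fields staying unique per writer, which this sketch does not attempt to establish.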
