marton-bod opened a new pull request #2163:
URL: https://github.com/apache/iceberg/pull/2163


   To enable Hive writes with the Tez engine, we have to make a few 
modifications to the OutputCommitter because of how Tez works internally. The 
two main reasons for the changes:
   1. Tez: Inconsistent inclusion of vertexId in TaskAttemptID: Tez does not 
construct TaskAttemptIDs uniformly. In some places the vertexId is included in 
the TaskAttemptID and in others it is not, so the IDs differ and the Writer 
cannot be retrieved from the cache, which is keyed by this ID.
   2. Tez: taskType (mapper/reducer) is part of TaskAttemptID's 
equals/hashCode: this prevents, for example, a reducer from retrieving a Writer 
object that was previously cached by a mapper.
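
   To illustrate the two problems above, here is a minimal, self-contained 
sketch of a cache key that always carries a vertexId and leaves the task type 
out of equals/hashCode. It is not the code in this PR: `AttemptId`, 
`WriterKey`, and the `fallbackVertexId` parameter are hypothetical stand-ins, 
and no real Hadoop or Tez classes are used.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

public class WriterCacheSketch {

  /** Hypothetical stand-in for a task attempt ID; not the Hadoop/Tez class. */
  static final class AttemptId {
    final String jobId;
    final Integer vertexId; // null models the places where the vertexId is left out
    final boolean isMap;    // task type: mapper (true) or reducer (false)
    final int taskIndex;
    final int attemptNumber;

    AttemptId(String jobId, Integer vertexId, boolean isMap, int taskIndex, int attemptNumber) {
      this.jobId = jobId;
      this.vertexId = vertexId;
      this.isMap = isMap;
      this.taskIndex = taskIndex;
      this.attemptNumber = attemptNumber;
    }
  }

  /** Hypothetical cache key: vertexId is always present, task type is ignored. */
  static final class WriterKey {
    final String jobId;
    final int vertexId;
    final int taskIndex;
    final int attemptNumber;

    private WriterKey(String jobId, int vertexId, int taskIndex, int attemptNumber) {
      this.jobId = jobId;
      this.vertexId = vertexId;
      this.taskIndex = taskIndex;
      this.attemptNumber = attemptNumber;
    }

    /** Builds a key, filling in the vertexId from the caller when the ID lacks one. */
    static WriterKey of(AttemptId id, int fallbackVertexId) {
      int vertex = id.vertexId != null ? id.vertexId : fallbackVertexId;
      return new WriterKey(id.jobId, vertex, id.taskIndex, id.attemptNumber);
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof WriterKey)) {
        return false;
      }
      WriterKey other = (WriterKey) o;
      return jobId.equals(other.jobId)
          && vertexId == other.vertexId
          && taskIndex == other.taskIndex
          && attemptNumber == other.attemptNumber;
    }

    @Override
    public int hashCode() {
      return Objects.hash(jobId, vertexId, taskIndex, attemptNumber);
    }
  }

  // The String value stands in for a Writer object.
  static final Map<WriterKey, String> WRITERS = new ConcurrentHashMap<>();

  public static void main(String[] args) {
    // Cached on the mapper side, with the vertexId present in the ID.
    AttemptId mapperSide = new AttemptId("job_1", 3, true, 5, 0);
    // Looked up on the reducer side, where the ID carries no vertexId.
    AttemptId reducerSide = new AttemptId("job_1", null, false, 5, 0);

    WRITERS.put(WriterKey.of(mapperSide, 3), "writer-for-task-5");
    System.out.println(WRITERS.get(WriterKey.of(reducerSide, 3))); // prints "writer-for-task-5"
  }
}
```

   In this sketch the lookup succeeds only because both sides agree on the 
vertexId and the key never looks at the task type. Where the fallback vertexId 
would come from in a real job is left open here.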
   
   Enabling the unit tests to run on Tez will be done in a future PR. For that 
work, we'll need to release new versions of Hive and Tez containing the 
necessary patches (mainly HIVE-24629 and TEZ-4264) and update the dependencies 
here.
   

