marton-bod opened a new pull request #2502: URL: https://github.com/apache/iceberg/pull/2502
With Tez and multi-table inserts, when multiple vertices are spawned to write to the target tables, the task commit can fail because one of the serialized tables is missing from the task config. Tez assembles its job config differently from MR: it ends up with all target tables in its OUTPUT_TABLES config (target1, ..., targetN), but only the subset of tables relevant to the task is actually serialized into the task's config. Added a new multi-table insert test case that spawns multiple vertices (the previous test only spawned a single vertex) to exercise this codepath, and fixed the issue in commitTask.
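To illustrate the failure mode described above, here is a minimal sketch of one plausible way a task commit could tolerate a config that lists more output tables than it has serialized. The description does not say how commitTask was actually changed, so this is an assumption, not the PR's code. The class `TaskCommitSketch`, the method `tablesToCommit`, and both config key strings are hypothetical stand-ins, and a plain `Map` replaces the real job config.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TaskCommitSketch {
  // Hypothetical config keys, chosen for illustration only.
  static final String OUTPUT_TABLES = "output.tables";
  static final String SERIALIZED_TABLE_PREFIX = "serialized.table.";

  // Returns the output tables this task can commit: those that are listed in
  // OUTPUT_TABLES and also have a serialized table entry in the task config.
  // Tables that are listed but not serialized are skipped instead of failing.
  static List<String> tablesToCommit(Map<String, String> taskConf) {
    List<String> result = new ArrayList<>();
    String outputs = taskConf.getOrDefault(OUTPUT_TABLES, "");
    for (String name : outputs.split(",")) {
      String trimmed = name.trim();
      if (trimmed.isEmpty()) {
        continue;
      }
      if (taskConf.containsKey(SERIALIZED_TABLE_PREFIX + trimmed)) {
        result.add(trimmed);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // A Tez-like task config: all targets are listed, only one is serialized.
    Map<String, String> conf = new HashMap<>();
    conf.put(OUTPUT_TABLES, "target1,target2,target3");
    conf.put(SERIALIZED_TABLE_PREFIX + "target2", "<serialized table>");

    System.out.println(tablesToCommit(conf)); // prints [target2]
  }
}
```

Under this assumption, a vertex that writes only to target2 commits target2 and ignores target1 and target3, which the other vertices are responsible for.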
