pvary opened a new pull request #2228:
URL: https://github.com/apache/iceberg/pull/2228
When an INSERT statement writes data to multiple tables, the current
`OutputCommitter` implementation fails to handle the situation.
Example query:
```sql
FROM customers
INSERT INTO target1 SELECT *
INSERT INTO target2 SELECT *
```
The change contains the following modifications:
- We have to handle multiple writers for a single `TaskAttempt`.
- Since we almost always start with a location, we can use the table
  location as the key for these writers.
- When committing the task, we have to create a `forCommit` file for every
  target table.
- For this, we have to collect the target table locations and names when
  creating the jobConf.
- When committing the job, we have to commit all of the tables.
- When aborting the job, we have to clean up every target table directory.
- When cleaning up after a job, we have to do it for every table.
- Added a test for multi-table insert.
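The task-level and job-level steps above can be modeled roughly as follows. This is a minimal, dependency-free Java sketch, not the code in this PR: `TableWriter`, `TaskAttemptWriters`, `MultiTableCommit`, the fake data file paths, and the in-memory maps standing in for the `forCommit` files are all hypothetical. It assumes the list of target table locations is visible to both the tasks and the job (the PR collects it into the jobConf), and that a task emits an empty `forCommit` entry for any target it wrote no rows to.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a writer bound to one target table.
class TableWriter {
  final String tableLocation;
  final List<String> dataFiles = new ArrayList<>();

  TableWriter(String tableLocation) {
    this.tableLocation = tableLocation;
  }

  void write(String row) {
    // Pretend each row produces one data file under the table location.
    dataFiles.add(tableLocation + "/data/" + row);
  }
}

// Hypothetical per-TaskAttempt state: several writers, keyed by table location.
class TaskAttemptWriters {
  private final Map<String, TableWriter> writers = new LinkedHashMap<>();

  TableWriter writerFor(String tableLocation) {
    // Reuse the writer for a location, or create it on first use.
    return writers.computeIfAbsent(tableLocation, TableWriter::new);
  }

  // Task commit: one entry per target table, each entry standing in for
  // that table's forCommit file. Targets with no rows get an empty list.
  Map<String, List<String>> commitTask(List<String> targetLocations) {
    Map<String, List<String>> forCommit = new LinkedHashMap<>();
    for (String location : targetLocations) {
      TableWriter writer = writers.get(location);
      forCommit.put(location,
          writer == null ? List.of() : List.copyOf(writer.dataFiles));
    }
    return forCommit;
  }
}

// Hypothetical job-level step: commit every target table, not just one.
class MultiTableCommit {
  static Map<String, List<String>> commitJob(
      List<String> targetLocations, List<Map<String, List<String>>> taskCommits) {
    Map<String, List<String>> committed = new LinkedHashMap<>();
    for (String location : targetLocations) {
      List<String> files = new ArrayList<>();
      for (Map<String, List<String>> task : taskCommits) {
        files.addAll(task.getOrDefault(location, List.of()));
      }
      // In a real committer this would be one table commit per location.
      committed.put(location, files);
    }
    return committed;
  }
}
```

Aborting and cleaning up the job would follow the same shape as `commitJob`: loop over every target location rather than assuming a single output table.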