boneanxs commented on PR #8076:
URL: https://github.com/apache/hudi/pull/8076#issuecomment-1493558572
> I am yet to review fully, but have taken one pass. Can you break it down
> into two PRs - a) don't delete the table location if using SaveMode.Overwrite
> for bulk_insert, insert_overwrite, b) add support for bulk_insert for
> insert_overwrite and insert_overwrite_table.
Yea, sure, will do so
> Also, I want to understand the use case when we need this. If you can
> elaborate a bit more on why we need this, that would be great.
Currently, we want to migrate all of our existing Hive tables to Hudi tables.
Many of these Hive tables
1) usually use the `insert_overwrite` operation to overwrite a partition
2) are written by batch jobs and can contain TB-level data in one day
3) don't need the `tag` or `drop duplicates` steps
`bulk_insert` mode fits this scenario well: we can use it to boost write
performance and make it easier for users to migrate existing Hive tables to
Hudi tables.
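For reference, a minimal sketch of how such a partition overwrite is configured through the Spark datasource options today. The helper name `hudi_write_options`, the table name `orders`, and the field names `order_id` and `dt` are hypothetical and only for illustration; the option keys are the standard Hudi datasource write options. Routing `insert_overwrite` through the `bulk_insert` write path is what this PR proposes, so no switch for that is shown here.

```python
# Hypothetical helper: builds Hudi Spark datasource write options as a plain dict.
# Only the option keys are Hudi's; the function and argument names are illustrative.
VALID_OPERATIONS = {
    "upsert",
    "insert",
    "bulk_insert",
    "insert_overwrite",
    "insert_overwrite_table",
}


def hudi_write_options(table_name, record_key, partition_field, operation):
    """Return the write options for one Hudi write with the given operation."""
    if operation not in VALID_OPERATIONS:
        raise ValueError(f"unsupported write operation: {operation}")
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.operation": operation,
    }


if __name__ == "__main__":
    # Overwrite only the partitions present in the incoming batch.
    opts = hudi_write_options("orders", "order_id", "dt", "insert_overwrite")
    for key, value in sorted(opts.items()):
        print(f"{key}={value}")
    # With PySpark and the Hudi bundle available (not executed here; the path
    # is a placeholder), the dict would be passed to the writer like this:
    # df.write.format("hudi").options(**opts).mode("append").save("/path/to/table")
```

The write uses append mode rather than overwrite mode, so the choice of which partitions to replace is left to the `insert_overwrite` operation itself.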
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.