boneanxs commented on PR #8076:
URL: https://github.com/apache/hudi/pull/8076#issuecomment-1493558572

   >  I am yet to review fully, but have taken one pass. Can you break it down 
into two PRs - a) don't delete the table location if using SaveMode.Overwrite 
for bulk_insert, insert_overwrite, b) add support for bulk_insert for 
insert_overwrite and insert_overwrite_table.
   
   Yes, sure, will do.
   
   > Also, I want to understand the use case when we need this. If you can 
elaborate a bit more on why we need this, that would be great.
   
   We currently want to migrate all of our existing Hive tables to Hudi 
tables. Many of these Hive tables
      1) usually use the `insert_overwrite` operation to overwrite a 
partition 
      2) are written by batch jobs and can contain terabytes of data per day 
      3) don't need the `tag` or `drop duplicates` steps
   
   `bulk_insert` mode fits this scenario well: we can use it to boost write 
performance and make it easier for users to migrate existing Hive tables to 
Hudi tables.
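
   To make the three points above concrete, here is a rough toy model in 
plain Python of the semantics we are after. It is not Hudi code: the 
function name `insert_overwrite`, the dict-of-lists table layout and the 
`dt` partition column are all assumptions made only for illustration.

```python
from collections import defaultdict


def insert_overwrite(table, batch, partition_key):
    """Toy model (hypothetical helper, not a Hudi API).

    `table` maps a partition value to its list of rows. Every partition
    that appears in `batch` is replaced wholesale; other partitions are
    kept. Rows are written as-is: no index lookup (tagging) and no
    duplicate removal.
    """
    incoming = defaultdict(list)
    for row in batch:
        incoming[row[partition_key]].append(row)

    result = dict(table)      # partitions not in the batch are kept
    result.update(incoming)   # partitions in the batch are replaced
    return result


table = {
    "2023-04-01": [{"id": 1, "dt": "2023-04-01"}],
    "2023-04-02": [{"id": 2, "dt": "2023-04-02"}],
}
batch = [
    {"id": 3, "dt": "2023-04-02"},
    {"id": 3, "dt": "2023-04-02"},  # duplicate is kept: no dedup step
]
new_table = insert_overwrite(table, batch, "dt")
```

   Only the `2023-04-02` partition is rewritten here, and the write path 
never has to look at existing records, which is why a `bulk_insert`-style 
write is enough for this kind of job.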


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
