[GitHub] [incubator-hudi] vinothchandar commented on issue #1070: How to do the bulk update .?

GitBox Fri, 06 Dec 2019 10:05:53 -0800

vinothchandar commented on issue #1070: How to do the bulk update .?
URL: https://github.com/apache/incubator-hudi/issues/1070#issuecomment-562678009
 
 
   That helps! I would suggest the following concrete steps:
   
   1. Do an initial bulk import of that same one month of data, e.g. 
`spark.read.parquet("your_data_set/path/to/month").write.format("org.apache.hudi")`,
 but use the `BULK_INSERT` operation, along the lines shown here: 
https://hudi.apache.org/quickstart.html#inserts . It also seems like you may 
need the merge_on_read storage type?
   
   2. Then you can proceed to upsert into the dataset, as also documented in 
the quickstart. You can either schedule compaction as a background job or, for 
now, use the CLI to trigger compactions manually while you experiment. 
@bvaradar this is worth adding to the FAQ. Do you have links to docs to follow 
here?
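
   The two steps above can be sketched in PySpark roughly as follows. This is 
only an illustration, not a tested recipe: the option keys are the Spark 
datasource write options as named around the 0.5.x releases (later versions 
renamed some of them, so check the configuration docs for your version), and 
the table name, base path, and the `uuid` / `partitionpath` / `ts` field names 
are hypothetical placeholders for your own schema.

```python
def hudi_options(table_name, operation, record_key, partition_field,
                 precombine_field, storage_type="MERGE_ON_READ"):
    """Build Hudi datasource write options (key names assumed from 0.5.x)."""
    return {
        "hoodie.table.name": table_name,
        # "bulk_insert" for the initial load, "upsert" for later updates
        "hoodie.datasource.write.operation": operation,
        "hoodie.datasource.write.storage.type": storage_type,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        # field used to pick the latest record when keys collide
        "hoodie.datasource.write.precombine.field": precombine_field,
    }


def write_hudi(df, base_path, options, mode):
    """Write a Spark DataFrame to a Hudi dataset at base_path."""
    (df.write.format("org.apache.hudi")
       .options(**options)
       .mode(mode)
       .save(base_path))


# Hypothetical usage, assuming an active SparkSession named `spark`
# with the Hudi Spark bundle on its classpath:
#
# month_df = spark.read.parquet("your_data_set/path/to/month")
# opts = hudi_options("my_table", "bulk_insert", "uuid", "partitionpath", "ts")
# write_hudi(month_df, "/path/to/hudi_table", opts, "overwrite")   # step 1
#
# opts = hudi_options("my_table", "upsert", "uuid", "partitionpath", "ts")
# write_hudi(updates_df, "/path/to/hudi_table", opts, "append")    # step 2
```

   Note the save modes: the first write creates the dataset, while the upsert 
uses `append` so that the existing data is merged into and not replaced.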


