xushiyan commented on issue #2888:
URL: https://github.com/apache/hudi/issues/2888#issuecomment-947326866


   @PavelPetukhov 
   
   > Note 2: it works fine without --continuous parameter
   > Note 3: it stores data as expected with --continuous but fails at some point
   
   You may also want to set `--source-limit` to 1-2G. Before that, check 
each commit produced under `--continuous` and see whether the ingested data 
size is increasing over time. You can examine the commit files under 
`.hoodie/`, and also check the `checkpoint` value in each commit file to see 
whether it is advancing over time.
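   A minimal sketch of that check, assuming the DeltaStreamer checkpoint is stored in each commit file's `extraMetadata` under the key `deltastreamer.checkpoint.key` (the sample file below stands in for a real `.commit` file under `.hoodie/`, which would be a much larger JSON document):
   
   ```shell
   # Hypothetical one-line stand-in for a commit file under .hoodie/
   cat > sample.commit <<'EOF'
   {"extraMetadata": {"deltastreamer.checkpoint.key": "1634567890000"}}
   EOF
   
   # Pull the checkpoint out of each commit file, oldest first; under
   # --continuous the value should advance from one commit to the next.
   for f in sample.commit; do
     ckpt=$(grep -o '"deltastreamer.checkpoint.key": *"[^"]*"' "$f" \
            | sed 's/.*: *"\([^"]*\)"/\1/')
     echo "$f -> checkpoint=$ckpt"
   done
   ```
   
   On a real table you would loop over `.hoodie/*.commit` (sorted by timestamp) instead of the sample file.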
   
   Some other notes: `bulk_insert` is not the same as `upsert`; the former 
does not update existing records. You'd need to choose the right operation 
based on your business need. As for parallelism, 50 would still not be enough 
if you have, say, 100 executors; you'd need to adjust it based on your Spark 
job size and the number of output partitions.
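   Putting the suggestions above together, a hedged sketch of the `spark-submit` invocation (the jar path, table paths, and parallelism value are placeholders to adapt to your deployment; `--op`, `--continuous`, `--source-limit`, and `--hoodie-conf` are standard HoodieDeltaStreamer options):
   
   ```shell
   # Sketch only: paths and numbers are placeholders, not a tested config.
   spark-submit \
     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
     hudi-utilities-bundle.jar \
     --op UPSERT \
     --continuous \
     --source-limit 2000000000 \
     --hoodie-conf hoodie.upsert.shuffle.parallelism=200
   ```
   
   The shuffle parallelism here is illustrative; size it against your executor count and output partitions as noted above.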
   
   Also, please try upgrading to 0.9.0.
   
   These are the suggestions I can gather from the thread above. Closing this 
due to long inactivity. Please follow up here if there are any updates. 
Thanks.

