spyzzz commented on issue #2193: URL: https://github.com/apache/hudi/issues/2193#issuecomment-716814642
Yes, you're right: the metadata I gave you was from the initial bootstraps. So yes, I have two different workloads, you summed them up pretty clearly. One exception: for (1), updates can still show up during the initial bootstrap (it wasn't the case for the files I gave you), which is why I can't use BULK_INSERT mode. With the few configuration tips you gave me, I'm able to keep a pretty linear time for each 5-million-row batch, so yes, it's better. I'm around 500K rows/min (I don't really know whether that's good or not) with a 6-partition Kafka topic.

For (2), in CDC mode, yes, I have a lot fewer rows, but I have to apply updates pretty often (let's say every 10 minutes). I tried this afternoon, and I'm able to handle a 10-minute micro-batch in about 2 minutes. And the Hudi output stays pretty acceptable: not too many files, and every 10 delta files Hudi creates a ~40-70 MB parquet file.
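For anyone landing on this thread, the behaviour described above maps to a handful of Hudi writer settings. This is only a minimal sketch of what such a configuration might look like, assuming a MERGE_ON_READ table with inline compaction; the numeric values are illustrative, not the ones from the actual job:

```properties
# UPSERT handles the updates that can arrive even during the initial
# bootstrap; bulk_insert would only be safe for pure-insert batches.
hoodie.datasource.write.operation=upsert
hoodie.datasource.write.table.type=MERGE_ON_READ

# Compact inline every 10 delta commits, matching the observed
# "every 10 delta files, Hudi creates a parquet file" pattern.
hoodie.compact.inline=true
hoodie.compact.inline.max.delta.commits=10

# Steer base file sizes toward the 40-70 MB range (bytes, illustrative).
hoodie.parquet.small.file.limit=41943040
hoodie.parquet.max.file.size=73400320
```

With inline compaction, the micro-batch commit pays the compaction cost every Nth commit, which is one reason to watch that the 10-minute CDC batches keep finishing well under the batch interval.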
