[GitHub] [hudi] vinothchandar commented on issue #1694: Slow Write into Hudi Dataset(MOR)

GitBox Tue, 07 Jul 2020 17:16:21 -0700


vinothchandar commented on issue #1694:
URL: https://github.com/apache/hudi/issues/1694#issuecomment-655205591



   #1752 is the PR. 
   
   What I am seeing is that the range-based pruning is not very effective and 
is resulting in a lot of shuffled data.
   
   Is there a way to avoid the global index, i.e. can you always determine 
`ad` for each record? Also, given 
`.option("hoodie.datasource.write.recordkey.field", "wbn")`, is there a certain 
ordering to `wbn` that we can exploit? I am referring to some material put 
together here: 
https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-Whatperformance/ingestlatencycanIexpectforHudiwriting
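
To illustrate why key ordering matters for range-based pruning, here is a toy sketch in plain Python. It is not Hudi code: the helper names, the key values and the file layout are all made up for illustration. It only assumes that each file tracks a min/max record key and that a lookup must consider every file whose range could contain the incoming key.

```python
import random

def file_ranges(keys, num_files):
    """Split keys into equal chunks (in the given order) and record each chunk's (min, max)."""
    size = len(keys) // num_files
    chunks = [keys[i * size:(i + 1) * size] for i in range(num_files)]
    return [(min(c), max(c)) for c in chunks]

def candidate_files(key, ranges):
    """Return the indices of files whose [min, max] range could contain the key."""
    return [i for i, (lo, hi) in enumerate(ranges) if lo <= key <= hi]

random.seed(0)
keys = list(range(10_000))          # hypothetical record keys

# Keys written in sorted order: file ranges do not overlap.
ordered = file_ranges(keys, 10)

# Keys written in random order: every file spans almost the whole key space.
shuffled_keys = keys[:]
random.shuffle(shuffled_keys)
unordered = file_ranges(shuffled_keys, 10)

probe = 4321                        # hypothetical incoming key
ordered_hits = len(candidate_files(probe, ordered))
unordered_hits = len(candidate_files(probe, unordered))
print(ordered_hits, unordered_hits)
```

With ordered keys the probe maps to a single candidate file, while with randomly distributed keys almost every file remains a candidate, so the range check prunes very little and more data has to be shuffled for the lookup.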
   
   In general, we need to make the upsert process depend on the size of the 
input rather than on the size of the table.
   
   If you are open to trying it, you can switch to the simple index on master, 
which will be a lot lighter in this particular scenario, where there do not 
seem to be any benefits from the range/bloom information.
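
As a rough sketch of what that switch could look like from the Spark datasource, the snippet below builds the write options as a plain Python dict. Several things here are assumptions rather than details from this issue: the table name `my_table` is hypothetical, `ad` is assumed to be the partition path field, and `hoodie.index.type` set to `SIMPLE` is assumed to be how a build from master selects the simple index. Other required settings (for example the precombine field) are left out.

```python
def hudi_upsert_options(table_name, record_key, partition_field, index_type="SIMPLE"):
    """Build a minimal set of Hudi datasource write options (not a complete config)."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key,
        # Assumption: `ad` is the partition path field for this table.
        "hoodie.datasource.write.partitionpath.field": partition_field,
        # Assumption: the non-global simple index is chosen with this value.
        "hoodie.index.type": index_type,
    }

opts = hudi_upsert_options("my_table", "wbn", "ad")

# Hypothetical usage with an existing DataFrame `df` and path `base_path`:
# df.write.format("hudi").options(**opts).mode("append").save(base_path)
```

Because the simple index is not global, this only works if the partition value can be determined for every incoming record, which is the question raised above.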
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
