vinothchandar commented on issue #1694:
URL: https://github.com/apache/hudi/issues/1694#issuecomment-655205591
#1752 is the PR.

What I am seeing is that range-based pruning is not very effective here, and
it is resulting in a lot of shuffled data.
Is there a way to avoid using a global index, i.e. can you always determine
`ad` for each record? Also, with
`.option("hoodie.datasource.write.recordkey.field", "wbn")`, is there a
certain ordering to `wbn` that we can exploit? I am referring to some
material put together here:
https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-Whatperformance/ingestlatencycanIexpectforHudiwriting
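
To illustrate why range-based pruning may not help, here is a simplified Python model. It is an assumption for illustration, not Hudi's actual implementation: each file stores the min/max record key it contains, and an incoming key only has to be checked against files whose range covers it. When keys are written in increasing order, each key falls in one file's range. When the same keys arrive in random order, nearly every file's range spans the whole key space, so almost nothing is pruned. Whether `wbn` has a usable ordering is exactly the open question; the zero-padded integer keys below are hypothetical.

```python
import random

# Simplified model (assumption, not Hudi's code): every file records the
# min and max record key it holds; an incoming key is a candidate for a
# file only if min <= key <= max.


def file_ranges(keys, keys_per_file):
    """Split keys (in write order) into files; return each file's (min, max)."""
    ranges = []
    for start in range(0, len(keys), keys_per_file):
        chunk = keys[start:start + keys_per_file]
        ranges.append((min(chunk), max(chunk)))
    return ranges


def candidate_files(key, ranges):
    """Number of files whose key range could contain `key`."""
    return sum(1 for lo, hi in ranges if lo <= key <= hi)


def avg_candidates(existing_keys, incoming_keys, keys_per_file=1000):
    """Average number of candidate files per incoming key."""
    ranges = file_ranges(existing_keys, keys_per_file)
    total = sum(candidate_files(k, ranges) for k in incoming_keys)
    return total / len(incoming_keys)


if __name__ == "__main__":
    rng = random.Random(42)
    n = 100_000  # 100 files of 1000 keys each
    # Ordered: keys written in increasing order (e.g. a time-prefixed key).
    ordered = [f"{i:08d}" for i in range(n)]
    # Random: the same keys, but arriving in shuffled order.
    shuffled = ordered[:]
    rng.shuffle(shuffled)
    probe = rng.sample(ordered, 1000)
    # 1.0: each key falls in exactly one file's range.
    print("ordered :", avg_candidates(ordered, probe))
    # Most of the 100 files stay candidates for every key.
    print("shuffled:", avg_candidates(shuffled, probe))
```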

In general, we need to make the cost of the upsert depend on the size of the
input rather than on the size of the table.
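
A rough Python cost model of the global vs. non-global distinction, assuming `ad` is the partition path field (the question above suggests this, but it is an assumption) and ignoring range/bloom pruning entirely. With a global index, the lookup scope grows with the whole table; with a non-global index, it grows only with the partitions the input touches. The partition and file counts are made up for illustration.

```python
from typing import Dict, List, Tuple


def files_to_check(
    table: Dict[str, int],
    incoming: List[Tuple[str, str]],
    global_index: bool,
) -> int:
    """Count base files an index lookup has to consider, before any
    range/bloom pruning.

    table:    partition value -> number of base files (hypothetical model)
    incoming: (record_key, partition_value) pairs in the upsert batch
    """
    if global_index:
        # A global index enforces key uniqueness across the whole table,
        # so a key may already live in any partition: every file is in scope.
        return sum(table.values())
    # A non-global index only looks inside partitions the batch touches.
    touched = {partition for _, partition in incoming}
    return sum(table.get(p, 0) for p in touched)


# Hypothetical table: 365 values of `ad`, 50 base files each.
table = {f"ad_{i}": 50 for i in range(365)}
batch = [("wbn_1", "ad_0"), ("wbn_2", "ad_0"), ("wbn_3", "ad_1")]

print(files_to_check(table, batch, global_index=True))   # 18250
print(files_to_check(table, batch, global_index=False))  # 100
```

Doubling the number of partitions doubles the global count but leaves the non-global count unchanged for the same batch.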

If you are open to trying it, you can switch to the simple index on master.
It will be a lot lighter in this particular scenario, where the range/bloom
information does not seem to provide any benefit.
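
A minimal sketch of what switching might look like, written as a Python dict of Spark datasource write options. `hoodie.index.type` and the `hoodie.datasource.write.*` keys are Hudi write configs; `wbn` comes from this thread, while using `ad` as the partition path field is an assumption, and required settings such as the table name are omitted. Check that the Hudi build in use actually includes the `SIMPLE` index.

```python
# Hypothetical write options for the job in this issue. "wbn" is from the
# thread; "ad" as the partition path field is an assumption. Required
# options such as the table name are intentionally omitted.
hudi_options = {
    "hoodie.datasource.write.recordkey.field": "wbn",
    "hoodie.datasource.write.partitionpath.field": "ad",
    "hoodie.datasource.write.operation": "upsert",
    # SIMPLE: join incoming keys against keys read from the table's base
    # files, without relying on min/max key ranges or bloom filters.
    "hoodie.index.type": "SIMPLE",
}

# With PySpark, these would be passed roughly as:
# df.write.format("org.apache.hudi").options(**hudi_options) \
#     .mode("append").save(base_path)
```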