KarthickAN commented on issue #2178: URL: https://github.com/apache/hudi/issues/2178#issuecomment-711645166
@nsivabalan @vinothchandar Thank you so much for all the explanations. Thinking about it, having 10MB worth of index data may not be an issue as long as the file contains a considerable number of records. In my case there was a scenario where I had only 1000 records but 10MB of index, so I switched to the dynamic bloom filter, which is really helpful here. We are dealing with two different types of data, one of which doesn't have much volume; that's where the problem showed up. For the other type, where we do have a good volume of data, this never surfaced as an issue because we'd already have around 110-120MB worth of data plus index.

As of now I've configured it like below:

```
IndexBloomNumEntries = 35000
BloomIndexFilterType = DYNAMIC_V0
BloomIndexFilterDynamicMaxEntries = 1400000
```

starting off with 35k (1% of the max number of entries in a file) as a base and scaling out to 1.4M (40% of the max number of entries in a file) as the file grows. That should solve the problem. Either way, we need to test this against the volume we are seeing right now and tune it further if required.

@vinothchandar Yes, having a blog around this would definitely be very helpful. I feel Hudi has a lot of features that could be used much more effectively with more in-depth explanations than the current documentation provides.
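For reference, the tuning above can be expressed with Hudi's bloom-index write configs (`hoodie.index.bloom.num_entries`, `hoodie.bloom.index.filter.type`, `hoodie.bloom.index.filter.dynamic.max.entries`); the Spark writer usage shown in the comment at the bottom is illustrative, not something stated in this thread:

```python
# Sketch of the bloom-index tuning described above, expressed as
# Hudi write options. Values mirror the comment: a 35k-entry base
# filter that grows dynamically up to a 1.4M-entry cap.
hudi_bloom_opts = {
    # Base bloom filter sized for ~35k entries (~1% of max entries per file)
    "hoodie.index.bloom.num_entries": "35000",
    # Dynamic bloom filter grows as the file accumulates records
    "hoodie.bloom.index.filter.type": "DYNAMIC_V0",
    # Cap growth at 1.4M entries (~40% of max entries per file)
    "hoodie.bloom.index.filter.dynamic.max.entries": "1400000",
}

# Illustrative Spark usage (assumes a DataFrame `df` and target `path`):
# df.write.format("hudi").options(**hudi_bloom_opts).mode("append").save(path)
```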