KarthickAN edited a comment on issue #2178: URL: https://github.com/apache/hudi/issues/2178#issuecomment-710747888
@nsivabalan I tried out the dynamic bloom filter and it works fine: it grows dynamically with the number of entries. That's a good feature, thanks.

However, what is the recommended approach to indexing here? I see that various options are available out of the box. Given the record size (35 bytes), I could have more than 3.5 million records in a file with a max size of 120 MB. Since the docs recommend setting the number of bloom filter entries to approximately half the total number of records in a file, I went with 1.5M.

Regarding the index type (hoodie.index.type): how does the SIMPLE type work? I also see that hoodie.bloom.index.prune.by.ranges, hoodie.bloom.index.use.caching, hoodie.bloom.index.use.treebased.filter and hoodie.bloom.index.bucketized.checking are all enabled by default. Do these really help regardless of the type of hoodie key used? In my case I am using ComplexKeyGenerator with five different fields, one of which is a timestamp.
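For reference, here is a rough sketch of the sizing arithmetic behind the numbers above, together with the options being discussed. It assumes 1 MB = 2**20 bytes and a fixed 35-byte record. The keys hoodie.index.bloom.num_entries and hoodie.bloom.index.filter.type, the DYNAMIC_V0 value, and the BLOOM index type are my assumptions about the relevant settings and should be checked against the Hudi configuration reference for the version in use.

```python
# Back-of-the-envelope bloom filter sizing, using the numbers from this comment.
MAX_FILE_SIZE_BYTES = 120 * 1024 * 1024  # 120 MB max file size (assumes MB = 2**20 bytes)
RECORD_SIZE_BYTES = 35                   # approximate size of one record


def records_per_file(file_size_bytes: int, record_size_bytes: int) -> int:
    """Upper bound on how many fixed-size records fit in one file."""
    return file_size_bytes // record_size_bytes


def half_of_records(total_records: int) -> int:
    """'Approximately half the total number of records', as the docs suggest."""
    return total_records // 2


max_records = records_per_file(MAX_FILE_SIZE_BYTES, RECORD_SIZE_BYTES)
upper_hint = half_of_records(max_records)

# Options as they might be passed to a Hudi write. The first three keys are
# assumptions (not taken from this thread); the last four are the flags named
# above, which are reported to be enabled by default.
hudi_options = {
    "hoodie.index.type": "BLOOM",                    # assumed index type
    "hoodie.bloom.index.filter.type": "DYNAMIC_V0",  # assumed key/value for the dynamic filter
    "hoodie.index.bloom.num_entries": str(1_500_000),  # the 1.5M chosen above
    "hoodie.bloom.index.prune.by.ranges": "true",
    "hoodie.bloom.index.use.caching": "true",
    "hoodie.bloom.index.use.treebased.filter": "true",
    "hoodie.bloom.index.bucketized.checking": "true",
}

if __name__ == "__main__":
    print(f"max records per file: {max_records}")
    print(f"half of that:         {upper_hint}")
    for key, value in hudi_options.items():
        print(f"{key}={value}")
```

The 1.5M figure sits a little below the computed half, which seems reasonable as long as the dynamic filter can grow when a file holds more entries than expected.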
