Reimus commented on issue #5808: URL: https://github.com/apache/hudi/issues/5808#issuecomment-1152845213
Thank you for the explanation. The column stats index and data skipping are already an awesome addition in 0.11.0. Since the docs mention them in the same breath as the bloom index, I assumed there is a use for bloom-based secondary indexes too. Think of a customer UUID column, for example: since it is a random string, column stats would be relatively useless, but a bloom filter could skip 99% of all files when looking for a particular UUID. Or am I missing something about how column stats work? From reading the code and metadata, they seem useful for monotonic or slowly changing columns, like dates or DB foreign keys, where min/max stats in combination with clustering/sorting can do proper data skipping.
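To make the point about random UUID columns concrete, here is a minimal toy sketch in plain Python. It is not Hudi's implementation and uses none of Hudi's APIs: the `BloomFilter` class, `build_files`, `candidates_by_min_max`, `candidates_by_bloom`, and the sizes chosen (100 files, 200 rows per file, 4096 bits, 5 hashes) are all assumptions made up for illustration. It only compares how many files each kind of per-file metadata would leave to scan for a point lookup on a random key.

```python
import hashlib
import random
import uuid


class BloomFilter:
    """Tiny illustrative bloom filter; a Python int serves as the bitset."""

    def __init__(self, num_bits=4096, num_hashes=5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Double hashing: derive all bit positions from one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def build_files(num_files=100, rows_per_file=200, seed=42):
    """Simulate files that each hold random UUID keys plus per-file metadata."""
    rng = random.Random(seed)
    files = []
    for _ in range(num_files):
        keys = [str(uuid.UUID(int=rng.getrandbits(128))) for _ in range(rows_per_file)]
        bloom = BloomFilter()
        for key in keys:
            bloom.add(key)
        files.append({"keys": set(keys), "min": min(keys), "max": max(keys), "bloom": bloom})
    return files


def candidates_by_min_max(files, key):
    # Column-stats style pruning: keep files whose [min, max] range covers the key.
    return [i for i, f in enumerate(files) if f["min"] <= key <= f["max"]]


def candidates_by_bloom(files, key):
    # Bloom style pruning: keep files whose filter reports "possibly present".
    return [i for i, f in enumerate(files) if f["bloom"].might_contain(key)]


if __name__ == "__main__":
    files = build_files()
    target = sorted(files[7]["keys"])[100]  # a key that really lives in file 7
    print("min/max candidates:", len(candidates_by_min_max(files, target)))
    print("bloom candidates:  ", len(candidates_by_bloom(files, target)))
```

With random keys, every file's min/max range spans almost the whole key space, so range-based pruning keeps nearly all files, while the bloom filter keeps the file that holds the key plus an occasional false positive. If the same keys were sorted or clustered across files, the min/max ranges would stop overlapping and column stats would prune well too, which matches the monotonic-column case described above.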
