[GitHub] [hudi] Reimus commented on issue #5808: [SUPPORT] Data skipping using Column Stats Bloom does not seem to work at all

GitBox Fri, 10 Jun 2022 20:25:16 -0700


Reimus commented on issue #5808:
URL: https://github.com/apache/hudi/issues/5808#issuecomment-1152845213


   Thank you for the explanation.
   
   The column stats indexes / data skipping are already an awesome addition to 
0.11.0. Given that the docs mention them in the same breath as the bloom index, 
I assumed there is a use for bloom-based secondary indexes too.
   - Think of a customer uuid column, for example: since it is a random string, 
column stats would be relatively useless, but a bloom filter could skip 99% of 
all files when looking for a particular uuid.
   Or am I missing something about how the column stats work? Reading the 
code/metadata, they seem useful for monotonic or slowly changing columns, like 
dates or db FKs, where min/max stats in combination with clustering/sorting can 
do proper data skipping.
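
   To make the uuid point concrete, here is a rough toy sketch in plain Python. 
It is not Hudi code and does not use Hudi's index formats: the `BloomFilter` 
class, the file layout (100 files of 1000 random uuids each) and the sizing 
(10 bits per key, 7 hashes) are all assumptions made up for illustration. It 
compares how many files survive min/max pruning versus bloom filter pruning 
for a single uuid lookup.

```python
import hashlib
import random
import uuid


class BloomFilter:
    """Toy bloom filter (hypothetical helper, not Hudi's implementation)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


rng = random.Random(42)  # fixed seed so runs are repeatable


def random_uuid():
    return str(uuid.UUID(int=rng.getrandbits(128)))


# Assumed layout: 100 "files", each holding 1000 random uuids (unsorted data).
files = [[random_uuid() for _ in range(1000)] for _ in range(100)]

# Per-file metadata: min value, max value, and a bloom filter over the column.
stats = []
for keys in files:
    bloom = BloomFilter(num_bits=10 * len(keys), num_hashes=7)
    for key in keys:
        bloom.add(key)
    stats.append((min(keys), max(keys), bloom))

target = files[37][500]  # a uuid known to live in file 37

# Min/max pruning: keep every file whose [min, max] range covers the target.
minmax_candidates = [
    i for i, (lo, hi, _) in enumerate(stats) if lo <= target <= hi
]

# Bloom pruning: keep every file whose filter says "possibly present".
bloom_candidates = [
    i for i, (_, _, bloom) in enumerate(stats) if bloom.might_contain(target)
]

print("files kept by min/max stats:", len(minmax_candidates))
print("files kept by bloom filter: ", len(bloom_candidates))
```

   With random keys, every file's min/max range spans nearly the whole key 
space, so range pruning keeps almost every file, while the bloom filter only 
keeps the real file plus the occasional false positive. If the data were 
sorted or clustered on the column, the min/max ranges would stop overlapping 
and range pruning would work well too.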


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]