alamb commented on issue #19487:
URL: https://github.com/apache/datafusion/issues/19487#issuecomment-3696368757

   > I think maybe we can use BooleanArray for PruningResult: true->KeepAll, false->SkipAll, null->Unknown, to get the above-mentioned benefits.
   
   
   I think this is a good idea
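
   To make the proposed encoding concrete, here is a minimal sketch of the true/false/null mapping. It uses `Option<bool>` as a stand-in for one slot of a nullable `BooleanArray` so that it needs no crates; the enum shape, `from_slot`, `to_slot`, and the local `and_kleene` helper are assumptions for illustration, not the existing DataFusion API.

```rust
/// Tri-state pruning outcome for one container (for example a row group).
/// Hypothetical shape, mirroring the names used in the quoted comment.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PruningResult {
    /// Every row in the container is known to pass the predicate.
    KeepAll,
    /// No row in the container can pass the predicate.
    SkipAll,
    /// The statistics cannot decide either way.
    Unknown,
}

impl PruningResult {
    /// Decode one nullable boolean slot: true->KeepAll, false->SkipAll, null->Unknown.
    pub fn from_slot(slot: Option<bool>) -> Self {
        match slot {
            Some(true) => PruningResult::KeepAll,
            Some(false) => PruningResult::SkipAll,
            None => PruningResult::Unknown,
        }
    }

    /// Encode back into a nullable boolean slot.
    pub fn to_slot(self) -> Option<bool> {
        match self {
            PruningResult::KeepAll => Some(true),
            PruningResult::SkipAll => Some(false),
            PruningResult::Unknown => None,
        }
    }
}

/// Three-valued (Kleene) AND over two slots. Under this encoding it combines
/// the results of two conjuncts: any SkipAll wins, two KeepAll stay KeepAll,
/// and everything else is Unknown.
pub fn and_kleene(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}
```

   One possible benefit of this encoding is that combining the results of `a AND b` is just three-valued boolean logic, which arrow-rs already provides as kernels over `BooleanArray` (I believe `and_kleene` is one of them), so no per-element enum matching would be needed.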
   
   > For complex statistics we want more flexibility, and RecordBatches as an IR can be awkward in such cases. For example, for set statistics it's better to store Vec<HashSet<ScalarValue>> directly, and the approach can also be extended to other statistics types we haven't thought about so far, such as storing a bloom filter directly as column statistics.
   
   The idea of storing sets is an interesting one. It seems we may be able to represent them with `ListArray` or something similar, though yes, we will need additional information beyond a record batch.
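
   As a rough illustration of the list-style layout, the sketch below flattens per-container sets into one values buffer plus offsets and a validity mask, which is the same shape a `ListArray` uses. It is plain Rust with `i64` standing in for `ScalarValue`; `SetStats`, `from_sets`, and `prune_eq` are hypothetical names, and it assumes each known set holds every distinct value in its container.

```rust
use std::collections::HashSet;

/// Per-container set statistics in a flattened, list-style layout.
/// The values for container `i` are `values[offsets[i]..offsets[i + 1]]`,
/// and `valid[i] == false` means no set statistics are known for it.
pub struct SetStats {
    offsets: Vec<usize>,
    values: Vec<i64>,
    valid: Vec<bool>,
}

impl SetStats {
    /// Flatten one optional set per container, sorting each slice so that
    /// membership checks can use binary search.
    pub fn from_sets(sets: &[Option<HashSet<i64>>]) -> Self {
        let mut offsets = vec![0];
        let mut values = Vec::new();
        let mut valid = Vec::with_capacity(sets.len());
        for set in sets {
            if let Some(s) = set {
                let mut sorted: Vec<i64> = s.iter().copied().collect();
                sorted.sort_unstable();
                values.extend(sorted);
            }
            valid.push(set.is_some());
            offsets.push(values.len());
        }
        Self { offsets, values, valid }
    }

    /// Prune `col = needle` for one container, using the nullable boolean
    /// encoding: Some(false) (SkipAll) when the value is provably absent,
    /// None (Unknown) otherwise. Panics if `container` is out of range.
    pub fn prune_eq(&self, container: usize, needle: i64) -> Option<bool> {
        if !self.valid[container] {
            return None;
        }
        let start = self.offsets[container];
        let end = self.offsets[container + 1];
        if self.values[start..end].binary_search(&needle).is_ok() {
            None // present in the set, but other rows may still differ
        } else {
            Some(false) // not in the complete set, so no row can match
        }
    }
}
```

   The validity mask is one example of the "additional information" point: the layout has to distinguish an empty set from a set that is simply unknown, and anything like a bloom filter would need its own side channel.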


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

