alamb commented on issue #19487: URL: https://github.com/apache/datafusion/issues/19487#issuecomment-3696368757
> I think maybe we can use `BooleanArray` for `PruningResult`: `true` -> `KeepAll`, `false` -> `SkipAll`, `null` -> `Unknown`, to get the above-mentioned benefits.

I think this is a good idea.

> For complex statistics, we want more flexibility; `RecordBatch`es as IR can be awkward for such cases. For example, for set statistics it's better to store `Vec<HashSet<ScalarValue>>` directly, and this approach could also be extended to statistics types we haven't thought about yet, such as storing a bloom filter directly as a column statistic.

The idea of storing sets is an interesting one. It seems we may be able to represent that with a `ListArray` or something similar, though yes, we will need additional information beyond a record batch.
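A minimal Rust sketch of how the proposed tri-state encoding could combine with set statistics. All names here (`PruningResult`, `from_nullable_bool`, `eq_with_sets`) are hypothetical and are not DataFusion APIs. `Vec<Option<bool>>` stands in for an Arrow `BooleanArray` (iterating a `BooleanArray` yields `Option<bool>`), and `i64` stands in for `ScalarValue`. It also assumes each set lists every distinct value in its container and that the column contains no nulls.

```rust
use std::collections::HashSet;

/// Hypothetical tri-state pruning outcome (variant names from the discussion).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PruningResult {
    KeepAll,
    SkipAll,
    Unknown,
}

/// Map one slot of a nullable boolean column onto the tri-state result:
/// true -> KeepAll, false -> SkipAll, null -> Unknown.
fn from_nullable_bool(v: Option<bool>) -> PruningResult {
    match v {
        Some(true) => PruningResult::KeepAll,
        Some(false) => PruningResult::SkipAll,
        None => PruningResult::Unknown,
    }
}

/// Evaluate `col = literal` against per-container distinct-value sets.
/// `None` means no set statistic is available for that container.
/// Assumes the column has no nulls (otherwise KeepAll would be unsafe).
fn eq_with_sets(sets: &[Option<HashSet<i64>>], literal: i64) -> Vec<Option<bool>> {
    sets.iter()
        .map(|set| match set {
            None => None,                                    // no statistics: cannot decide
            Some(s) if !s.contains(&literal) => Some(false), // value absent: no row matches
            Some(s) if s.len() == 1 => Some(true),           // only value present: every row matches
            Some(_) => None,                                 // mixed values: rows must be evaluated
        })
        .collect()
}

fn main() {
    let sets = vec![
        Some(HashSet::from([1, 2, 3])),
        Some(HashSet::from([7])),
        Some(HashSet::from([5, 7])),
        None,
    ];
    let results: Vec<PruningResult> = eq_with_sets(&sets, 7)
        .into_iter()
        .map(from_nullable_bool)
        .collect();
    println!("{:?}", results);
}
```

For the literal `7`, this yields `[SkipAll, KeepAll, Unknown, Unknown]`. The sketch keeps the set logic separate from the tri-state mapping, so the same boolean-with-nulls intermediate could come from min/max statistics, sets, or a bloom filter.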
