adriangb commented on PR #12978: URL: https://github.com/apache/datafusion/pull/12978#issuecomment-2420764628
@alamb can these stats be truncated? I know page-level stats truncate large strings, e.g. if the min value is `"B"`, could the actual min value be `"BA"`? If so, I think this approach may not work at all. Imagine a row group with data `["BA", "ZD"]` that generates min/max stats `["B", "Z"]`. Now we want to know whether `col LIKE '%A%'` can match. Clearly the answer should be *yes*, but if we convert it to the predicate form we get `'B' <= '' AND '' <= 'Z'`, which gives `false` 😞
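To make the concern concrete, here is a minimal Python sketch of the scenario, not DataFusion's actual pruning code. The function names `literal_prefix` and `naive_prefix_keeps_row_group` are hypothetical, the stats `("B", "Z")` are the hypothetical truncated values from the example, and escape characters in the pattern are ignored for simplicity.

```python
def literal_prefix(pattern: str) -> str:
    """Return the literal text before the first LIKE wildcard (% or _).

    Assumption: no escape characters are used in the pattern.
    """
    prefix = ""
    for ch in pattern:
        if ch in "%_":
            break
        prefix += ch
    return prefix


def naive_prefix_keeps_row_group(stat_min: str, stat_max: str, pattern: str) -> bool:
    """Evaluate the predicate form `min <= prefix AND prefix <= max`.

    Returns True if the row group would be kept, False if it would be pruned.
    """
    prefix = literal_prefix(pattern)
    return stat_min <= prefix and prefix <= stat_max


# Row group contents and the (hypothetically truncated) min/max stats.
rows = ["BA", "ZD"]
stat_min, stat_max = "B", "Z"

# Ground truth for `col LIKE '%A%'`: does any row contain "A"?
actually_matches = any("A" in value for value in rows)

# What the naive predicate form decides from the stats alone.
kept = naive_prefix_keeps_row_group(stat_min, stat_max, "%A%")

print(f"prefix={literal_prefix('%A%')!r} actually_matches={actually_matches} kept={kept}")
```

Under these assumptions the row group contains a matching row (`"BA"`), yet the predicate form evaluates to `false` because the pattern's literal prefix is the empty string and `'B' <= ''` does not hold, so the row group would be pruned incorrectly.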
