adriangb commented on issue #20324:
URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3917416683

> 15 of the regressing ClickBench queries (Q10-Q22, Q25, Q27) filter on a
> column that is also in the `SELECT` projection. When all filter columns are
> already projected, the RowFilter provides no I/O savings: those columns must be
> decoded regardless. The overhead is pure loss.
   
Is this true if more than one column is selected and the filter is very
selective? E.g. `select id, long_message from t where id = 123 and
long_message like '%foo%'`. If we push `id = 123` down as a row filter we can
avoid 99% of the decode of `long_message` (we only have to decode the 1 row /
page / minimum decode unit of `long_message` that contains the match). IMO the
ideal plan in a case like this would be to evaluate `id = 123` as a row filter
and then `long_message like '%foo%'` as a remainder.
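To make the decode arithmetic concrete, here is a toy Python model (not DataFusion or arrow-rs code). It assumes the reader decodes `long_message` at page granularity and can skip pages that contain no rows selected by the row filter (e.g. via the Parquet page index). `PAGE_SIZE`, the synthetic data, and all function names are hypothetical.

```python
from dataclasses import dataclass

PAGE_SIZE = 1_000  # hypothetical number of rows per data page


@dataclass
class ScanCost:
    id_values_decoded: int
    message_values_decoded: int
    matches: list


def make_table(n_rows):
    # Synthetic data: every 7th message contains "foo".
    ids = list(range(n_rows))
    messages = [f"row {i} foo" if i % 7 == 0 else f"row {i}" for i in range(n_rows)]
    return ids, messages


def scan_without_row_filter(ids, messages, target, needle):
    # Both projected columns are decoded in full, then both predicates run.
    matches = [i for i in range(len(ids)) if ids[i] == target and needle in messages[i]]
    return ScanCost(len(ids), len(messages), matches)


def scan_with_row_filter(ids, messages, target, needle):
    # Step 1 (row filter): decode only `id` and evaluate `id = target`.
    selected = [i for i, v in enumerate(ids) if v == target]
    # Step 2: decode only the `long_message` pages that hold a selected row.
    pages = sorted({i // PAGE_SIZE for i in selected})
    decoded = 0
    for p in pages:
        start = p * PAGE_SIZE
        end = min(start + PAGE_SIZE, len(messages))
        decoded += end - start
    # Step 3 (remainder): evaluate the LIKE-style predicate on surviving rows only.
    matches = [i for i in selected if needle in messages[i]]
    return ScanCost(len(ids), decoded, matches)


ids, messages = make_table(100_000)
no_filter = scan_without_row_filter(ids, messages, 126, "foo")
with_filter = scan_with_row_filter(ids, messages, 126, "foo")
print(no_filter.message_values_decoded, with_filter.message_values_decoded)
```

With 100,000 rows and 1,000-row pages, the row-filter plan decodes 1,000 `long_message` values instead of 100,000 and returns the same match. What the model leaves out is the cost of evaluating the row filter and building the row selection, which is the overhead the quoted analysis is concerned about.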
