pepijnve commented on PR #18152:
URL: https://github.com/apache/datafusion/pull/18152#issuecomment-3449904679

   > Your solution **assume** that case expression evaluation are cheaper than 
copy record batch, right?.
   > ... do you **evaluate** a > 1 for the remaining 90% or 100%?
   
   @rluvaton I was a bit stumped by this feedback at first. Rereading it this 
morning and taking your emphasis into account, I wonder whether the use of 
`evaluate` rather than `evaluate_selection` is leading to incorrect conclusions. 
It's correct that in this PR I chose to use `evaluate`, but that doesn't mean 
the expressions are not evaluated selectively. Instead, the filtering that's 
otherwise done by `evaluate_selection` is pulled into the case evaluation 
loop. The 'remaining' record batch that's passed to `evaluate` shrinks as we go 
through the loop.
   
   In your example, if we start with 100 input rows and 10% match the first 
`when` predicate, then 'remaining' will be the other 90 rows in the next loop 
iteration. The 'then' expression is only evaluated for the 10 matching rows.
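
   To make the shrinking concrete, here is a minimal sketch of such a loop. It 
is not the PR's code: plain `Vec<i64>` values stand in for record batches, and 
`eval_case_shrinking`, `When`, and `Then` are hypothetical names used only for 
illustration. The returned trace records how many rows each `when` and `then` 
expression actually saw.

```rust
// Hypothetical stand-ins for the `when` and `then` expressions.
type When = fn(i64) -> bool;
type Then = fn(i64) -> i64;

/// Simplified CASE evaluation without an ELSE branch.
/// Returns one optional value per input row, plus a trace of
/// (rows the `when` was evaluated on, rows the `then` was evaluated on).
fn eval_case_shrinking(
    input: &[i64],
    branches: &[(When, Then)],
) -> (Vec<Option<i64>>, Vec<(usize, usize)>) {
    let mut result = vec![None; input.len()];
    // Rows not yet matched by any branch, with their original row numbers.
    let mut remaining: Vec<i64> = input.to_vec();
    let mut row_numbers: Vec<usize> = (0..input.len()).collect();
    let mut trace = Vec::new();

    for (when, then) in branches {
        if remaining.is_empty() {
            break;
        }
        let when_rows = remaining.len();
        let mut then_rows = 0;
        let mut next_remaining = Vec::new();
        let mut next_row_numbers = Vec::new();

        for (&value, &row) in remaining.iter().zip(&row_numbers) {
            if when(value) {
                // `then` only runs for matching rows; the row number maps
                // the value back to its original position.
                result[row] = Some(then(value));
                then_rows += 1;
            } else {
                // Non-matching rows carry over to the next branch.
                next_remaining.push(value);
                next_row_numbers.push(row);
            }
        }

        trace.push((when_rows, then_rows));
        remaining = next_remaining;
        row_numbers = next_row_numbers;
    }

    (result, trace)
}
```

   With 100 input rows and a first `when` that matches 10 of them, the trace 
starts with `(100, 10)`, and the second `when` is evaluated on only 90 rows.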
   
   The filtering is pulled into the loop because I want to reuse the computed 
and optimised `FilterPredicate` to also filter the row number array. This is 
required in order to be able to map the partial/selective results back to their 
original rows. The code in `main` achieves this correlation using a scatter 
operation based on the original selection vector that maps the partial result 
back to an array with the same length as the original input. In the same 
example, 100 rows get filtered down to 10 rows, those 10 rows are evaluated to 
an array of 10 values, and that array is scattered back to an array of 100 
values with nulls inserted where necessary.
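
   The two ways of correlating partial results with their original rows can be 
sketched side by side. This is an illustration only, using plain slices 
instead of Arrow arrays; `filter`, `scatter`, and `place_by_row_number` are 
hypothetical helpers, not functions from DataFusion or arrow-rs, and how the PR 
assembles its final array from the row numbers is not shown here.

```rust
/// Keep only the values whose mask entry is true.
/// Applying the same mask to the data and to the row number array
/// keeps the two aligned.
fn filter<T: Copy>(values: &[T], mask: &[bool]) -> Vec<T> {
    values
        .iter()
        .zip(mask)
        .filter_map(|(&value, &selected)| selected.then_some(value))
        .collect()
}

/// Scatter-style correlation: expand a partial result back to the
/// length of the original input, inserting nulls (None) for every
/// row that was not selected.
fn scatter(mask: &[bool], partial: &[i64]) -> Vec<Option<i64>> {
    let mut values = partial.iter();
    mask.iter()
        .map(|&selected| if selected { values.next().copied() } else { None })
        .collect()
}

/// Row-number-style correlation: the partial result stays short, and
/// the filtered row numbers say where each value belongs.
fn place_by_row_number(
    result: &mut [Option<i64>],
    row_numbers: &[usize],
    partial: &[i64],
) {
    for (&row, &value) in row_numbers.iter().zip(partial) {
        result[row] = Some(value);
    }
}
```

   In the 100-row example, `scatter` produces a 100-element array for a branch 
that only computed 10 values, while the row-number variant carries just the 10 
values and their 10 row numbers.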


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

