tustvold commented on issue #3620:
URL: https://github.com/apache/arrow-rs/issues/3620#issuecomment-1591072494

   Thinking about this a bit more, the intention of a selection vector is to 
allow a kernel to skip an expensive computation, such as a string comparison or 
regex evaluation, when **the result is unimportant because we know it is going 
to be discarded**. For some kernels the cost of consulting the selection vector 
will outweigh any savings, especially for kernels like integer comparison where 
it interferes with vectorisation.
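   
   As a rough illustration of that intent, here is a minimal sketch using plain slices rather than Arrow arrays. The function `eval_selected` and its signature are hypothetical and are not part of arrow-rs; the sketch only assumes that an unselected slot may be reported as null because its result is going to be discarded.

```rust
/// Hypothetical selection-aware kernel (not an arrow-rs API).
/// Runs the costly predicate only for slots where `selection[i]` is true.
/// Unselected slots are returned as `None`, because the caller is going to
/// discard them anyway.
fn eval_selected<F>(values: &[&str], selection: &[bool], mut expensive: F) -> Vec<Option<bool>>
where
    F: FnMut(&str) -> bool,
{
    assert_eq!(values.len(), selection.len(), "selection must cover every slot");
    values
        .iter()
        .zip(selection.iter())
        .map(|(v, &selected)| {
            if selected {
                // Pay for the expensive computation only when the result matters.
                Some(expensive(*v))
            } else {
                None
            }
        })
        .collect()
}
```

   The per-slot branch is what makes this unattractive for cheap kernels such as integer comparison: the branch costs about as much as the work it skips and gets in the way of vectorisation.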
   
   Now the potentially interesting observation is that the exact same principle 
also holds for null masks: we shouldn't spend time performing expensive 
evaluation on null slots. I think we currently do in some cases, but this should 
be easy to fix.
   
   This then leads to the obvious question: if a false value in a selection 
vector indicates that the result doesn't matter, how would the semantics of an 
operation under a selection vector differ from the semantics of the same 
operation with the arrays first passed to 
[`nullif`](https://docs.rs/arrow-select/latest/arrow_select/nullif/fn.nullif.html)
 with the selection vector? As the result is irrelevant, why would its 
null-ness matter?
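   
   To make the comparison concrete, here is a toy model in plain Rust, with `Option` standing in for the null mask. All three functions are hypothetical stand-ins and not arrow-rs APIs. One assumption to flag: arrow's `nullif` nulls out slots where the boolean argument is true, so the sketch passes the inverted selection to the toy `nullif`.

```rust
/// Toy stand-in for `nullif`: a slot becomes null wherever `mask[i]` is true.
fn nullif_toy<'a>(values: &[Option<&'a str>], mask: &[bool]) -> Vec<Option<&'a str>> {
    values
        .iter()
        .copied()
        .zip(mask.iter().copied())
        .map(|(v, m)| if m { None } else { v })
        .collect()
}

/// Null-propagating kernel: null in, null out, and the predicate is never
/// evaluated for a null slot.
fn eval_nullable(values: &[Option<&str>], pred: impl Fn(&str) -> bool) -> Vec<Option<bool>> {
    values.iter().copied().map(|v| v.map(|s| pred(s))).collect()
}

/// Selection-aware kernel: additionally skips slots where `selection[i]` is
/// false and reports them as null.
fn eval_with_selection(
    values: &[Option<&str>],
    selection: &[bool],
    pred: impl Fn(&str) -> bool,
) -> Vec<Option<bool>> {
    values
        .iter()
        .copied()
        .zip(selection.iter().copied())
        .map(|(v, selected)| if selected { v.map(|s| pred(s)) } else { None })
        .collect()
}
```

   In this model, `eval_with_selection(values, selection, pred)` and `eval_nullable(&nullif_toy(values, &inverted_selection), pred)` produce the same output for every input, which is the equivalence the question above is pointing at.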