[GitHub] [arrow-rs] bjchambers commented on pull request #521: Change `nullif` to support arbitrary arrays

GitBox Fri, 17 Sep 2021 09:19:02 -0700


bjchambers commented on pull request #521:
URL: https://github.com/apache/arrow-rs/pull/521#issuecomment-921921254



   > Padding the validity buffer is an interesting approach and avoids many 
edge cases for handling buffers of different data types. The only downside I 
see is that the performance now depends a little on the offset instead of only 
on the length of the slice.
   > 
   > There is a possible alternative solution that would slice the buffers 
(depending on the datatype). I have such an implementation for most data types, 
but since there is separate logic for each type the potential for errors is 
much higher. The not yet implemented types are Struct, Union and 
FixedSizeLists. If there is interest I can post the code or open an alternative 
PR, but I'm not sure it would be a clear improvement.
   
   I'd be happy with either. How does the slicing depend on the datatype? It 
seems like supporting the composite types is important to make this work, and 
the potential for errors is a concern. On the other hand, how much do you think 
the performance would depend on the offset? It seems like it may be a little 
sensitive to it, but the effect shouldn't be significant. If so, it may be 
better to start with the less error-prone approach and revisit it if 
performance becomes a concern.
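
   To make the offset question concrete, here is a rough sketch of the padding 
idea in plain Rust over packed, LSB-first bitmaps. The names 
`nullif_validity_padded`, `get_bit` and `set_bit` are hypothetical and are not 
arrow-rs APIs, and this is not the code from the PR. It assumes the output 
validity is padded to `offset + len` bits so the data buffers can be reused 
without slicing.

```rust
/// Read bit `i` from a packed, least-significant-bit-first bitmap.
fn get_bit(bits: &[u8], i: usize) -> bool {
    bits[i / 8] & (1 << (i % 8)) != 0
}

/// Set bit `i` in a packed, least-significant-bit-first bitmap.
fn set_bit(bits: &mut [u8], i: usize) {
    bits[i / 8] |= 1 << (i % 8);
}

/// Hypothetical helper: compute the validity bitmap of `nullif(left, cond)`
/// for a slice of `left` that starts at `offset` and has `cond.len()` elements.
///
/// The result covers `offset + len` bits. The first `offset` bits are padding
/// (left as 0), so the caller could keep the original offset and reuse the
/// data buffers untouched, whatever their data type is.
fn nullif_validity_padded(left_validity: &[u8], offset: usize, cond: &[bool]) -> Vec<u8> {
    let len = cond.len();
    let total_bits = offset + len;

    // This allocation (and its zeroing) grows with `offset`, not only `len`.
    let mut out = vec![0u8; (total_bits + 7) / 8];

    // The per-element work only touches the `len` bits of the slice.
    for i in 0..len {
        let was_valid = get_bit(left_validity, offset + i);
        if was_valid && !cond[i] {
            set_bit(&mut out, offset + i);
        }
    }
    out
}
```

   In this sketch only the size of the output bitmap depends on the offset (one 
byte per 8 elements of offset), while the loop is proportional to the slice 
length. A real implementation may differ, so this is only an assumption about 
where the extra cost would come from.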


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
