bjchambers commented on pull request #521: URL: https://github.com/apache/arrow-rs/pull/521#issuecomment-921921254
> Padding the validity buffer is an interesting approach and avoids many edge cases for handling buffers of different data types. The only downside I see is that the performance now depends a little on the offset instead of only on the length of the slice.
>
> There is a possible alternative solution that would slice the buffers (depending on the datatype). I have such an implementation for most data types, but since there is separate logic for each type the potential for errors is much higher. The not yet implemented types are Struct, Union and FixedSizeLists. If there is interest I can post the code or open an alternative PR, but I'm not sure it would be a clear improvement. I'd be happy with either.

How does the slicing depend on the datatype? Supporting the composite types seems important to make this work, and the higher potential for errors is a concern.

On the other hand, how much do you think the performance would depend on the offset? It seems like it may be a little sensitive to the offset, but the effect shouldn't be significant? If so, it may be better to start with the approach that is less error prone, and then change it if performance turns out to be a concern.
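
To make the offset question concrete, here is a rough sketch of what slicing a bit-packed validity buffer involves. It is not the code from this PR or from the alternative implementation; `get_bit`, `set_bit` and `slice_bits` are hypothetical stand-alone helpers, and the only thing assumed from Arrow is the LSB-first bit order of validity bitmaps. A byte-aligned offset allows a plain byte copy, while an unaligned offset forces the bits to be repacked, which is one place where cost can depend on the offset and not only on the length.

```rust
/// Reads bit `i` of an LSB-first packed bitmap.
fn get_bit(bytes: &[u8], i: usize) -> bool {
    (bytes[i / 8] >> (i % 8)) & 1 == 1
}

/// Sets bit `i` of an LSB-first packed bitmap.
fn set_bit(bytes: &mut [u8], i: usize) {
    bytes[i / 8] |= 1 << (i % 8);
}

/// Hypothetical helper: copies `len` bits starting at bit `offset` into a
/// new bitmap whose first bit sits at position 0.
fn slice_bits(bytes: &[u8], offset: usize, len: usize) -> Vec<u8> {
    let out_len = (len + 7) / 8;
    if offset % 8 == 0 {
        // Byte-aligned offset: copy whole bytes, then clear unused trailing bits.
        let start = offset / 8;
        let mut out = bytes[start..start + out_len].to_vec();
        let rem = len % 8;
        if rem != 0 {
            let last = out_len - 1;
            out[last] &= (1u8 << rem) - 1;
        }
        out
    } else {
        // Unaligned offset: each bit has to be moved to its new position.
        // (A real implementation would shift whole words instead of single bits.)
        let mut out = vec![0u8; out_len];
        for i in 0..len {
            if get_bit(bytes, offset + i) {
                set_bit(&mut out, i);
            }
        }
        out
    }
}
```

For example, with the bitmap `[0b1011_0101, 0b0000_1110]`, `slice_bits(&bitmap, 8, 4)` takes the aligned path, while `slice_bits(&bitmap, 3, 6)` has to repack. Value buffers of fixed-width types can be sliced by byte ranges, so this bit-level case is specific to validity (and boolean) buffers.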