tyrelr commented on pull request #8973: URL: https://github.com/apache/arrow/pull/8973#issuecomment-748753938
I ran out of time today (and probably will for the next few days), but just as an update... I hit a speedbump while looking at removing the .value(...) function. On a first pass at dropping the PrimitiveArray.value() function, I hit a few use cases which are not trivially handled by a typed slice in a performant way.

1) The filter kernel does a batched indexing-like operation based on bits being set in a u64.
   This can probably be rearranged to minimize/eliminate bounds checks.
2) The sort & take kernels appear to cherry-pick indexes based on another index array.
   These are tricky. We may be able to minimize the need for bounds checks in some way (finding contiguous runs to batch-copy instead of one-by-one? checking max index?), but all of these add overhead at a different spot.
3) csv writing & display try to iterate N columns in lock-step.
   This can probably be rewritten with some ahead-of-time bounds checks, perhaps relying on some kind of Vec<&dyn Iter<Item=String>> by having each column build itself an iterator and map itself to String...

The naive & slow approach of editing the current macro to use values()[$i] causes a compilation error when the macro is used for the BooleanArray type (it still has a values() function returning a buffer, like PrimitiveArray used to). I haven't looked at whether BooleanArray could also have its API cut down, or if the two should just be separated.
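
For illustration, the "checking max index" idea from item 2 might look roughly like the sketch below. This is not code from the PR or from the arrow crate: take_checked_once is a hypothetical helper, and plain &[i32] / &[u32] slices stand in for the typed value and index arrays. It validates the largest index once, then gathers without per-element bounds checks.

```rust
/// Hypothetical helper (not part of the arrow crate): gathers `values[i]`
/// for every `i` in `indices`, returning `None` if any index is out of range.
fn take_checked_once(values: &[i32], indices: &[u32]) -> Option<Vec<i32>> {
    // One extra pass over the indices finds the largest one, so a single
    // comparison replaces a bounds check on every element access.
    if let Some(&max) = indices.iter().max() {
        if max as usize >= values.len() {
            return None;
        }
    }
    // SAFETY: every index is <= max, and max < values.len() was verified
    // above. If `indices` is empty, no element is accessed at all.
    let out = indices
        .iter()
        .map(|&i| unsafe { *values.get_unchecked(i as usize) })
        .collect();
    Some(out)
}
```

Under these assumptions, take_checked_once(&[10, 20, 30], &[2, 0, 2]) would yield Some(vec![30, 10, 30]), while an index of 3 would yield None. As noted above, this only moves the cost: the per-element checks are traded for an extra pass over the index array.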
