tyrelr commented on pull request #8973:
URL: https://github.com/apache/arrow/pull/8973#issuecomment-748753938


   I ran out of time today (and probably the next few days), but just as an 
update... I hit a speedbump looking at removing the .value(...) function.
   
   On a first pass at dropping the PrimitiveArray.value() function, I 
hit a few use cases which are not trivially handled by a typed slice in a 
performant way.
   1) filter kernel does a batched indexing-like operation based on bits being 
set in a u64.
   This can probably be re-arranged to minimize/eliminate bounds checks.
   2) sort & take kernels appear to cherrypick indexes based on another index 
array
   These are tricky.  We may be able to minimize the need for bounds checks in 
some way (finding contiguous runs to batch-copy instead of one-by-one? checking 
the max index up front?) but each option just adds overhead at a different spot.
   3) csv writing & display try to iterate N columns in lock-step
   This can probably be rewritten with some ahead-of-time bounds checks, 
perhaps relying on some kind of Vec<&dyn Iter<Item=String>> by having each 
column build itself an iterator and map itself to String...  The naive & slow 
approach of editing the current macro to use values()[$i] causes a compilation 
error when the macro is used for the BooleanArray type (it still has a values() 
function returning a buffer, like PrimitiveArray used to). I haven't looked at 
whether BooleanArray could also have its API cut down, or if the two should 
just be separated.
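
   To make point 1 a bit more concrete, here is a rough sketch of a filter 
that walks the set bits of each u64 mask word. It is not the actual arrow 
filter kernel: `filter_chunk` and `filter_by_mask` are hypothetical names, and 
plain `&[i32]` / `&[u64]` slices stand in for the array and the bitmap. The 
idea is that one mask operation per 64-value chunk replaces the per-element 
bounds check.

```rust
/// Hypothetical helper (not part of the arrow crate): copies the values of
/// one chunk (at most 64 elements) whose bit is set in `mask`.
fn filter_chunk(values: &[i32], mask: u64, out: &mut Vec<i32>) {
    assert!(values.len() <= 64);
    // Clear any bits that point past the end of this chunk. This single
    // mask operation is the only "bounds check" for the whole chunk.
    let mut bits = if values.len() == 64 {
        mask
    } else {
        mask & ((1u64 << values.len()) - 1)
    };
    while bits != 0 {
        let i = bits.trailing_zeros() as usize;
        // SAFETY: every bit at position >= values.len() was cleared above,
        // so i < values.len().
        out.push(unsafe { *values.get_unchecked(i) });
        // Clear the lowest set bit and continue with the next one.
        bits &= bits - 1;
    }
}

/// Hypothetical driver: pairs each 64-value chunk with one mask word.
fn filter_by_mask(values: &[i32], mask: &[u64]) -> Vec<i32> {
    let mut out = Vec::new();
    for (chunk, &word) in values.chunks(64).zip(mask) {
        filter_chunk(chunk, word, &mut out);
    }
    out
}
```

   Whether this actually beats a checked `values[i]` would need benchmarking; 
the compiler may already elide some of those checks.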
   
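
   For point 2, the "checking max index" idea could look roughly like the 
sketch below. Again this is only an illustration under assumptions: 
`take_checked_once` is a made-up name, the inputs are plain slices rather than 
arrow arrays, and nulls are ignored entirely. It shows the trade-off 
mentioned above: the per-element check goes away, but an extra pass over the 
indices is added.

```rust
/// Hypothetical helper (not the arrow take kernel): gathers `values[i]` for
/// every `i` in `indices`, validating the indices once up front.
/// Returns None if any index is out of bounds.
fn take_checked_once(values: &[i32], indices: &[u32]) -> Option<Vec<i32>> {
    // Extra pass: find the largest index and check only that one.
    if let Some(max) = indices.iter().copied().max() {
        if max as usize >= values.len() {
            return None;
        }
    }
    // SAFETY: the largest index is < values.len(), so every index is.
    let taken = indices
        .iter()
        .map(|&i| unsafe { *values.get_unchecked(i as usize) })
        .collect();
    Some(taken)
}
```

   Calling `take_checked_once(&[10, 20, 30], &[2, 0, 2])` gathers the values 
at positions 2, 0 and 2; an index of 3 or more makes it return None.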
   
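
   For point 3, a minimal sketch of the "each column builds its own iterator 
of Strings" idea follows. The names `int_column`, `bool_column` and 
`write_rows` are hypothetical, slices stand in for the typed arrays, and I 
have assumed boxed iterators (`Box<dyn Iterator<Item = String>>`) rather than 
`&dyn` references, since `next()` needs mutable access.

```rust
/// Hypothetical stand-in for a primitive column: maps each value to a String.
fn int_column<'a>(values: &'a [i64]) -> Box<dyn Iterator<Item = String> + 'a> {
    Box::new(values.iter().map(|v| v.to_string()))
}

/// Hypothetical stand-in for a boolean column.
fn bool_column<'a>(values: &'a [bool]) -> Box<dyn Iterator<Item = String> + 'a> {
    Box::new(values.iter().map(|v| v.to_string()))
}

/// Steps all column iterators in lock-step and joins each row with commas.
/// A column that runs out early contributes empty fields.
fn write_rows<'a>(
    mut columns: Vec<Box<dyn Iterator<Item = String> + 'a>>,
    num_rows: usize,
) -> Vec<String> {
    let mut rows = Vec::with_capacity(num_rows);
    for _ in 0..num_rows {
        // No indexing into the columns by row number, so no per-cell
        // bounds check; each iterator tracks its own position.
        let fields: Vec<String> = columns
            .iter_mut()
            .map(|col| col.next().unwrap_or_default())
            .collect();
        rows.push(fields.join(","));
    }
    rows
}
```

   This trades the per-cell bounds check for a dynamic dispatch per cell, so 
it is not obviously faster; it mainly sidesteps the macro having to index 
PrimitiveArray and BooleanArray in the same way.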


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

