felipecrv commented on issue #34535:
URL: https://github.com/apache/arrow/issues/34535#issuecomment-1977858863

   ```cpp
   for (ChunkLocation loc; resolver.Valid(loc); loc = resolver.Next(loc)) {
       // what is the most efficient way to access the values for each column here?

       // I'm not sure this is the most efficient, but it's certainly better than
       // using ChunkedArrayResolver for every array.
       // don't forget the null checks (missing here)
       // (assumes a_chunks and b_chunks share the same chunk layout)
       int64_t a = checked_cast<const Int64Array&>(*a_chunks[loc.chunk_index])
                       .Value(loc.index_in_chunk);
       std::string_view b = checked_cast<const StringArray&>(*b_chunks[loc.chunk_index])
                                .Value(loc.index_in_chunk);
       process_row_oh_no(a, b);
   }
   ```
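The loop above steps a single resolver over both columns, which only works when both columns have the same chunk boundaries. The following is a minimal sketch under simplified assumptions. Plain `std::vector` chunks stand in for Arrow arrays, and `SequentialCursor` and `ForEachRow` are hypothetical names, not Arrow API. Each column gets its own cursor that advances in O(1) per row, so columns with different chunk layouts can be walked in lockstep without any binary search. Null handling is omitted, as in the loop above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-ins for two chunked columns: each inner vector is one chunk.
using Int64Column = std::vector<std::vector<int64_t>>;
using StringColumn = std::vector<std::vector<std::string>>;

// Cursor over one chunked column. It walks chunks in order, so moving to the
// next row is O(1) and never searches for the owning chunk.
template <typename Chunks>
class SequentialCursor {
 public:
  explicit SequentialCursor(const Chunks& chunks) : chunks_(chunks) { SkipExhausted(); }

  bool Valid() const { return chunk_ < chunks_.size(); }

  // Current value; only call while Valid() is true.
  const auto& Value() const { return chunks_[chunk_][index_]; }

  void Next() {
    ++index_;
    SkipExhausted();
  }

 private:
  // Move past finished (or empty) chunks to the next available value.
  void SkipExhausted() {
    while (chunk_ < chunks_.size() && index_ >= chunks_[chunk_].size()) {
      ++chunk_;
      index_ = 0;
    }
  }

  const Chunks& chunks_;
  std::size_t chunk_ = 0;
  std::size_t index_ = 0;
};

// Visit the rows of two equally long columns whose chunk boundaries may differ.
template <typename RowFn>
void ForEachRow(const Int64Column& a, const StringColumn& b, RowFn&& process_row) {
  SequentialCursor<Int64Column> ca(a);
  SequentialCursor<StringColumn> cb(b);
  for (; ca.Valid() && cb.Valid(); ca.Next(), cb.Next()) {
    process_row(ca.Value(), std::string_view(cb.Value()));
  }
}
```

For example, a column chunked as `{1, 2} | {} | {3, 4, 5}` and another chunked as `{"x"} | {"y", "z", "w"} | {"v"}` produce the rows `(1, x)`, `(2, y)`, `(3, z)`, `(4, w)`, `(5, v)`.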
   
   > I've been thinking about this and came up with an alternative way that may be better, and would be interested to hear your thoughts on it (pseudocode):
   
   Your approach works even better and doesn't need the `ChunkResolver`. What I've been trying to say is that you should never need the binary search that `ChunkResolver` (and `ChunkedArrayResolver`) provides if you're looping SEQUENTIALLY over chunked arrays.
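To make the cost difference concrete, here is a minimal sketch on a simplified model. Plain `std::vector` chunks are used, and `ChunkOffsets`, `ValueAt`, `SumWithLookups`, and `SumSequential` are hypothetical helpers, not Arrow API. `ValueAt` resolves a logical index with a binary search over cumulative chunk offsets, in the spirit of what `ChunkResolver` provides for random access. The sequential version nests a loop over values inside a loop over chunks and never searches.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified chunked column: each inner vector is one chunk.
using Chunks = std::vector<std::vector<int64_t>>;

// offsets[i] is the logical index of chunk i's first value; the last entry
// is the total length.
std::vector<int64_t> ChunkOffsets(const Chunks& chunks) {
  std::vector<int64_t> offsets{0};
  for (const auto& chunk : chunks) {
    offsets.push_back(offsets.back() + static_cast<int64_t>(chunk.size()));
  }
  return offsets;
}

// Random access: O(log num_chunks) binary search per lookup.
// Precondition: 0 <= i < offsets.back().
int64_t ValueAt(const Chunks& chunks, const std::vector<int64_t>& offsets, int64_t i) {
  // First offset greater than i; the chunk holding i is the one just before it
  // (this also skips over empty chunks).
  auto it = std::upper_bound(offsets.begin(), offsets.end(), i);
  std::size_t chunk = static_cast<std::size_t>(it - offsets.begin()) - 1;
  return chunks[chunk][static_cast<std::size_t>(i - offsets[chunk])];
}

// Sequential scan done the wrong way: one binary search for every element.
int64_t SumWithLookups(const Chunks& chunks) {
  const std::vector<int64_t> offsets = ChunkOffsets(chunks);
  int64_t sum = 0;
  for (int64_t i = 0; i < offsets.back(); ++i) sum += ValueAt(chunks, offsets, i);
  return sum;
}

// Sequential scan done the right way: loop over chunks, then over values.
int64_t SumSequential(const Chunks& chunks) {
  int64_t sum = 0;
  for (const auto& chunk : chunks) {
    for (int64_t v : chunk) sum += v;
  }
  return sum;
}
```

Both sums agree. The first pays a search per element, while the second does constant work per element, which is why a resolver only earns its keep for genuinely random access patterns.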
   
   > We can also document that these classes should never be used for pure sequential access, and something like above should be preferred instead. How does that sound?
   
   Well... documenting won't deter people from using it. Look at how much we had to discuss here before this nuance was understood. The better plan is to implement the operations that need random access inside Arrow (sort, ranking, take, filter, joins...) and have users call those instead of reaching for random access directly.

