Re: [I] [C++] Request to move ChunkResolver to public API [arrow]

via GitHub Thu, 29 Feb 2024 18:40:17 -0800


felipecrv commented on issue #34535:
URL: https://github.com/apache/arrow/issues/34535#issuecomment-1972361985


   > So just to clarify, is the plan to move both ChunkResolver and 
ChunkedArrayResolver into the public namespace?
   
   Not my plan, but after you mentioned you wanted to unify `ChunkResolver` 
with `ChunkedArrayResolver` I took a look on how it's designed.
   
   If it were to go public, that would be done in another issue/PR pair, so for 
now it's better to focus just on making `ChunkResolver` public.
   
   About the code:
   
   ```cpp
   for (int i = 0; i < table->num_rows(); ++i)
   {
     std::int64_t a = resolver_a.Resolve(i).Value<arrow::Int64Type>();
     std::string_view b = resolver_b.Resolve(i).Value<arrow::StringType>();
     do_business_logic(a, b);
   }
   ```
   
   It's important to delay the re-construction of row-by-row values as much as 
possible to preserve the benefits of columnar layouts. So there is a danger in 
exposing too many APIs that work value-by-value instead or array-by-array. 
Array-by-array is not compatible with how most people think about programming.
   
   The loop above is performing sequential access, so using the chunk resolver 
is not the best solution regarding locality of the memory accesses. [1] It's 
better to keep one `ChunkLocation` per column and increment them all at once. 
   
   
   [1] `ChunkResolver` and `ChunkedArrayResolver` are used internally in Arrow 
mainly on functions that perform random access: `Take`, `Sort`, `Rank`...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [I] [C++] Request to move ChunkResolver to public API [arrow]

Reply via email to