Re: [PR] GH-41560: [C++] ChunkResolver: Implement ResolveMany and add unit tests [arrow]

via GitHub Mon, 13 May 2024 16:37:14 -0700


felipecrv commented on PR #41561:
URL: https://github.com/apache/arrow/pull/41561#issuecomment-2109001194


   > There seems to be some (unwarranted? gratuitous?) complexity here.
   
   You can comment on what you think is "unwarranted" and "gratuitous" 
complexity. This kind of feedback is demotivating and not actionable. I try to 
simplify the code before opening a PR instead of pushing the first thing that 
works.
   
   > Also, it's not obvious why this would be better than calling Resolve 
multiple times in a row.
   
   I tried that in the `Take` kernels, but things got messy and ugly because:
   
    - `Resolve` and `ChunkLocation` use `int64_t` for everything while `Take` 
kernels use unsigned integer of all widths. The loops would be full of unsafe 
casts with confusing safety justifications.
    - `Take` kernels have complex loops handling different cases (the 4 
combinations of nullability of the `values` and `indices`), so `Resolve` would 
be called (inlined) in 4 branches per kernel loop specialization.
    - Ensuring `Resolve` gets called only once per index is tricky because in 
the `Take` loops you need to resolve chunks for reading values and validity 
bits from multiple functions if you modularize the loop. By having a buffer 
with all the chunks resolved at once, code that performs these loads can be 
more cleanly separated and accept an index array position as parameter instead 
of a `ChunkLocation`
    - That makes the code that deals with `ChunkedArray` look very similar to 
the code that handles flat arrays — the `ChunkedArray` version needs a 
preparation pass to resolve the chunks
    - By having this chunk-resolution pass, I can make `Take` loops for 
`ChunkedArray` of nested types more easily
    
   In addition to that, `ResolveMany` can be further optimized with different 
search strategies (e.g. branchless and interleaved binary searches), the step 
that calculates `index_in_chunk` can be vectorized. All of that without 
contributing to binary size of all the numerous `Take` specializations that 
will rely on it because `ResolveManyImpl` is not inlined.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] GH-41560: [C++] ChunkResolver: Implement ResolveMany and add unit tests [arrow]

Reply via email to