felipecrv commented on PR #41561:
URL: https://github.com/apache/arrow/pull/41561#issuecomment-2109001194
> There seems to be some (unwarranted? gratuitous?) complexity here.
You can comment on what you think is "unwarranted" and "gratuitous"
complexity. This kind of feedback is demotivating and not actionable. I try to
simplify the code before opening a PR instead of pushing the first thing that
works.
> Also, it's not obvious why this would be better than calling Resolve
multiple times in a row.
I tried that in the `Take` kernels, but things got messy and ugly because:
- `Resolve` and `ChunkLocation` use `int64_t` for everything while `Take`
kernels use unsigned integer of all widths. The loops would be full of unsafe
casts with confusing safety justifications.
- `Take` kernels have complex loops handling different cases (the 4
combinations of nullability of the `values` and `indices`), so `Resolve` would
be called (inlined) in 4 branches per kernel loop specialization.
- Ensuring `Resolve` gets called only once per index is tricky because in
the `Take` loops you need to resolve chunks for reading values and validity
bits from multiple functions if you modularize the loop. By having a buffer
with all the chunks resolved at once, code that performs these loads can be
more cleanly separated and accept an index array position as parameter instead
of a `ChunkLocation`
- That makes the code that deals with `ChunkedArray` look very similar to
the code that handles flat arrays — the `ChunkedArray` version needs a
preparation pass to resolve the chunks
- By having this chunk-resolution pass, I can make `Take` loops for
`ChunkedArray` of nested types more easily
In addition to that, `ResolveMany` can be further optimized with different
search strategies (e.g. branchless and interleaved binary searches), the step
that calculates `index_in_chunk` can be vectorized. All of that without
contributing to binary size of all the numerous `Take` specializations that
will rely on it because `ResolveManyImpl` is not inlined.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]