westonpace opened a new pull request, #13172: URL: https://github.com/apache/arrow/pull/13172
### This PR is in a draft state because it depends on #12289 which is not yet merged. I would not recommend reviewing it too closely yet. This PR is the second PR in a series to merge in #12326. The first was #12872. The new functionality can be found in 7daafeb02906b59c153127463a0c6bed4c888026 and is currently named `RowTable` and `RowTableMerge`. In addition this PR moves around, and renames, a lot of existing work revolving around `KeyRowArray`. The primary goal of the refactor was to improve the readability and clarity of the code base. I did not make any functional changes to the code and if any functional changes are suggested which modify existing code I will happily discuss them here but defer the changes themselves to follow-up PRs. I would very much appreciate any feedback on naming, making sure we have sufficient test coverage, and overall layout of the code. It's a very large change but most of it is moving things around. What I'd like input on most: * KeyRowArray -> RowTableImpl, RowArray -> RowTable: This is the most meaningful name change. The old name made sense because this data is currently represented physically as an array of rows. However, the data is conceptually tabular. We are storing rows & columns. In particular, I found it confusing that KeyColumnArray was a 1D data structure while KeyRowArray was a 2D table structure. * Overall structure: I created a new folder `arrow/compute/row` and put all row-based utilities in here. Most of the files are now marked as `_internal` and the content in these files is not used outside of `arrow/compute/row`. The grouper had previously been alongside the kernel code and it didn't really belong there. It also relies very heavily on the internal structure of the row encoding. * Row structure: I documented the file `arrow/compute/row/row_internal.h` and would appreciate review of the content here. * Semi-external API: The "external" API now consists of `arrow/compute/row/grouper.h`, which is more or less unchanged, and `arrow/compute/encode.h` which is mostly new code from 7daafeb02906b59c153127463a0c6bed4c888026. We have quite a few tests exercising `Grouper` and I was unable to easily extract them from the aggregate kernel tests so I left the tests alone. I added tests to the new utilities in `encode.h`. These utilities are not yet properly external because they depend on `RowTableImpl` and I would need to migrate them to the PIMPL pattern to remove this dependency. Since that would be a functional change I think it would be best to wait. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org