westonpace opened a new pull request, #13172:
URL: https://github.com/apache/arrow/pull/13172

   ### This PR is in a draft state because it depends on #12289 which is not 
yet merged.  I would not recommend reviewing it too closely yet.
   
   This PR is the second PR in a series to merge in #12326.  The first was 
#12872.  The new functionality can be found in 
7daafeb02906b59c153127463a0c6bed4c888026 and is currently named `RowTable` and 
`RowTableMerge`.  In addition this PR moves around, and renames, a lot of 
existing work revolving around `KeyRowArray`.
   
   The primary goal of the refactor was to improve the readability and clarity 
of the code base.  I did not make any functional changes to the code and if any 
functional changes are suggested which modify existing code I will happily 
discuss them here but defer the changes themselves to follow-up PRs.  I would 
very much appreciate any feedback on naming, making sure we have sufficient 
test coverage, and overall layout of the code.
   
   It's a very large change but most of it is moving things around.  What I'd 
like input on most:
   
    * KeyRowArray -> RowTableImpl, RowArray -> RowTable: This is the most 
meaningful name change.  The old name made sense because this data is currently 
represented physically as an array of rows.  However, the data is conceptually 
tabular.  We are storing rows & columns.  In particular, I found it confusing 
that KeyColumnArray was a 1D data structure while KeyRowArray was a 2D table 
structure.
     * Overall structure: I created a new folder `arrow/compute/row` and put 
all row-based utilities in here.  Most of the files are now marked as 
`_internal` and the content in these files is not used outside of 
`arrow/compute/row`.  The grouper had previously been alongside the kernel code 
and it didn't really belong there.  It also relies very heavily on the internal 
structure of the row encoding.
     * Row structure: I documented the file `arrow/compute/row/row_internal.h` 
and would appreciate review of the content here.
     * Semi-external API: The "external" API now consists of 
`arrow/compute/row/grouper.h`, which is more or less unchanged, and 
`arrow/compute/encode.h` which is mostly new code from 
7daafeb02906b59c153127463a0c6bed4c888026.  We have quite a few tests exercising 
`Grouper` and I was unable to easily extract them from the aggregate kernel 
tests so I left the tests alone.  I added tests to the new utilities in 
`encode.h`.  These utilities are not yet properly external because they depend 
on `RowTableImpl` and I would need to migrate them to the PIMPL pattern to 
remove this dependency.  Since that would be a functional change I think it 
would be best to wait.
     


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to