caiwanli opened a new issue, #39071:
URL: https://github.com/apache/arrow/issues/39071
### Describe the usage question you have. Please include as many useful
details as possible.
In my join operator, all data will be cached. The cache is defined as
follows:
`std::vector<std::shared_ptr<arrow::RecordBatch>> buffer_chunk;`
After caching all the data, I intend to perform Radix partitioning on the
buffer_chunk. The function interface is designed as follows:
`void radix_partition(std::vector<std::shared_ptr<arrow::RecordBatch>> &input,
std::shared_ptr<arrow::RecordBatch> &output, int *histogram);`
After the function executes, `output` holds the reordered data and
`histogram` records the starting position of each partition in `output`.
For example, for the data in table 1, dividing the table into 4 partitions
results in the data becoming the data in table 2, and simultaneously, the
histogram is {0, 3, 7, 9}.
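To make the meaning of the histogram concrete, here is a minimal sketch that works on plain partition ids rather than on Arrow data. The names `PartitionLayout` and `layout_partitions` are hypothetical, and the ten partition ids used in the note below are made up so that they reproduce the histogram {0, 3, 7, 9}; they are not the contents of table 1. The sketch counts rows per partition, turns the counts into starting offsets with an exclusive prefix sum, and then scatters row indices into partition order.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type: where each partition starts, and the row order
// that groups rows by partition.
struct PartitionLayout {
  std::vector<std::size_t> histogram;  // start offset of each partition
  std::vector<std::size_t> row_order;  // input row indices, grouped by partition
};

// Two-pass layout: count, exclusive prefix sum, then scatter.
// `parts[i]` is the partition id of row i and must be < num_parts.
PartitionLayout layout_partitions(const std::vector<std::size_t>& parts,
                                  std::size_t num_parts) {
  // Pass 1: count how many rows fall into each partition.
  std::vector<std::size_t> counts(num_parts, 0);
  for (std::size_t p : parts) {
    ++counts[p];
  }

  // Exclusive prefix sum: histogram[p] is the first output slot of partition p.
  PartitionLayout layout;
  layout.histogram.resize(num_parts);
  std::size_t offset = 0;
  for (std::size_t p = 0; p < num_parts; ++p) {
    layout.histogram[p] = offset;
    offset += counts[p];
  }

  // Pass 2: scatter row indices, advancing a per-partition write cursor.
  std::vector<std::size_t> cursor = layout.histogram;
  layout.row_order.resize(parts.size());
  for (std::size_t row = 0; row < parts.size(); ++row) {
    layout.row_order[cursor[parts[row]]++] = row;
  }
  return layout;
}
```

With the made-up partition ids `{0, 1, 0, 1, 2, 1, 0, 1, 2, 3}` and 4 partitions, the counts are 3, 4, 2, 1, so the starting offsets are {0, 3, 7, 9}. The scatter keeps the original relative order of rows inside each partition.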

For the implementation of the radix_partition function, there are several
issues that need to be addressed:
1. How to efficiently traverse the "input" data?
2. How to efficiently insert data into the "output" while traversing?
Partition calculation method:
`part = tar & ((1 << 2) - 1);`
For example,
```
part(57):57 & ((1 << 2) - 1 ) = 1,
part(92):92 & ((1 << 2) - 1 ) = 0
```
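The two examples above can be checked at compile time with a small sketch. The function name `partition_of` and the `radix_bits` parameter are hypothetical; the mask `(1 << radix_bits) - 1` keeps the low `radix_bits` bits of the key, which gives 4 partitions when `radix_bits` is 2.

```cpp
#include <cstdint>

// Hypothetical helper: the low `radix_bits` bits of the key select the
// partition. Assumes radix_bits < 64 so the shift is well defined.
constexpr std::uint64_t partition_of(std::uint64_t key, unsigned radix_bits) {
  return key & ((std::uint64_t{1} << radix_bits) - 1);
}

// 57 = 0b111001, low two bits are 01.
static_assert(partition_of(57, 2) == 1, "57 belongs to partition 1");
// 92 = 0b1011100, low two bits are 00.
static_assert(partition_of(92, 2) == 0, "92 belongs to partition 0");
```

Masking with `1 << 2` alone would test a single bit and could only yield 0 or 4, which is why the mask needs the `- 1`.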
### Component(s)
C++