JerAguilon commented on code in PR #41874:
URL: https://github.com/apache/arrow/pull/41874#discussion_r1619131719


##########
cpp/src/arrow/acero/unmaterialized_table.h:
##########
@@ -204,15 +231,79 @@ class UnmaterializedCompositeTable {
     return builder.Append(data + offset0, offset1 - offset0);
   }
 
+  arrow::Result<std::vector<CompositeEntry>> FlattenSlices(int table_index) {
+    std::vector<CompositeEntry> flattened_blocks;
+
+    arrow::RecordBatch* active_rb = NULL;
+    size_t start = -1;
+    size_t end = -1;
+
+    for (const auto& slice : slices) {

Review Comment:
   If it's not self-evident, the asof-join works by creating a `CompositeEntry` 
for each output row.
   
   Since these so-called "contiguous inputs" are `Slice`able, we squash these 
entries down as a preprocessing step. For example, suppose `slices` has an LHS 
table that looks like this:
   
   ```
   {rb_addr: 1234, start: 1, end: 2},
   {rb_addr: 1234, start: 2, end: 3},
   {rb_addr: 1234, start: 3, end: 4},
   ...
   {rb_addr: 1234, start: 1000, end: 1001},
   {rb_addr: 4321, start: 100001, end: 100002},
   {rb_addr: 4321, start: 100002, end: 100003},
   ...
   {rb_addr: 4321, start: 123455, end: 123456},
   ```
   
   It'd be silly to derive slices from this potentially long vector for 
every column we mean to output. Thus, this function squashes it down to a 
very compact vector:
   
   ```
   {rb_addr: 1234, start: 1, end: 1001},
   {rb_addr: 4321, start: 100001, end: 123456},
   ```
   
   We can then use this compact vector to quickly slice the appropriate output 
column(s).
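   
   To make the squashing step concrete, here is a minimal standalone sketch of 
the idea. It is not the PR's actual `FlattenSlices`: `SliceEntry` and 
`SquashContiguous` are hypothetical names, the record batch is reduced to an 
integer address, and the sketch assumes each entry is a half-open row range 
`[start, end)`:
   
   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <vector>

   // Hypothetical stand-in for one per-row entry of a single input table:
   // a half-open row range [start, end) inside the record batch at rb_addr.
   struct SliceEntry {
     std::uintptr_t rb_addr;
     std::size_t start;
     std::size_t end;
   };

   // Merge runs of entries that point at the same batch and sit back-to-back
   // (previous end == next start) into a single, wider entry.
   std::vector<SliceEntry> SquashContiguous(const std::vector<SliceEntry>& entries) {
     std::vector<SliceEntry> out;
     for (const SliceEntry& e : entries) {
       assert(e.start <= e.end);  // assumption: ranges are well-formed
       if (!out.empty() && out.back().rb_addr == e.rb_addr &&
           out.back().end == e.start) {
         out.back().end = e.end;  // extend the current run
       } else {
         out.push_back(e);  // different batch or a gap: start a new run
       }
     }
     return out;
   }
   ```
   
   With this shape, the per-column work scales with the number of runs rather 
than the number of output rows, assuming the inputs really are mostly 
contiguous.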



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
