alamb opened a new issue, #4466: URL: https://github.com/apache/arrow-rs/issues/4466
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

I am implementing GroupByHash in DataFusion: https://github.com/apache/arrow-datafusion/issues/4973

We use the `RowFormat` to store grouping keys, which is awesome. The grouping operation calculates the `Row` format for each input row, determines whether it has been seen previously, and, if not, stores the newly seen `Row`.

The only way I know of today is to copy each new row individually using [`owned()`](https://docs.rs/arrow-row/42.0.0/arrow_row/struct.Row.html#method.owned):

```
┌──────────────────────────────────┐
│ ┌───────────────────────────────┐│
│ │               A               ││
│ ├───────────────────────────────┤│
│ │               B               │├────────────┐
│ ├───────────────────────────────┤│            │
│ │               A               ││            │
│ ├───────────────────────────────┤│            │
│ │               A               ││            │           ┌──────────────────────────────────┐
│ ├───────────────────────────────┤│            │           │ ┌───────────────────────────────┐│
│ │               C               ││            │           │ │               A               ││
│ ├───────────────────────────────┤│            │           │ └───────────────────────────────┘│
│ │               B               ││            │           │ ┌───────────────────────────────┐│
│ ├───────────────────────────────┤│            └───────────┼▶│               B               ││
│ │               A               ││                        │ └───────────────────────────────┘│
│ ├───────────────────────────────┤│ to add a new row, I    │                                  │
│ │               A               ││ currently do           │                                  │
│ └───────────────────────────────┘│ `Row::owned()` to      │                                  │
│ group keys for input batch       │ get a copy             │ distinct group keys seen in      │
│ often many repeated values       │                        │ previous batches                 │
│                                  │                        │                                  │
└──────────────────────────────────┘                        └──────────────────────────────────┘
          arrow_row::Rows                                         Vec<arrow_row::OwnedRow>
```

**Describe the solution you'd like**

I would like to be able to append a `Row` directly to a `Rows`:

```
┌──────────────────────────────────┐
│ ┌───────────────────────────────┐│
│ │               A               ││
│ ├───────────────────────────────┤│
│ │               B               │├────────────┐
│ ├───────────────────────────────┤│            │
│ │               A               ││            │
│ ├───────────────────────────────┤│            │
│ │               A               ││            │           ┌──────────────────────────────────┐
│ ├───────────────────────────────┤│            │           │ ┌───────────────────────────────┐│
│ │               C               ││            │           │ │               A               ││
│ ├───────────────────────────────┤│            │           │ ├───────────────────────────────┤│
│ │               B               ││            └───────────┼▶│               B               ││
│ ├───────────────────────────────┤│                        │ └───────────────────────────────┘│
│ │               A               ││                        │                                  │
│ ├───────────────────────────────┤│ Copying a new Row      │                                  │
│ │               A               ││ would just copy        │                                  │
│ └───────────────────────────────┘│ some bytes to the      │                                  │
│ group keys for input batch       │ other Rows             │ distinct group keys seen in      │
│ often many repeated values       │                        │ previous batches                 │
│                                  │                        │                                  │
└──────────────────────────────────┘                        └──────────────────────────────────┘
          arrow_row::Rows                                             arrow_row::Rows
```

**Describe alternatives you've considered**

Currently my POC code uses `Vec<OwnedRow>`, which adds an extra allocation for each row 😢

**Additional context**

https://github.com/apache/arrow-datafusion/issues/4973
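The layout difference between the two diagrams can be sketched in plain Rust. This is a minimal, std-only model under stated assumptions, not the arrow-row API. `RowBuffer`, `intern_batch` and `keys` are hypothetical names. The byte strings stand in for the encoded rows a `RowConverter` would produce, and the `HashMap` is a simplified stand-in for whatever hash table the grouping operator actually uses. The sketch shows the grouping loop described above: each input row is hashed and checked against the keys seen so far, and a new key is appended by copying its bytes onto one shared buffer. This avoids allocating a separate `OwnedRow` for each new key.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical model of a `Rows`-style container (NOT the arrow_row API):
/// all row bytes live in one contiguous `data` buffer, and row `i`
/// spans `data[offsets[i]..offsets[i + 1]]`.
struct RowBuffer {
    data: Vec<u8>,
    offsets: Vec<usize>,
}

impl RowBuffer {
    fn new() -> Self {
        RowBuffer { data: Vec::new(), offsets: vec![0] }
    }

    /// Appends a row by copying its bytes onto the end of `data`;
    /// only the two growing buffers allocate, not each row.
    fn push(&mut self, row: &[u8]) {
        self.data.extend_from_slice(row);
        self.offsets.push(self.data.len());
    }

    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    fn row(&self, i: usize) -> &[u8] {
        &self.data[self.offsets[i]..self.offsets[i + 1]]
    }
}

fn hash_bytes(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

/// Hypothetical grouping step: returns a group index for each input row and
/// appends rows not seen before (in this or earlier batches) to `groups`.
/// `index` maps a row hash to candidate group indices; collisions are
/// resolved by comparing against the bytes stored in `groups`.
fn intern_batch(
    batch: &[&[u8]],
    groups: &mut RowBuffer,
    index: &mut HashMap<u64, Vec<usize>>,
) -> Vec<usize> {
    let mut group_ids = Vec::with_capacity(batch.len());
    for &row in batch {
        let candidates = index.entry(hash_bytes(row)).or_default();
        // Reuse an existing group if identical bytes were stored earlier.
        let existing = candidates.iter().copied().find(|&g| groups.row(g) == row);
        let id = match existing {
            Some(g) => g,
            None => {
                let g = groups.len();
                groups.push(row); // copy the bytes into the shared buffer
                candidates.push(g);
                g
            }
        };
        group_ids.push(id);
    }
    group_ids
}

/// Hypothetical demo helper: turns "A,B,A" into byte slices standing in
/// for encoded rows.
fn keys(s: &'static str) -> Vec<&'static [u8]> {
    s.split(',').map(str::as_bytes).collect()
}

fn main() {
    // Two input batches of group keys; repeated values are common.
    let batch1 = keys("A,B,A,A,C,B,A,A");
    let batch2 = keys("B,D,A");

    let mut groups = RowBuffer::new();
    let mut index = HashMap::new();

    let ids1 = intern_batch(&batch1, &mut groups, &mut index);
    let ids2 = intern_batch(&batch2, &mut groups, &mut index);

    println!("{:?} {:?}", ids1, ids2); // [0, 1, 0, 0, 2, 1, 0, 0] [1, 3, 0]
    println!("{} distinct keys in {} bytes", groups.len(), groups.data.len()); // 4 distinct keys in 4 bytes
}
```

With `Vec<OwnedRow>`, the `None` branch would allocate a fresh buffer for every new key. Here the stored keys share `data` and `offsets`, which grow amortized. The simplified `index` still allocates a small `Vec` per distinct hash. A production hash table would store group indices directly to avoid that.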
