alamb commented on PR #4524:
URL: https://github.com/apache/arrow-rs/pull/4524#issuecomment-1638945907
> For single column case, to embed a variable length column value, like
> String, to the RawTable may not be good.
Just to be clear, what I was imagining for the group storage is not to
change the contents of the `RawTable` (it will continue to contain
group_indexes).
But instead of storing group_values using the arrow `Row` format, like this:
```
 stores "group       stores group values,
   indexes"          in arrow_row format
┌─────────────┐      ┌────────────┐
│   ┌─────┐   │      │ ┌────────┐ │
│   │  5  │   │ ┌────┼▶│  "A"   │ │
│   ├─────┤   │ │    │ ├────────┤ │
│   │  9  │   │ │    │ │  "Z"   │ │
│   └─────┘   │ │    │ └────────┘ │
│     ...     │ │    │            │
│   ┌─────┐   │ │    │    ...     │
│   │  1  │───┼─┘    │            │
│   ├─────┤   │      │            │
│   │ 13  │───┼─┐    │ ┌────────┐ │
│   └─────┘   │ └────┼▶│  "Q"   │ │
└─────────────┘      │ └────────┘ │
                     │            │
                     └────────────┘
      map             group_values
  (Hash Table)
```
We would instead store the group values in a native type such as `Vec<T>`,
like this:
```
 stores "group              stored in a
   indexes"                native Vec<T>
┌─────────────┐            ┌──────────┐
│   ┌─────┐   │            │ ┌──────┐ │
│   │  5  │   │    ┌───────┼▶│  1   │ │
│   ├─────┤   │    │       │ ├──────┤ │
│   │  9  │   │    │       │ │  3   │ │
│   └─────┘   │    │       │ └──────┘ │
│     ...     │    │       │          │
│   ┌─────┐   │    │       │   ...    │
│   │  1  │───┼────┘       │          │
│   ├─────┤   │            │          │
│   │ 13  │───┼────┐       │ ┌──────┐ │
│   └─────┘   │    └───────┼▶│  5   │ │
└─────────────┘            │ └──────┘ │
                           │          │
                           └──────────┘
      map                  group_values
  (Hash Table)
```
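To make the idea concrete, here is a rough sketch of what I mean, not a proposed implementation. `PrimitiveGroups` and `intern` are hypothetical names, and a std `HashMap<u64, Vec<usize>>` (hash to candidate group indexes) stands in for the real `RawTable` so the example has no dependencies. The point is only that the map holds indexes while the values sit in a plain `Vec<T>`:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical single-column group storage: the map holds only group
/// indexes, and the group values live in a native `Vec<T>`.
struct PrimitiveGroups<T> {
    /// Stand-in for the `RawTable`: hash -> candidate group indexes.
    map: HashMap<u64, Vec<usize>>,
    /// `group_values[i]` is the value of group index `i`.
    group_values: Vec<T>,
}

impl<T: Hash + PartialEq + Copy> PrimitiveGroups<T> {
    fn new() -> Self {
        Self {
            map: HashMap::new(),
            group_values: Vec::new(),
        }
    }

    /// Returns the group index for `value`, creating a new group if needed.
    fn intern(&mut self, value: T) -> usize {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        let hash = hasher.finish();

        // Borrow the two fields separately so the map entry and the
        // value vector can be used at the same time.
        let Self { map, group_values } = self;
        let candidates = map.entry(hash).or_default();

        // Equality is checked against the native values directly.
        if let Some(&idx) = candidates.iter().find(|&&i| group_values[i] == value) {
            return idx;
        }

        // New group: append the value and remember its index in the map.
        let idx = group_values.len();
        group_values.push(value);
        candidates.push(idx);
        idx
    }
}
```

Interning `1, 3, 1, 5` in that order would yield group indexes `0, 1, 0, 2` and leave `group_values` as `[1, 3, 5]`, matching the picture above.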
I agree the null value would need some special handling, but since this
would only be for single columns (where there can be at most one null group) I
think we could figure out some way to handle it.
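One possible way to handle it, purely as an illustration: keep a separate `Option<usize>` that remembers which group index (if any) belongs to null. Everything here is hypothetical (`NullableGroups`, `intern`, `null_group`), and for brevity the map is a plain `HashMap<T, usize>` rather than an index-only table:

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical single-column groups with a dedicated slot for the null group.
struct NullableGroups<T> {
    /// Simplified stand-in for the hash table: value -> group index.
    map: HashMap<T, usize>,
    /// Native group values; the null group's slot holds a placeholder.
    group_values: Vec<T>,
    /// Group index of the null group, once a null has been seen.
    null_group: Option<usize>,
}

impl<T: Hash + Eq + Copy + Default> NullableGroups<T> {
    fn new() -> Self {
        Self {
            map: HashMap::new(),
            group_values: Vec::new(),
            null_group: None,
        }
    }

    /// Returns the group index for `value`, where `None` means null.
    fn intern(&mut self, value: Option<T>) -> usize {
        match value {
            Some(v) => {
                if let Some(&idx) = self.map.get(&v) {
                    return idx;
                }
                let idx = self.group_values.len();
                self.group_values.push(v);
                self.map.insert(v, idx);
                idx
            }
            None => {
                if let Some(idx) = self.null_group {
                    return idx;
                }
                // The placeholder is never inserted into the map, so a real
                // value equal to `T::default()` still gets its own group.
                let idx = self.group_values.len();
                self.group_values.push(T::default());
                self.null_group = Some(idx);
                idx
            }
        }
    }
}
```

When emitting the final array, the `null_group` index would tell us which single slot to mark as null in the validity mask.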