HippoBaro commented on PR #9653:
URL: https://github.com/apache/arrow-rs/pull/9653#issuecomment-4209773127
Thanks for the reviews! I've reworked the branch to address all feedback.
Sorry for the delay, it took me a while to experiment.
The main structural change is a `LevelData` enum refactor suggested by
@jhorstmann. Thank you for the excellent suggestion. As I am primarily
concerned with the performance of very sparse data, I hadn't considered the
possibility to also speed up the non-null-but-uniform code path.
The `Option<Vec<i16>> + uniform_levels: Option<(i16, i16, usize)>` tuple is
replaced by a single enum:
```rust
enum LevelData {
Absent,
Materialized(Vec<i16>),
Uniform { value: i16, count: usize },
}
```
`Absent` replaces the previous `None` case, `Uniform` replaces the ad-hoc
tuple for entirely-null columns, and `Materialized` is the normal vec path.
This unifies the three states into one type and makes transitions between them
easy to follow. This yields a nice performance improvement documented in
ab9a7bc3c5a48323438af62f8d70ffa6d26066b8.
The resulting refactor has a larger LoC footprint, but the API is arguably
much cleaner and robust.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]