HippoBaro commented on PR #9653:
URL: https://github.com/apache/arrow-rs/pull/9653#issuecomment-4209773127

   Thanks for the reviews! I've reworked the branch to address all feedback. 
Sorry for the delay, it took me a while to experiment. 
   
   The main structural change is a `LevelData` enum refactor suggested by  
@jhorstmann. Thank you for the excellent suggestion. As I am primarily 
concerned with the performance of very sparse data, I hadn't considered the 
possibility to also speed up the non-null-but-uniform code path. 
   
   The `Option<Vec<i16>> + uniform_levels: Option<(i16, i16, usize)>` tuple is 
replaced by a single enum:
   
   ```rust
     enum LevelData {
         Absent,
         Materialized(Vec<i16>),
         Uniform { value: i16, count: usize },
     }
   ```
   
   `Absent` replaces the previous `None` case, `Uniform` replaces the ad-hoc 
tuple for entirely-null columns, and `Materialized` is the normal vec path. 
This unifies the three states into one type and makes transitions between them 
easy to follow. This yields a nice performance improvement documented in 
ab9a7bc3c5a48323438af62f8d70ffa6d26066b8.
   
   The resulting refactor has a larger LoC footprint, but the API is arguably 
much cleaner and robust. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to