zhuqi-lucas commented on code in PR #7454: URL: https://github.com/apache/arrow-rs/pull/7454#discussion_r2084726492
########## parquet/src/arrow/arrow_reader/selection.rs: ########## @@ -90,18 +91,32 @@ impl RowSelector { /// assert_eq!(actual, expected); /// ``` /// -/// A [`RowSelection`] maintains the following invariants: -/// -/// * It contains no [`RowSelector`] of 0 rows -/// * Consecutive [`RowSelector`]s alternate skipping or selecting rows -/// -/// [`PageIndex`]: crate::file::page_index::index::PageIndex -#[derive(Debug, Clone, Default, Eq, PartialEq)] -pub struct RowSelection { - selectors: Vec<RowSelector>, +/// [`RowSelection`] is an enum that can be either a list of [`RowSelector`]s +/// or a [`BooleanArray`] bitmap +#[derive(Debug, Clone, PartialEq)] +pub enum RowSelection { Review Comment: > Given the two different representations have different uses > > 1. Skip contiguous ranges (basically skip entire data pages) > 2. filter out individual rows > > I think we may be able to actually keep these as two separate structs rather than combining them into a single struct. Good suggestions, i am also feeling it's strange here, we'd better to use two structs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org