hhhizzz commented on code in PR #8733:
URL: https://github.com/apache/arrow-rs/pull/8733#discussion_r2507014855


##########
parquet/src/arrow/arrow_reader/read_plan.rs:
##########
@@ -20,19 +20,70 @@
 
 use crate::arrow::array_reader::ArrayReader;
 use crate::arrow::arrow_reader::{
-    ArrowPredicate, ParquetRecordBatchReader, RowSelection, RowSelector,
+    ArrowPredicate, ParquetRecordBatchReader, RowSelection, RowSelectionCursor,
+    RowSelectionStrategy, RowSelector,
 };
 use crate::errors::{ParquetError, Result};
 use arrow_array::Array;
 use arrow_select::filter::prep_null_mask_filter;
-use std::collections::VecDeque;
+use std::sync::atomic::{AtomicUsize, Ordering};
+
+// The average selector length threshold for choosing between
+// `RowSelectionStrategy::Mask` and `RowSelectionStrategy::Selectors`.
+// If the average selector length is less than this value,
+// `RowSelectionStrategy::Mask` is preferred.
+const AVG_SELECTOR_LEN_MASK_THRESHOLD: usize = 32;
+
+// The logic in `preferred_selection_strategy` depends on the constant
+// `AVG_SELECTOR_LEN_MASK_THRESHOLD`. To allow unit testing of this logic,
+// we use a mutable global variable that can be temporarily changed during tests.
+//
+// An `AtomicUsize` is used because the Rust test runner (`cargo test`) runs tests
+// in parallel by default. The atomic operations prevent data races between
+// different test threads that might try to modify this value simultaneously.
+//
+// For the production code path, `load(Ordering::Relaxed)` is used. This is the
+// weakest memory ordering and for a simple load on most modern architectures,
+// it compiles down to a regular memory read with negligible performance overhead.
+// The more expensive atomic operations with stronger ordering are only used in the
+// test-only functions below.
+static AVG_SELECTOR_LEN_MASK_THRESHOLD_OVERRIDE: AtomicUsize =

Review Comment:
   To use this kind of approach, I have to add the strategy to the `ReaderBuilder` API. That change is in the last commit. Let me know whether this new parameter is acceptable.
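
   For illustration, here is a minimal sketch of what a builder-level strategy parameter could look like, as opposed to a global atomic override. It is not the code from the PR: `Strategy`, `SketchBuilder`, `with_strategy`, `resolve`, and `preferred_strategy` are hypothetical names, and the handling of an empty selection is an assumption. Only the threshold of 32 and the rule "average selector length below the threshold prefers the mask" come from the diff above.

```rust
/// Hypothetical stand-in for the PR's `RowSelectionStrategy`; the real enum may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strategy {
    Mask,
    Selectors,
}

/// Mirrors `AVG_SELECTOR_LEN_MASK_THRESHOLD` from the diff.
const DEFAULT_THRESHOLD: usize = 32;

/// Picks `Mask` when the average selector length is below `threshold`.
/// Assumption: an empty selection falls back to `Selectors`.
fn preferred_strategy(selector_lens: &[usize], threshold: usize) -> Strategy {
    if selector_lens.is_empty() {
        return Strategy::Selectors;
    }
    let total: usize = selector_lens.iter().sum();
    // total / len < threshold  is equivalent to  total < threshold * len
    if total < threshold.saturating_mul(selector_lens.len()) {
        Strategy::Mask
    } else {
        Strategy::Selectors
    }
}

/// Hypothetical builder; this is not the actual arrow-rs `ReaderBuilder`.
#[derive(Debug, Default)]
struct SketchBuilder {
    /// `None` means "decide from the selection using the default threshold".
    strategy: Option<Strategy>,
}

impl SketchBuilder {
    /// Forces a strategy, so tests need no mutable global state.
    fn with_strategy(mut self, strategy: Strategy) -> Self {
        self.strategy = Some(strategy);
        self
    }

    /// Returns the explicit strategy if one was set, else the heuristic choice.
    fn resolve(&self, selector_lens: &[usize]) -> Strategy {
        self.strategy
            .unwrap_or_else(|| preferred_strategy(selector_lens, DEFAULT_THRESHOLD))
    }
}
```

   With this shape, a test can call `SketchBuilder::default().with_strategy(Strategy::Selectors)` to pin the behavior, while callers that never set it still get the threshold-based default. The trade-off is a new public parameter on the builder.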



