zhuqi-lucas commented on code in PR #7401:
URL: https://github.com/apache/arrow-rs/pull/7401#discussion_r2041140455


##########
parquet/benches/arrow_reader_row_filter.rs:
##########
@@ -0,0 +1,606 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Benchmark for evaluating row filters and projections on a Parquet file.
+//!
+//! # Background:
+//!
+//! As described in [Efficient Filter Pushdown in Parquet], evaluating
+//! pushdown filters is a two-step process:
+//!
+//! 1. Build a filter mask by decoding and evaluating filter functions on
+//!    the filter column(s).
+//!
+//! 2. Decode the rows that match the filter mask from the projected columns.
+//!
+//! The performance depends on factors such as the number of rows selected,
+//! the clustering of results (which affects the efficiency of the filter 
mask),
+//! and whether the same column is used for both filtering and projection.
+//!
+//! This benchmark helps measure the performance of these operations.
+//!
+//! [Efficient Filter Pushdown in Parquet]: 
https://datafusion.apache.org/blog/2025/03/21/parquet-pushdown/
+//!
+//! The benchmark creates an in-memory Parquet file with 100K rows and ten 
columns.
+//! The first four columns are:
+//!   - int64: random integers (range: 0..100) generated with a fixed seed.
+//!   - float64: random floating-point values (range: 0.0..100.0) generated 
with a fixed seed.
+//!   - utf8View: random strings with some empty values and occasional 
constant "const" values.
+//!   - ts: sequential timestamps in milliseconds.

Review Comment:
   Good point @alamb , i will try to address soon.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to