hhhizzz commented on code in PR #8733:
URL: https://github.com/apache/arrow-rs/pull/8733#discussion_r2500284787


##########
parquet/src/column/reader.rs:
##########
@@ -214,6 +219,49 @@ where
             let remaining_records = max_records - total_records_read;
             let remaining_levels = self.num_buffered_values - self.num_decoded_values;
 
+            if self.synthetic_page {

Review Comment:
   @alamb Hi, I've finished most of the change.
   The main idea is to **record whether any page has been skipped**; if so, 
the code falls back to using `RowSelector`.
   I've added the corresponding unit tests and noticed that **the sync version 
currently doesn't perform page skipping**. I still included unit tests for it 
in case that functionality is added later.
   Next, I'll refine the comments and collect statistics on the average 
selector length across different scenarios. The logic for that is already in 
the code; I just need to write a small script to simplify the process and 
generate the charts.
   In the meantime, could you please review the current design and let me know 
whether it looks acceptable or whether there is anything I could improve? 😁
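
To make the fallback rule concrete, here is a rough, self-contained sketch. It is not the code in this PR. `Selector` is a hypothetical stand-in for the parquet crate's `RowSelector`, and the `Strategy` enum, the `Mask` alternative, the `threshold` parameter, and both function names are assumptions made up for illustration only.

```rust
/// Hypothetical stand-in for the parquet crate's `RowSelector`:
/// a run of `row_count` rows that is either skipped or selected.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Selector {
    row_count: usize,
    skip: bool,
}

/// Hypothetical set of strategies a reader could choose between.
/// `Mask` is an assumed alternative, named here only for illustration.
#[derive(Debug, PartialEq)]
enum Strategy {
    Mask,
    Selectors,
}

/// Average number of rows covered by one selector,
/// or `None` when there are no selectors at all.
fn average_selector_len(selectors: &[Selector]) -> Option<f64> {
    if selectors.is_empty() {
        return None;
    }
    let total_rows: usize = selectors.iter().map(|s| s.row_count).sum();
    Some(total_rows as f64 / selectors.len() as f64)
}

/// Picks a strategy. If any page was skipped, always fall back to
/// selectors; otherwise compare the average selector length against
/// an assumed, caller-supplied `threshold`.
fn choose_strategy(
    selectors: &[Selector],
    any_page_skipped: bool,
    threshold: f64,
) -> Strategy {
    if any_page_skipped {
        return Strategy::Selectors;
    }
    match average_selector_len(selectors) {
        Some(avg) if avg < threshold => Strategy::Mask,
        _ => Strategy::Selectors,
    }
}
```

The point of the sketch is only the ordering of the checks: the "page skipped" flag is consulted first and overrides whatever the selector statistics would suggest.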
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
