hhhizzz commented on code in PR #8733:
URL: https://github.com/apache/arrow-rs/pull/8733#discussion_r2485931247


##########
parquet/src/column/reader.rs:
##########
@@ -214,6 +219,49 @@ where
             let remaining_records = max_records - total_records_read;
             let remaining_levels = self.num_buffered_values - self.num_decoded_values;
 
+            if self.synthetic_page {

Review Comment:
   Thank you, alamb. A few things in this PR are still unresolved, and I plan 
to address them over the next few days. In the meantime I may update or rebase 
the branch several times, so I have converted the PR to a draft.
   The remaining items are:
   1. Add benchmarks for different value types to determine the final length 
threshold at which to convert between `selection` and `bitmask`.
   2. Add guidance or a tool for drawing the charts, so that we can collect 
more statistics from different platforms.
   3. We all agree the synthetic page design is not a good idea, so I need to 
find another way to handle sparse pages.
   4. Add new tests that check whether the `bitmask` method handles every kind 
of skipped page in a sparse column chunk.
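   For items 1 and 4, a minimal sketch of the two representations and a 
threshold-based switch between them. It assumes the selection is a list of 
run-length selectors, each a `row_count` plus a `skip` flag. `Selector`, 
`Strategy`, `choose_strategy`, and `threshold` are hypothetical names for 
illustration, not the arrow-rs API, and average run length is only an assumed 
proxy for whatever the benchmarks end up measuring.

```rust
/// Hypothetical run-length selector: `row_count` consecutive rows that are
/// either all skipped (`skip == true`) or all selected.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Selector {
    row_count: usize,
    skip: bool,
}

/// Hypothetical choice of representation used when decoding a page.
#[derive(Debug, PartialEq)]
enum Strategy {
    Selectors,
    Bitmask,
}

/// Expand run-length selectors into one boolean per row (`true` = selected).
fn selectors_to_bitmask(selectors: &[Selector]) -> Vec<bool> {
    let total: usize = selectors.iter().map(|s| s.row_count).sum();
    let mut mask = Vec::with_capacity(total);
    for s in selectors {
        mask.extend(std::iter::repeat(!s.skip).take(s.row_count));
    }
    mask
}

/// Collapse a bitmask back into selectors, merging adjacent rows that share
/// the same selected/skipped state.
fn bitmask_to_selectors(mask: &[bool]) -> Vec<Selector> {
    let mut out: Vec<Selector> = Vec::new();
    for &selected in mask {
        let skip = !selected;
        if let Some(last) = out.last_mut() {
            if last.skip == skip {
                last.row_count += 1;
                continue;
            }
        }
        out.push(Selector { row_count: 1, skip });
    }
    out
}

/// Assumed heuristic: when the average run is shorter than `threshold`,
/// per-selector overhead dominates and a bitmask is likely cheaper.
/// `threshold` stands in for the value the benchmarks would determine,
/// possibly per value type.
fn choose_strategy(selectors: &[Selector], threshold: usize) -> Strategy {
    if selectors.is_empty() {
        return Strategy::Selectors;
    }
    let total: usize = selectors.iter().map(|s| s.row_count).sum();
    // Integer division is precise enough for a coarse heuristic.
    let avg_run = total / selectors.len();
    if avg_run < threshold {
        Strategy::Bitmask
    } else {
        Strategy::Selectors
    }
}

fn main() {
    let sel = [
        Selector { row_count: 2, skip: false },
        Selector { row_count: 1, skip: true },
        Selector { row_count: 3, skip: false },
    ];
    let mask = selectors_to_bitmask(&sel);
    println!("bitmask: {:?}", mask);
    println!("round trip: {:?}", bitmask_to_selectors(&mask));
    println!("strategy: {:?}", choose_strategy(&sel, 4));
}
```

   The round trip between the two forms is also the kind of invariant the 
tests in item 4 could check against every pattern of skipped pages.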



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.