tustvold commented on code in PR #2407:
URL: https://github.com/apache/arrow-rs/pull/2407#discussion_r944233720


##########
parquet/src/util/test_common/page_util.rs:
##########
@@ -166,11 +167,28 @@ impl<P: Iterator<Item = Page> + Send> PageReader for 
InMemoryPageReader<P> {
     }
 
     fn peek_next_page(&mut self) -> Result<Option<PageMetadata>> {
-        unimplemented!()
+        if let Some(x) = self.page_iter.peek() {
+            match x {
+                Page::DataPage { .. } => {
+                    unreachable!()

Review Comment:
   Yeah, most applications have yet to update to parquet v2 despite it being 
released almost a decade ago now :laughing: 
   
   IOx isn't currently, but may in future. DeltaBitPack is very good for 
storing sorted timestamps as are common for our workload. Although their are 
some questionable decisions in the DeltaBitPack encoding scheme that make it 
more expensive than it really should be... Perhaps that is why most 
applications haven't bothered to upgrade :sweat_smile: 
   
   That was why I was surprised to see this isn't being exercised :sweat_smile: 
   
   I can't remember if column index support requires v2 :thinking: 
   
   > DataPagev1 not have the 'num_rows' info.
   
   Just use num_values (which also includes nulls) it is a best effort number 
that is only relied upon if there aren't any levels (in which case it is the 
row count)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to