tustvold commented on code in PR #2407:
URL: https://github.com/apache/arrow-rs/pull/2407#discussion_r944233720
##########
parquet/src/util/test_common/page_util.rs:
##########
@@ -166,11 +167,28 @@ impl<P: Iterator<Item = Page> + Send> PageReader for
InMemoryPageReader<P> {
}
fn peek_next_page(&mut self) -> Result<Option<PageMetadata>> {
- unimplemented!()
+ if let Some(x) = self.page_iter.peek() {
+ match x {
+ Page::DataPage { .. } => {
+ unreachable!()
Review Comment:
Yeah, most applications have yet to update to parquet v2 despite it being
released almost a decade ago now :laughing:
IOx isn't currently, but may in future. DeltaBitPack is very good for
storing sorted timestamps as are common for our workload. Although their are
some questionable decisions in the DeltaBitPack encoding scheme that make it
more expensive than it really should be... Perhaps that is why most
applications haven't bothered to upgrade :sweat_smile:
That was why I was surprised to see this isn't being exercised :sweat_smile:
I can't remember if column index support requires v2 :thinking:
> DataPagev1 not have the 'num_rows' info.
Just use num_values (which also includes nulls) it is a best effort number
that is only relied upon if there aren't any levels (in which case it is the
row count)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]