Not in V2. In V1 the whole page is compressed, but in V2 only the values
are, if I remember correctly. So we would have to extract the repetition
and definition level bytes first and then decompress and decode the values.
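
A minimal sketch of that split, assuming the two level lengths come from
the page header fields repetition_levels_byte_length and
definition_levels_byte_length, and that repetition levels come first in
the page. The names split_data_page_v2 and PageV2Parts are made up for
illustration; this is not code from the Rust module or parquet-cpp. The
decompressor is passed in, so any codec (Snappy included) can be plugged in:

```python
from typing import Callable, NamedTuple


class PageV2Parts(NamedTuple):
    """Hypothetical container for the three sections of a V2 data page."""
    repetition_levels: bytes  # stored uncompressed, still level-encoded
    definition_levels: bytes  # stored uncompressed, still level-encoded
    values: bytes             # decompressed, still value-encoded


def split_data_page_v2(
    page: bytes,
    rep_levels_len: int,
    def_levels_len: int,
    is_compressed: bool,
    decompress: Callable[[bytes], bytes],
) -> PageV2Parts:
    """Slice a V2 page body and decompress only the values section."""
    levels_len = rep_levels_len + def_levels_len
    if rep_levels_len < 0 or def_levels_len < 0 or levels_len > len(page):
        raise ValueError("level byte lengths do not fit inside the page")

    # Levels sit at the front of the page and are never compressed in V2.
    rep = page[:rep_levels_len]
    defs = page[rep_levels_len:levels_len]

    # Everything after the levels is the (possibly compressed) values section.
    values = page[levels_len:]
    if is_compressed:
        values = decompress(values)
    return PageV2Parts(rep, defs, values)
```

Feeding the whole page body to the decompressor, as one would for V1, hands
it the uncompressed level bytes first, which is one way to end up with an
"invalid Snappy data" style error.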

You can check out the code in the parquet Rust module!

I am not sure about parquet-cpp; we can use the Rust implementation as a
reference there.


On Mon, 29 Apr 2019 at 5:39 PM, Curt Hagenlocher <[email protected]>
wrote:

> Would that be covered by PARQUET-458 (
> https://issues.apache.org/jira/browse/PARQUET-458)?
>
> On Mon, Apr 29, 2019 at 8:18 AM Wes McKinney <[email protected]> wrote:
>
> > Is there a JIRA issue about data page v2 issues in parquet-cpp?
> >
> > On Mon, Apr 29, 2019 at 9:57 AM Curt Hagenlocher <[email protected]>
> > wrote:
> > >
> > > But the data page is decoded only after it is decompressed, so I
> > > wouldn’t expect an unsupported data page to cause a decompression
> > > failure.
> > >
> > > (I am playing with adding V2 support to Parquet.Net.)
> > >
> > > Sent from my iPhone
> > >
> > > > On Apr 29, 2019, at 7:30 AM, Ivan Sadikov <[email protected]>
> > > > wrote:
> > > >
> > > > If you are referring to the file in Apache/parquet-testing
> > > > repository, it is a valid Parquet file with data encoded into
> > > > data page v2.
> > > >
> > > > You can easily test it with “cargo install parquet” and “parquet-read
> > > > filepath”.
> > > >
> > > > I am not sure what kind of code you have written, but the error you
> > > > have encountered could be related to the fact that parquet-cpp does
> > > > not support decoding of data page v2.
> > > >
> > > >
> > > > Cheers,
> > > >
> > > > Ivan
> > > >
> > > > On Mon, 29 Apr 2019 at 3:36 PM, Curt Hagenlocher <[email protected]>
> > > > wrote:
> > > >
> > > >> To the best of my ability to tell, there is invalid Snappy data in
> > > >> the file parquet-testing/data/datapage_v2.snappy.parquet. I can
> > > >> neither read it with my own code nor with pyarrow 0.13.0. Is this
> > > >> expected to work?
> > > >>
> > > >> Thanks!
> > > >> -Curt
> > > >>
> >
>
