Yes -- what I mean is that I want to put the DataPageV2 read path through the same unit testing rigor as the DataPageV1 path. I will take care of it now that I understand what's wrong; I commented on PARQUET-458.
On Tue, Jun 4, 2019 at 12:38 AM Ivan Sadikov <[email protected]> wrote:
>
> Hi Wes,
>
> I think it’s the file that I added for parquet-rs to test data page v2
> back then - it is not used anywhere else.
>
> Cheers,
>
> Ivan
>
> On Mon, 3 Jun 2019 at 10:15 PM, Wes McKinney <[email protected]> wrote:
>
> > I took a quick look at this -- DataPageV2 has a slightly different
> > structure from DataPageV1, as indicated here
> >
> > https://github.com/apache/arrow/blob/master/cpp/src/parquet/parquet.thrift#L555
> >
> > In DataPageV1, the encoded repetition/definition levels are compressed
> > together with the values in the data page. In DataPageV2, only the
> > values are compressed. I'll see if I can fashion a fix sufficient to
> > read the test data file, but more extensive testing will be required
> > to extend the other unit tests to test both reading and writing both
> > types of data pages.
> >
> > On Tue, Apr 30, 2019 at 8:56 AM Curt Hagenlocher <[email protected]> wrote:
> > >
> > > Thanks! Either the documentation is a bit sparse for that level of
> > > detail, or I haven't been looking in the right place. The factoring
> > > of the Java implementation makes it hard for me to see what's going
> > > on there, but the Rust implementation is straightforward enough
> > > despite my utter lack of familiarity with the language.
> > >
> > > On Mon, Apr 29, 2019 at 10:41 AM Ivan Sadikov <[email protected]> wrote:
> > >
> > > > Not in V2, in V1 the whole page is encoded, but in V2 it is only
> > > > values, if I remember correctly. So we would have to extract
> > > > repetition and definition levels bytes and then decode values.
> > > >
> > > > You can check out code in parquet rust module!
> > > >
> > > > I am not sure about parquet-cpp, we can use that implementation as
> > > > reference there.
> > > >
> > > > On Mon, 29 Apr 2019 at 5:39 PM, Curt Hagenlocher <[email protected]> wrote:
> > > >
> > > > > Would that be covered by PARQUET-458
> > > > > (https://issues.apache.org/jira/browse/PARQUET-458)?
> > > > >
> > > > > On Mon, Apr 29, 2019 at 8:18 AM Wes McKinney <[email protected]> wrote:
> > > > >
> > > > > > Is there a JIRA issue about data page v2 issues in parquet-cpp?
> > > > > >
> > > > > > On Mon, Apr 29, 2019 at 9:57 AM Curt Hagenlocher <[email protected]> wrote:
> > > > > > >
> > > > > > > But the data page is decoded only after it is decompressed, so I
> > > > > > > wouldn’t expect an unsupported data page to cause a decompression
> > > > > > > failure.
> > > > > > >
> > > > > > > (I am playing with adding V2 support to Parquet.Net.)
> > > > > > >
> > > > > > > Sent from my iPhone
> > > > > > >
> > > > > > > On Apr 29, 2019, at 7:30 AM, Ivan Sadikov <[email protected]> wrote:
> > > > > > > >
> > > > > > > > If you are referring to the file in Apache/parquet-testing
> > > > > > > > repository, it is a valid Parquet file with data encoded into
> > > > > > > > data page v2.
> > > > > > > >
> > > > > > > > You can easily test it with “cargo install parquet” and
> > > > > > > > “parquet-read filepath”.
> > > > > > > >
> > > > > > > > I am not sure what kind of code you have written, but the error
> > > > > > > > you have encountered could be related to the fact that
> > > > > > > > parquet-cpp does not support decoding of data page v2.
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > >
> > > > > > > > Ivan
> > > > > > > >
> > > > > > > > On Mon, 29 Apr 2019 at 3:36 PM, Curt Hagenlocher <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > To the best of my ability to tell, there is invalid Snappy data
> > > > > > > > > in the file parquet-testing/data/datapage_v2.snappy.parquet. I
> > > > > > > > > can neither read it with my own code nor with pyarrow 0.13.0.
> > > > > > > > > Is this expected to work?
> > > > > > > > >
> > > > > > > > > Thanks!
> > > > > > > > > -Curt
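
For anyone following the thread, here is a rough sketch of the layout difference discussed above: in V1 the levels and values are compressed as one unit, while in V2 the levels sit uncompressed in front of the compressed values. This is only an illustration under assumptions, not the parquet-cpp or parquet-rs code. The function names are hypothetical, zlib stands in for Snappy because Snappy is not in the Python standard library, and the two length parameters are meant to mirror the repetition_levels_byte_length and definition_levels_byte_length fields of DataPageHeaderV2 in parquet.thrift.

```python
import zlib  # stand-in for Snappy (assumption: any block codec shows the same effect)


def decode_page_v1(body: bytes) -> bytes:
    """V1: levels and values are compressed together, so decompress the whole body."""
    return zlib.decompress(body)


def decode_page_v2(body: bytes, rep_levels_len: int, def_levels_len: int,
                   is_compressed: bool = True):
    """V2: levels are stored uncompressed up front; only the values are compressed."""
    levels_end = rep_levels_len + def_levels_len
    rep_levels = body[:rep_levels_len]
    def_levels = body[rep_levels_len:levels_end]
    values = body[levels_end:]
    if is_compressed:
        values = zlib.decompress(values)
    return rep_levels, def_levels, values


# Build a toy V2 body: raw level bytes followed by compressed values.
rep, dfn, vals = b"\x02\x00", b"\x02\x01", b"parquet values" * 4
v2_body = rep + dfn + zlib.compress(vals)

assert decode_page_v2(v2_body, len(rep), len(dfn)) == (rep, dfn, vals)

# Treating the V2 body as V1 feeds the raw level bytes to the decompressor.
try:
    decode_page_v1(v2_body)
except zlib.error:
    print("V1-style decompression rejects a V2 body")
```

A reader that decompresses the whole V2 body hands the uncompressed level bytes to the codec, which would be consistent with the "invalid Snappy data" symptom reported at the start of the thread.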
