jorisvandenbossche commented on PR #36955: URL: https://github.com/apache/arrow/pull/36955#issuecomment-1699616830
Yes, now I am confused ;)

> > what happens if the user selects v1 data pages with Parquet version >= 2.0? Do they get RLE-encoded boolean data pages?
>
> Yes. The current implementation is supposed to do this.

Note that this is the default situation for pyarrow users, even when they select nothing specific: you get version "2.x" features (e.g. unsigned integers, nanoseconds) but with data page v1.

But looking at the code, I assume the above statement is indeed correct: it only looks at the Parquet version (enabling RLE for versions >= 2.0), not the DataPage version. So I think it is mostly the mention of "DataPage v2" in the issue and in the code that makes this confusing, since the current PR is not tied to the DataPage version at all?
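To make the distinction concrete, here is a minimal Python sketch of the two possible gating conditions for the boolean encoding. The function names and the string-based version comparison are hypothetical and chosen only for illustration; this is not the C++ code in the PR. The two arguments mirror the `version` and `data_page_version` options of `pyarrow.parquet.write_table`.

```python
# Hypothetical sketch of two ways to decide the default BOOLEAN encoding.
# Not Arrow's actual implementation; names are illustrative only.

def parse_version(v):
    """Turn a version string such as "2.6" into a comparable tuple (2, 6)."""
    major, minor = v.split(".")
    return int(major), int(minor)


def boolean_encoding_by_format_version(parquet_version, data_page_version):
    """Gate on the Parquet format version only (as the current PR is
    described above); data_page_version is deliberately ignored."""
    if parse_version(parquet_version) >= (2, 0):
        return "RLE"
    return "PLAIN"


def boolean_encoding_by_data_page_version(parquet_version, data_page_version):
    """Alternative suggested by the "DataPage v2" wording: gate on the
    data page version instead of the format version."""
    if parse_version(data_page_version) >= (2, 0):
        return "RLE"
    return "PLAIN"


# The default pyarrow situation: 2.x format features with v1 data pages.
print(boolean_encoding_by_format_version("2.6", "1.0"))     # RLE
print(boolean_encoding_by_data_page_version("2.6", "1.0"))  # PLAIN
```

For the default combination (format 2.x, data page v1), the two conditions disagree. That disagreement is why the "DataPage v2" wording matters.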
