Sorry I didn't get to this, will try again tomorrow.

On Thu, Apr 30, 2020 at 11:09 AM Wes McKinney <[email protected]> wrote:
> I'd be fine with a patch release addressing this so long as it's
> binary-only (to save us all time).
>
> On Thu, Apr 30, 2020, 12:30 PM Micah Kornfield <[email protected]>
> wrote:
>
>> This sounds like something we might want to do and issue a patch release.
>> It seems bad to default to a non-production version?
>>
>> I can try to take a look tonight at a patch if no one gets to it before.
>>
>> Thanks,
>> Micah
>>
>> On Wednesday, April 29, 2020, Wes McKinney <[email protected]> wrote:
>>
>> > On Wed, Apr 29, 2020 at 6:15 PM Pierre Belzile <[email protected]>
>> > wrote:
>> > >
>> > > Wes,
>> > >
>> > > You used the words "forward compatible". Does this mean that 0.17 is able
>> > > to decode 0.16 datapagev2?
>> >
>> > 0.16 doesn't write DataPageV2 at all, the version flag only determines
>> > the type casting and metadata behavior I indicated in my email. The
>> > changes in
>> >
>> > https://github.com/apache/arrow/commit/809d40ab9518bd254705f35af01162a9da588516
>> >
>> > enabled the use of DataPageV2 and I/we didn't think about the forward
>> > compatibility issue (version=2.0 files written in 0.17.0 being
>> > unreadable in 0.16.0). We might actually want to revert this (just the
>> > toggle between DataPageV1/V2, not the whole patch).
>> >
>> > > Crossing my fingers...
>> > >
>> > > Pierre
>> > >
>> > > On Wed, Apr 29, 2020 at 7:05 PM Wes McKinney <[email protected]> wrote:
>> > >
>> > > > Ah, so we have a slight mess on our hands because the patch for
>> > > > PARQUET-458 enabled the use of DataPageV2, which is not forward
>> > > > compatible with older versions because the implementation was fixed
>> > > > (see the JIRA for more details)
>> > > >
>> > > > https://github.com/apache/arrow/commit/809d40ab9518bd254705f35af01162a9da588516
>> > > >
>> > > > Unfortunately, in Python the version='1.0' / version='2.0' flag is
>> > > > being used for two different purposes:
>> > > >
>> > > > * Expanded ConvertedType / LogicalType metadata, like unsigned types
>> > > >   and nanosecond timestamps
>> > > > * DataPageV1 vs. DataPageV2 data pages
>> > > >
>> > > > I think we should separate these concepts and instead have a
>> > > > "compatibility mode" option regarding the ConvertedType/LogicalType
>> > > > annotations and the behavior around conversions when writing unsigned
>> > > > integers, nanosecond timestamps, and other types to Parquet V1 (which
>> > > > is the only "production" Parquet format).
>> > > >
>> > > > On Wed, Apr 29, 2020 at 5:56 PM Pierre Belzile <[email protected]>
>> > > > wrote:
>> > > > >
>> > > > > Hi,
>> > > > >
>> > > > > We've been using the parquet 2 format (mostly because of nanosecond
>> > > > > resolution). I'm getting crashes in the C++ parquet decoder, arrow 0.16,
>> > > > > when decoding a parquet 2 file created with pyarrow 0.17.0. Is this
>> > > > > expected? Would a 0.17 decode a 0.16?
>> > > > >
>> > > > > If that's not expected, I can put the debugger on it and see what is
>> > > > > happening. I suspect it's with string fields (regular, not large string).
>> > > > >
>> > > > > Cheers, Pierre
