Are there other projects / folks waiting on the 1.9.0 release? Given that it has been a while since our last release, it might make sense for us to first put that out (if I understand correctly, everything else has been resolved). Seems like there's a bit of discussion on Andrew's PR and we could follow up with another release once that is resolved. I'm worried about the release goal post constantly moving and us adding more and more to the 1.9.0 release.
On Sat, Sep 10, 2016 at 11:42 AM, Ryan Blue <[email protected]> wrote: > To keep the list up-to-date, there's more discussion on this topic on > Andrew's original pull request [1] about whether we should consider this a > bug. I don't think we should build a release candidate until we agree that > it's a bug or not, and if it is a bug add a work-around to stop returning > the bad min and max. If you're interested, please join the discussion on > the PR. > > Thanks! > > rb > > > [1]: https://github.com/apache/parquet-mr/pull/362 > > On Fri, Sep 9, 2016 at 2:50 PM, Ryan Blue <[email protected]> wrote: > > > I think what Spark needs to do depends on how we decide to fix this. I > > think the right thing to do for now is to treat these stats as invalid > and > > not return them, like we did for binary in PARQUET-251. That's an easy > fix > > because we've done it before and would just need to add another check > what > > we did the last time. That way, Spark is automatically fixed when it > > updates to 1.9.0. We can work in 1.9.1 and later to fix how stats are > > written and agree on an update to the format. > > > > I just posted a pull request that implements this. It doesn't require a > > parquet-format release and will fix the problem for downstream readers by > > not returning incorrect stats, based on the column type. In the patch, > > stats are suppressed by default for unsigned integers, strings, enums, > and > > decimals. Readers can override this for string types by setting > > parquet.strings.use-signed-order. > > > > Does that seem like a reasonable fix to get this release out? > > > > rb > > > > On Fri, Sep 9, 2016 at 6:13 AM, Andrew Duffy <[email protected]> > wrote: > > > >> So I’m cool with making necessary changes to get this in sooner rather > >> than later, I’ve mostly been blocking on code reviews. If there’s a > >> commitment made to releasing 1.9.1 as soon as the binary sort order > change > >> goes in then I’m fine with not blocking on this for 1.9, but if not and > we > >> can expect the current velocity of releases then I’d rather see if we > can > >> get it in now. Especially because Spark and others will need to wait > until > >> next release (>= 1 year from now?) to fix use of statistics for binary > >> data. As Rob said, this is broken in master of Spark so without a > >> fix+release of Parquet it’ll likely require disabling statistics > pushdown > >> for binary columns on their end. > >> > >> -Andrew > >> > >> On 9/8/16, 11:37 PM, "Jacques Nadeau" <[email protected]> wrote: > >> > >> >Absolutely agree the fix would be good to get in. > >> > > >> >My comments come from the fact that we tried to start the 1.9 release > >> (late > >> >last year I think) after the direct memory feature got in and it never > >> >completed (as there is always a good reason to hold the release). My > main > >> >fear is that if we wait for something and it turns into another > >> substantial > >> >delay before a release. Ryan, what do you think about proposing a hard > >> >deadline? > >> > > >> >On Thu, Sep 8, 2016 at 7:57 PM, Robert Kruszewski < > [email protected]> > >> >wrote: > >> > > >> >> When would the fix be ready by to consider it for release in 1.9.0? > >> This > >> >> affects correctness on current master of spark and would be good to > >> get a > >> >> release with it fixed so that people aren’t using potentially bad > >> versions. > >> >> > >> >> - Robert > >> >> > >> >> On 9/8/16, 10:44 PM, "Jacques Nadeau" <[email protected]> wrote: > >> >> > >> >> A non-binding +1 from me on releasing sooner/more often. > >> >> > >> >> On Thu, Sep 8, 2016 at 5:44 PM, Ryan Blue > >> <[email protected]> > >> >> wrote: > >> >> > >> >> > Hey everyone, > >> >> > > >> >> > I'd like to put together a release candidate for 1.9.0. The > other > >> >> issues > >> >> > are done, but the sort order min/max issue, PARQUET-686 is > still > >> >> open. > >> >> > > >> >> > I'm okay releasing 1.9.0 without fixing that issue since it's > >> been > >> >> so long > >> >> > since our last release. It would also be nice to do releases as > >> >> necessary, > >> >> > so we can always do a release to fix PARQUET-686 when the patch > >> for > >> >> it is > >> >> > ready. Is there anyone that thinks we should definitely get > this > >> >> into the > >> >> > 1.9.0 release? > >> >> > > >> >> > Thanks, > >> >> > > >> >> > rb > >> >> > > >> >> > -- > >> >> > Ryan Blue > >> >> > Software Engineer > >> >> > Netflix > >> >> > > >> >> > >> >> > >> > > > > > > > > -- > > Ryan Blue > > Software Engineer > > Netflix > > > > > > -- > Ryan Blue > Software Engineer > Netflix > -- - Piyush
