Ok, think that sounds good to me. Shall take a look at your PR as well.

On Mon, Sep 12, 2016 at 10:49 AM, Ryan Blue <[email protected]>
wrote:

> Thanks, Piyush. I agree and would like to get the release out before we
> completely finish the changes for the sort order bug. But, I think it is
> important to do something to address the problem that stats pages for
> binary/UTF8 don't match Java's String sort order or UTF-8. That's why I
> posted a PR with minimal changes to disable returning stats using the
> signed order, unless a property is set to opt into using them.
>
> Alex is arguing that the current signed min/max shouldn't be considered a
> bug so we shouldn't suppress them in the 1.9.0 release. I think we should
> agree on whether this is a correctness issue and, if it is, fix it before
> the release. We can fix the sort order in 1.9.1.
>
> rb
>
> On Mon, Sep 12, 2016 at 10:43 AM, Piyush Narang
> <[email protected]
> > wrote:
>
> > Are there other projects / folks waiting on the 1.9.0 release? Given that
> > it has been a while since our last release, it might make sense for us to
> > first put that out (if I understand correctly, everything else has been
> > resolved). Seems like there's a bit of discussion on Andrew's PR and we
> > could follow up with another release once that is resolved. I'm worried
> > about the release goal post constantly moving and us adding more and more
> > to the 1.9.0 release.
> >
> > On Sat, Sep 10, 2016 at 11:42 AM, Ryan Blue <[email protected]>
> > wrote:
> >
> > > To keep the list up-to-date, there's more discussion on this topic on
> > > Andrew's original pull request [1] about whether we should consider
> this
> > a
> > > bug. I don't think we should build a release candidate until we agree
> > that
> > > it's a bug or not, and if it is a bug add a work-around to stop
> returning
> > > the bad min and max. If you're interested, please join the discussion
> on
> > > the PR.
> > >
> > > Thanks!
> > >
> > > rb
> > >
> > >
> > > [1]: https://github.com/apache/parquet-mr/pull/362
> > >
> > > On Fri, Sep 9, 2016 at 2:50 PM, Ryan Blue <[email protected]> wrote:
> > >
> > > > I think what Spark needs to do depends on how we decide to fix this.
> I
> > > > think the right thing to do for now is to treat these stats as
> invalid
> > > and
> > > > not return them, like we did for binary in PARQUET-251. That's an
> easy
> > > fix
> > > > because we've done it before and would just need to add another check
> > > what
> > > > we did the last time. That way, Spark is automatically fixed when it
> > > > updates to 1.9.0. We can work in 1.9.1 and later to fix how stats are
> > > > written and agree on an update to the format.
> > > >
> > > > I just posted a pull request that implements this. It doesn't
> require a
> > > > parquet-format release and will fix the problem for downstream
> readers
> > by
> > > > not returning incorrect stats, based on the column type. In the
> patch,
> > > > stats are suppressed by default for unsigned integers, strings,
> enums,
> > > and
> > > > decimals. Readers can override this for string types by setting
> > > > parquet.strings.use-signed-order.
> > > >
> > > > Does that seem like a reasonable fix to get this release out?
> > > >
> > > > rb
> > > >
> > > > On Fri, Sep 9, 2016 at 6:13 AM, Andrew Duffy <[email protected]>
> > > wrote:
> > > >
> > > >> So I’m cool with making necessary changes to get this in sooner
> rather
> > > >> than later, I’ve mostly been blocking on code reviews. If there’s a
> > > >> commitment made to releasing 1.9.1 as soon as the binary sort order
> > > change
> > > >> goes in then I’m fine with not blocking on this for 1.9, but if not
> > and
> > > we
> > > >> can expect the current velocity of releases then I’d rather see if
> we
> > > can
> > > >> get it in now. Especially because Spark and others will need to wait
> > > until
> > > >> next release (>= 1 year from now?) to fix use of statistics for
> binary
> > > >> data. As Rob said, this is broken in master of Spark so without a
> > > >> fix+release of Parquet it’ll likely require disabling statistics
> > > pushdown
> > > >> for binary columns on their end.
> > > >>
> > > >> -Andrew
> > > >>
> > > >> On 9/8/16, 11:37 PM, "Jacques Nadeau" <[email protected]> wrote:
> > > >>
> > > >> >Absolutely agree the fix would be good to get in.
> > > >> >
> > > >> >My comments come from the fact that we tried to start the 1.9
> release
> > > >> (late
> > > >> >last year I think) after the direct memory feature got in and it
> > never
> > > >> >completed (as there is always a good reason to hold the release).
> My
> > > main
> > > >> >fear is that if we wait for something and it turns into another
> > > >> substantial
> > > >> >delay before a release. Ryan, what do you think about proposing a
> > hard
> > > >> >deadline?
> > > >> >
> > > >> >On Thu, Sep 8, 2016 at 7:57 PM, Robert Kruszewski <
> > > [email protected]>
> > > >> >wrote:
> > > >> >
> > > >> >> When would the fix be ready by to consider it for release in
> 1.9.0?
> > > >> This
> > > >> >> affects correctness on current master of spark and would be good
> to
> > > >> get a
> > > >> >> release with it fixed so that people aren’t using potentially bad
> > > >> versions.
> > > >> >>
> > > >> >> -          Robert
> > > >> >>
> > > >> >> On 9/8/16, 10:44 PM, "Jacques Nadeau" <[email protected]>
> wrote:
> > > >> >>
> > > >> >>     A non-binding +1 from me on releasing sooner/more often.
> > > >> >>
> > > >> >>     On Thu, Sep 8, 2016 at 5:44 PM, Ryan Blue
> > > >> <[email protected]>
> > > >> >> wrote:
> > > >> >>
> > > >> >>     > Hey everyone,
> > > >> >>     >
> > > >> >>     > I'd like to put together a release candidate for 1.9.0. The
> > > other
> > > >> >> issues
> > > >> >>     > are done, but the sort order min/max issue, PARQUET-686 is
> > > still
> > > >> >> open.
> > > >> >>     >
> > > >> >>     > I'm okay releasing 1.9.0 without fixing that issue since
> it's
> > > >> been
> > > >> >> so long
> > > >> >>     > since our last release. It would also be nice to do
> releases
> > as
> > > >> >> necessary,
> > > >> >>     > so we can always do a release to fix PARQUET-686 when the
> > patch
> > > >> for
> > > >> >> it is
> > > >> >>     > ready. Is there anyone that thinks we should definitely get
> > > this
> > > >> >> into the
> > > >> >>     > 1.9.0 release?
> > > >> >>     >
> > > >> >>     > Thanks,
> > > >> >>     >
> > > >> >>     > rb
> > > >> >>     >
> > > >> >>     > --
> > > >> >>     > Ryan Blue
> > > >> >>     > Software Engineer
> > > >> >>     > Netflix
> > > >> >>     >
> > > >> >>
> > > >> >>
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Ryan Blue
> > > > Software Engineer
> > > > Netflix
> > > >
> > >
> > >
> > >
> > > --
> > > Ryan Blue
> > > Software Engineer
> > > Netflix
> > >
> >
> >
> >
> > --
> > - Piyush
> >
>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>



-- 
- Piyush

Reply via email to