This technically is a regression over 1.8.0 since before statistics were 
treated as corrupted and not used. I think it would make sense to get 
workaround Ryan proposed in for release and merge the fix later.

-          Robert

On 10/10/16, 1:15 AM, "Jacques Nadeau" <[email protected]> wrote:

    Just following up here. It seems like we stalled out due to this issue. Per
    my previous comments, if we don't have a resolution for this, and it isn't
    regression, I propose moving forward without a change.
    
    On Mon, Sep 12, 2016 at 12:57 PM, Piyush Narang <[email protected]
    > wrote:
    
    > Ok, think that sounds good to me. Shall take a look at your PR as well.
    >
    > On Mon, Sep 12, 2016 at 10:49 AM, Ryan Blue <[email protected]>
    > wrote:
    >
    > > Thanks, Piyush. I agree and would like to get the release out before we
    > > completely finish the changes for the sort order bug. But, I think it is
    > > important to do something to address the problem that stats pages for
    > > binary/UTF8 don't match Java's String sort order or UTF-8. That's why I
    > > posted a PR with minimal changes to disable returning stats using the
    > > signed order, unless a property is set to opt into using them.
    > >
    > > Alex is arguing that the current signed min/max shouldn't be considered 
a
    > > bug so we shouldn't suppress them in the 1.9.0 release. I think we 
should
    > > agree on whether this is a correctness issue and, if it is, fix it 
before
    > > the release. We can fix the sort order in 1.9.1.
    > >
    > > rb
    > >
    > > On Mon, Sep 12, 2016 at 10:43 AM, Piyush Narang
    > > <[email protected]
    > > > wrote:
    > >
    > > > Are there other projects / folks waiting on the 1.9.0 release? Given
    > that
    > > > it has been a while since our last release, it might make sense for us
    > to
    > > > first put that out (if I understand correctly, everything else has 
been
    > > > resolved). Seems like there's a bit of discussion on Andrew's PR and 
we
    > > > could follow up with another release once that is resolved. I'm 
worried
    > > > about the release goal post constantly moving and us adding more and
    > more
    > > > to the 1.9.0 release.
    > > >
    > > > On Sat, Sep 10, 2016 at 11:42 AM, Ryan Blue <[email protected]
    > >
    > > > wrote:
    > > >
    > > > > To keep the list up-to-date, there's more discussion on this topic 
on
    > > > > Andrew's original pull request [1] about whether we should consider
    > > this
    > > > a
    > > > > bug. I don't think we should build a release candidate until we 
agree
    > > > that
    > > > > it's a bug or not, and if it is a bug add a work-around to stop
    > > returning
    > > > > the bad min and max. If you're interested, please join the 
discussion
    > > on
    > > > > the PR.
    > > > >
    > > > > Thanks!
    > > > >
    > > > > rb
    > > > >
    > > > >
    > > > > [1]: 
https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_parquet-2Dmr_pull_362&d=DQIFaQ&c=izlc9mHr637UR4lpLEZLFFS3Vn2UXBrZ4tFb6oOnmz8&r=Gukiqwaa9M7VDsJzd0J3W7mh_DfC1XlLxRRhg4t2Xyc&m=vztq-HOWpA5TCgl-PyMVWznvZmicIuenLMWs8_tR5U0&s=6E18BGXKrtD3v7AEpfvzIYS2VZRYEjn08fYEKthHlTE&e=
 
    > > > >
    > > > > On Fri, Sep 9, 2016 at 2:50 PM, Ryan Blue <[email protected]> wrote:
    > > > >
    > > > > > I think what Spark needs to do depends on how we decide to fix
    > this.
    > > I
    > > > > > think the right thing to do for now is to treat these stats as
    > > invalid
    > > > > and
    > > > > > not return them, like we did for binary in PARQUET-251. That's an
    > > easy
    > > > > fix
    > > > > > because we've done it before and would just need to add another
    > check
    > > > > what
    > > > > > we did the last time. That way, Spark is automatically fixed when
    > it
    > > > > > updates to 1.9.0. We can work in 1.9.1 and later to fix how stats
    > are
    > > > > > written and agree on an update to the format.
    > > > > >
    > > > > > I just posted a pull request that implements this. It doesn't
    > > require a
    > > > > > parquet-format release and will fix the problem for downstream
    > > readers
    > > > by
    > > > > > not returning incorrect stats, based on the column type. In the
    > > patch,
    > > > > > stats are suppressed by default for unsigned integers, strings,
    > > enums,
    > > > > and
    > > > > > decimals. Readers can override this for string types by setting
    > > > > > parquet.strings.use-signed-order.
    > > > > >
    > > > > > Does that seem like a reasonable fix to get this release out?
    > > > > >
    > > > > > rb
    > > > > >
    > > > > > On Fri, Sep 9, 2016 at 6:13 AM, Andrew Duffy <[email protected]>
    > > > > wrote:
    > > > > >
    > > > > >> So I’m cool with making necessary changes to get this in sooner
    > > rather
    > > > > >> than later, I’ve mostly been blocking on code reviews. If there’s
    > a
    > > > > >> commitment made to releasing 1.9.1 as soon as the binary sort
    > order
    > > > > change
    > > > > >> goes in then I’m fine with not blocking on this for 1.9, but if
    > not
    > > > and
    > > > > we
    > > > > >> can expect the current velocity of releases then I’d rather see 
if
    > > we
    > > > > can
    > > > > >> get it in now. Especially because Spark and others will need to
    > wait
    > > > > until
    > > > > >> next release (>= 1 year from now?) to fix use of statistics for
    > > binary
    > > > > >> data. As Rob said, this is broken in master of Spark so without a
    > > > > >> fix+release of Parquet it’ll likely require disabling statistics
    > > > > pushdown
    > > > > >> for binary columns on their end.
    > > > > >>
    > > > > >> -Andrew
    > > > > >>
    > > > > >> On 9/8/16, 11:37 PM, "Jacques Nadeau" <[email protected]> wrote:
    > > > > >>
    > > > > >> >Absolutely agree the fix would be good to get in.
    > > > > >> >
    > > > > >> >My comments come from the fact that we tried to start the 1.9
    > > release
    > > > > >> (late
    > > > > >> >last year I think) after the direct memory feature got in and it
    > > > never
    > > > > >> >completed (as there is always a good reason to hold the 
release).
    > > My
    > > > > main
    > > > > >> >fear is that if we wait for something and it turns into another
    > > > > >> substantial
    > > > > >> >delay before a release. Ryan, what do you think about proposing 
a
    > > > hard
    > > > > >> >deadline?
    > > > > >> >
    > > > > >> >On Thu, Sep 8, 2016 at 7:57 PM, Robert Kruszewski <
    > > > > [email protected]>
    > > > > >> >wrote:
    > > > > >> >
    > > > > >> >> When would the fix be ready by to consider it for release in
    > > 1.9.0?
    > > > > >> This
    > > > > >> >> affects correctness on current master of spark and would be
    > good
    > > to
    > > > > >> get a
    > > > > >> >> release with it fixed so that people aren’t using potentially
    > bad
    > > > > >> versions.
    > > > > >> >>
    > > > > >> >> -          Robert
    > > > > >> >>
    > > > > >> >> On 9/8/16, 10:44 PM, "Jacques Nadeau" <[email protected]>
    > > wrote:
    > > > > >> >>
    > > > > >> >>     A non-binding +1 from me on releasing sooner/more often.
    > > > > >> >>
    > > > > >> >>     On Thu, Sep 8, 2016 at 5:44 PM, Ryan Blue
    > > > > >> <[email protected]>
    > > > > >> >> wrote:
    > > > > >> >>
    > > > > >> >>     > Hey everyone,
    > > > > >> >>     >
    > > > > >> >>     > I'd like to put together a release candidate for 1.9.0.
    > The
    > > > > other
    > > > > >> >> issues
    > > > > >> >>     > are done, but the sort order min/max issue, PARQUET-686
    > is
    > > > > still
    > > > > >> >> open.
    > > > > >> >>     >
    > > > > >> >>     > I'm okay releasing 1.9.0 without fixing that issue since
    > > it's
    > > > > >> been
    > > > > >> >> so long
    > > > > >> >>     > since our last release. It would also be nice to do
    > > releases
    > > > as
    > > > > >> >> necessary,
    > > > > >> >>     > so we can always do a release to fix PARQUET-686 when 
the
    > > > patch
    > > > > >> for
    > > > > >> >> it is
    > > > > >> >>     > ready. Is there anyone that thinks we should definitely
    > get
    > > > > this
    > > > > >> >> into the
    > > > > >> >>     > 1.9.0 release?
    > > > > >> >>     >
    > > > > >> >>     > Thanks,
    > > > > >> >>     >
    > > > > >> >>     > rb
    > > > > >> >>     >
    > > > > >> >>     > --
    > > > > >> >>     > Ryan Blue
    > > > > >> >>     > Software Engineer
    > > > > >> >>     > Netflix
    > > > > >> >>     >
    > > > > >> >>
    > > > > >> >>
    > > > > >>
    > > > > >
    > > > > >
    > > > > >
    > > > > > --
    > > > > > Ryan Blue
    > > > > > Software Engineer
    > > > > > Netflix
    > > > > >
    > > > >
    > > > >
    > > > >
    > > > > --
    > > > > Ryan Blue
    > > > > Software Engineer
    > > > > Netflix
    > > > >
    > > >
    > > >
    > > >
    > > > --
    > > > - Piyush
    > > >
    > >
    > >
    > >
    > > --
    > > Ryan Blue
    > > Software Engineer
    > > Netflix
    > >
    >
    >
    >
    > --
    > - Piyush
    >
    

Attachment: smime.p7s
Description: S/MIME cryptographic signature

Reply via email to