Hey Guys,

It sounds like the Parquet upgrade in 1.3 has fixed an incorrect-results
problem with externally generated files. Unfortunately, this has also caused
a performance regression in partition pruning. I'm neutral on whether this
is a release stopper, but we clearly have some strong opinions from Aman,
Jinfeng and Rahul. As such, I think this kills the release.

It seems like there are at least two options for resolution:

- give people a migration tool for their previous Drill-created Parquet
files
- provide people a switch to enable the old behavior. (This will possibly
give users incorrect results if they use this in the wrong context--ick...)

Let's move the discussion of the potential fix approaches to DRILL-4070,
which Rahul filed.

Two other questions that we should probably figure out answers to:
- How can we make sure this gets caught by testing in the future?
- Who wants to work on the fix?

How does that sound?

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Thu, Nov 12, 2015 at 10:48 AM, rahul challapalli <
[email protected]> wrote:

> While breaking backward compatibility could be justified in cases like
> this, doing this without providing a tested upgrade process is
> unacceptable.
>
> - Rahul
>
> On Thu, Nov 12, 2015 at 10:43 AM, Steven Phillips <[email protected]>
> wrote:
>
> > Does DRILL-4070 cause incorrect results? Or just prevent partition
> > pruning?
> >
> > On Thu, Nov 12, 2015 at 10:32 AM, Jason Altekruse <
> > [email protected]>
> > wrote:
> >
> > > I just commented on the JIRA, we are behaving correctly for newly
> > > created parquet files. I did confirm the failure to prune on
> > > auto-partitioned files created by 1.2. I do not think this is a release
> > > blocker, because I do not think we can solve this in Drill code without
> > > risking wrong results over parquet files written by other tools. I do
> > > support the creation of a migration utility for existing files written
> > > by Drill 1.2, but this can be released independent of 1.3.
> > >
> > >
> > > On Thu, Nov 12, 2015 at 10:26 AM, Jinfeng Ni <[email protected]>
> > > wrote:
> > >
> > > > Agree with Aman that DRILL-4070 is a show stopper. Parquet is the
> > > > major data source Drill uses. If this release candidate breaks the
> > > > backward compatibility of partition pruning for the parquet files
> > > > created with prior releases of Drill, it could cause serious problems
> > > > for current Drill users.
> > > >
> > > > -1
> > > >
> > > >
> > > >
> > > > On Thu, Nov 12, 2015 at 10:10 AM, rahul challapalli
> > > > <[email protected]> wrote:
> > > > > -1 (non-binding)
> > > > > The nature of the issue (DRILL-4070) demands adequate testing even
> > > > > with a workaround in place.
> > > > >
> > > > > On Thu, Nov 12, 2015 at 9:32 AM, Aman Sinha <[email protected]> wrote:
> > > > >
> > > > >> Given this issue, I would be a -1  unfortunately.
> > > > >>
> > > > > >> On Thu, Nov 12, 2015 at 8:42 AM, Aman Sinha <[email protected]> wrote:
> > > > >>
> > > > > >> > Can someone familiar with the parquet changes take a look at
> > > > > >> > DRILL-4070? It seems to break backward compatibility.
> > > > >> >
> > > > > >> > On Tue, Nov 10, 2015 at 9:51 PM, Jacques Nadeau <[email protected]>
> > > > > >> > wrote:
> > > > >> >
> > > > >> >> Hey Everybody,
> > > > >> >>
> > > > > >> >> I'd like to propose a new release candidate of Apache Drill, version
> > > > > >> >> 1.3.0.  This is the third release candidate (rc2).  This addresses some
> > > > > >> >> issues identified in the second release candidate including some test
> > > > > >> >> issues & rpc concurrency issues.
> > > > >> >>
> > > > > >> >> The tarball artifacts are hosted at [2] and the maven artifacts are
> > > > > >> >> hosted at [3]. This release candidate is based on commit
> > > > > >> >> 13ab6b1f9897ebcf9179407ffaf84b79b0ee95a1 located at [4].
> > > > > >> >> The vote will be open for 72 hours ending at 10PM Pacific, November 13,
> > > > > >> >> 2015.
> > > > >> >>
> > > > >> >> [ ] +1
> > > > >> >> [ ] +0
> > > > >> >> [ ] -1
> > > > >> >>
> > > > >> >> thanks,
> > > > >> >> Jacques
> > > > >> >>
> > > > >> >> [1]
> > > > >> >>
> > > > >> >>
> > > > >>
> > > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12332946
> > > > >> >> [2]http://people.apache.org/~jacques/apache-drill-1.3.0.rc2/
> > > > >> >> [3]
> > > > >> >>
> > > >
> > https://repository.apache.org/content/repositories/orgapachedrill-1013/
> > > > >> >> [4] https://github.com/jacques-n/drill/tree/drill-1.3.0
> > > > >> >>
> > > > >> >>
> > > > >> >> --
> > > > >> >> Jacques Nadeau
> > > > >> >> CTO and Co-Founder, Dremio
> > > > >> >>
> > > > >> >
> > > > >> >
> > > > >>
> > > >
> > >
> >
>
