Re: ProjectMergeRule

Jinfeng Ni Tue, 21 Jul 2015 15:47:46 -0700

Drill inserts a top Project to ensure the final field names are what is
specified in the query.  So, in general, field renaming in an intermediate
project should not impact query execution, except for the * column in a
schema-less table. For the * column, Drill has to keep the original form,
since the expansion to regular columns is delayed until execution time.  If
the query planner renames the * column to ColumnA, then semantically it would
return a different result.  (One commit in Drill's forked Calcite is
specifically to keep the * column name in the subquery de-correlation logic;
otherwise it would break Drill's query execution.)


For ProjectMergeRule, since we have the option to turn force mode on or off,
it probably would not impact Drill for now.

For the Drill 1.2.0 release, we are aiming to rebase our forked Calcite onto
Calcite 1.4.0-SNAPSHOT. I'm also thinking about how to refactor the * column
handling logic in the planner while doing the rebase, but that will take
time.



On Tue, Jul 21, 2015 at 3:13 PM, Jacques Nadeau <[email protected]> wrote:

> I'd love the other Drill guys to chime in here with their thoughts.  Your
> suggestion makes sense.
>
> I'm not sure that Drill will have a problem since we do a final tree
> rewrite to avoid the name inconsistency issue.
>
> On Tue, Jul 21, 2015 at 12:23 PM, Julian Hyde <[email protected]> wrote:
>
> > Jacques,
> >
> > I can make the default instance (ProjectMergeRule.INSTANCE) have
> > force=true (currently it has force=false) and only remove a renaming
> > project if force=true. Then most people will get the benefit, but if
> > there is a problem you can switch Drill to using a custom instance.
> >
> > Also, this would be a good reason to test Drill against a 1.4-SNAPSHOT
> > when it is posted.
> >
> > Julian
> >
> >
> > On Tue, Jul 21, 2015 at 11:18 AM, Jacques Nadeau <[email protected]>
> > wrote:
> > > I'm a little nervous about that for Drill.  Despite the goal to do full
> > > testing to make sure we weren't accidentally using field names anywhere,
> > > we haven't yet gotten very far.  We know we're not supposed to depend on
> > > anything but ordinal but as a name based system, it is likely something
> > > depends on something there.
> > >
> > > On Tue, Jul 21, 2015 at 11:04 AM, Julian Hyde <[email protected]> wrote:
> > >
> > >> ProjectMergeRule currently refuses to reduce identity projects if the
> > >> fields have different names.
> > >>
> > >> For instance suppose you have a table Dept (deptno, name) and the
> > >> algebra
> > >>
> > >>   2: Project($1 as X, $0 as Y)
> > >>     1: Project($1, $0)
> > >>       0: Scan(Dept)
> > >>
> > >> Observe that if you combine projects #1 and #2 you end up with
> > >>
> > >>   3: Project($0 as X, $1 as Y)
> > >>     0: Scan(Dept)
> > >>
> > >> Although the new project (#3) is an identity, it renames the fields.
> > >> ProjectMergeRule will return the new project (#3), but it could return
> > >> Scan(Dept) (#0).
> > >>
> > >> Does anyone think they will break if I make it return #0?
> > >>
> > >> Julian
> > >>
> >
>
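
The project-merging example in the original message can be sketched as a toy
model. This is not Calcite's API: the Scan and Project classes, the
merge_projects function, and its force parameter are all hypothetical names
made up for illustration. The sketch only assumes what the thread states: two
stacked projects can be composed by ordinal, and an identity project that
merely renames fields is dropped only when force is on.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Scan:
    """Toy stand-in for a table scan (hypothetical, not a Calcite class)."""
    table: str
    fields: Tuple[str, ...]


@dataclass(frozen=True)
class Project:
    """Toy projection: refs are input ordinals, names are output field names."""
    refs: Tuple[int, ...]
    names: Tuple[str, ...]
    input: Union["Project", Scan]


def field_names(node):
    # Output field names of either node type.
    return node.fields if isinstance(node, Scan) else node.names


def merge_projects(top, force):
    """Compose a Project sitting on another Project.

    With force=False a renaming identity project is kept; with force=True
    it is removed and the child is returned directly.
    """
    bottom = top.input
    if not isinstance(bottom, Project):
        return top
    # Resolve each ordinal of the top project through the bottom project.
    refs = tuple(bottom.refs[i] for i in top.refs)
    child = bottom.input
    child_names = tuple(field_names(child))
    is_identity = refs == tuple(range(len(child_names)))
    same_names = top.names == child_names
    if is_identity and (same_names or force):
        return child
    return Project(refs, top.names, child)


# The plan from the original message: Dept (deptno, name).
dept = Scan("Dept", ("deptno", "name"))
p1 = Project((1, 0), ("name", "deptno"), dept)   # 1: Project($1, $0)
p2 = Project((1, 0), ("X", "Y"), p1)             # 2: Project($1 as X, $0 as Y)

print(merge_projects(p2, force=False))  # keeps plan #3, the renaming project
print(merge_projects(p2, force=True))   # returns plan #0, the bare scan
```

With force=False the result corresponds to plan #3, Project($0 as X, $1 as Y)
over Scan(Dept); with force=True the field names X and Y are lost, which is
the case a name-based consumer would need a final renaming Project to recover.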
