Drill inserts a top-level Project to ensure the final field names are those specified in the query. So, in general, renaming fields in an intermediate Project should not affect query execution, except for the * column in a schema-less table. For the * column, Drill has to keep the original form, since its expansion into regular columns is deferred until execution time. If the query planner renamed the * column to ColumnA, the query would semantically return a different result. (One commit in Drill's forked Calcite exists specifically to keep the * column name in the subquery de-correlation logic; without it, Drill's query execution would break.)
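
To illustrate why the * column name has to survive planning, here is a toy sketch in Python. It is not Drill's operator code: `STAR`, `execute_project`, and the dict-based record are hypothetical names made up for the example, and it assumes that a column missing from a schema-less record comes back as null.

```python
# Toy model only: not Drill's implementation.
STAR = "*"  # hypothetical marker for the unexpanded star column


def execute_project(columns, record):
    """Evaluate a projection against one schema-less record (a dict).

    The star column is expanded here, at execution time, because the
    planner does not know the record's fields in advance.
    """
    out = {}
    for col in columns:
        if col == STAR:
            # Late expansion: copy every field the record actually has.
            out.update(record)
        else:
            # Regular column: look it up by name; assumed null if absent.
            out[col] = record.get(col)
    return out


record = {"a": 1, "b": 2}

# Star kept intact by the planner: all fields come back.
kept = execute_project([STAR], record)

# Star renamed to ColumnA by the planner: it is now treated as a
# regular column that the record does not contain.
renamed = execute_project(["ColumnA"], record)
```

In this model `kept` holds both fields of the record, while `renamed` holds a single null-valued `ColumnA`, which is the kind of semantic change described above.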
For ProjectMergeRule, since we have the option to turn force mode on or off, it probably will not impact Drill for now. For the Drill 1.2.0 release, we are targeting a rebase of our forked Calcite onto Calcite 1.4.0-SNAPSHOT. I'm also thinking about how to refactor the * column handling logic in the planner while doing the rebase, but that will take time.

On Tue, Jul 21, 2015 at 3:13 PM, Jacques Nadeau <[email protected]> wrote:

> I'd love the other Drill guys to chime here on their thoughts. Your
> suggestion makes sense.
>
> I'm not sure that Drill will have a problem since we do a final tree
> rewrite to avoid the name inconsistency issue.
>
> On Tue, Jul 21, 2015 at 12:23 PM, Julian Hyde <[email protected]> wrote:
>
> > Jacques,
> >
> > I can make the default instance (ProjectMergeRule.INSTANCE) have
> > force=true (currently it has force=false) and only remove a renaming
> > project if force=true. Then most people will get the benefit, but if
> > there is a problem you can switch Drill to using a custom instance.
> >
> > Also, this would be good reason to test Drill against a 1.4-SNAPSHOT
> > when it is posted.
> >
> > Julian
> >
> >
> > On Tue, Jul 21, 2015 at 11:18 AM, Jacques Nadeau <[email protected]>
> > wrote:
> > > I'm a little nervous about that for Drill. Despite the goal to do full
> > > testing to make sure we weren't accidentally using field names anywhere,
> > > we haven't yet gotten very far. We know we're not supposed to depend on
> > > anything but ordinal but as a name based system, it is likely something
> > > depends on something there.
> > >
> > > On Tue, Jul 21, 2015 at 11:04 AM, Julian Hyde <[email protected]> wrote:
> > >
> > >> ProjectMergeRule currently refuses to reduce identity projects if the
> > >> fields have different names.
> > >>
> > >> For instance suppose you have a table Dept (deptno, name) and the
> > >> algebra
> > >>
> > >> 2: Project($1 as X, $0 as Y)
> > >> 1: Project($1, $0)
> > >> 0: Scan(Dept)
> > >>
> > >> Observe that if you combine projects #1 and #2 you end up with
> > >>
> > >> 3: Project($0 as X, $1 as Y)
> > >> 0: Scan(Dept)
> > >>
> > >> Although the new project (#3) is an identity, it renames the fields.
> > >> ProjectMergeRule will return the new project (#3), but it could return
> > >> Scan(Dept) (#0).
> > >>
> > >> Does anyone think they will break if I make it return #0?
> > >>
> > >> Julian
> > >>
> > >
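
For reference, the Dept example from the quoted thread can be worked through with a small Python sketch. This is only a model of the idea, not Calcite's ProjectMergeRule: `merge_projects` and `simplify` are hypothetical helpers, a projection is represented as a list of (input ordinal, output name) pairs, and the `force` parameter merely mirrors the behaviour proposed in the thread (drop a renaming identity project only when force is true).

```python
# Model of the example only: not Calcite's implementation.

def merge_projects(top, bottom):
    """Compose two projections into one over bottom's input.

    Each projection is a list of (input_ordinal, output_name) pairs.
    The merged projection keeps top's names and resolves top's
    ordinals through bottom.
    """
    return [(bottom[ordinal][0], name) for ordinal, name in top]


def simplify(project, input_fields, force):
    """Return None if the project can be dropped, else the project.

    A project is an identity when output i reads input i. If it also
    renames fields, it is dropped only when force is true.
    """
    ordinals = [ordinal for ordinal, _ in project]
    if ordinals != list(range(len(input_fields))):
        return project  # not an identity, must be kept
    names = [name for _, name in project]
    if names == list(input_fields) or force:
        return None  # plain Scan(Dept) is enough
    return project  # identity, but the renaming is preserved


dept = ["deptno", "name"]                 # 0: Scan(Dept)
p1 = [(1, "name"), (0, "deptno")]         # 1: Project($1, $0)
p2 = [(1, "X"), (0, "Y")]                 # 2: Project($1 as X, $0 as Y)

p3 = merge_projects(p2, p1)               # 3: Project($0 as X, $1 as Y)
kept = simplify(p3, dept, force=False)    # project #3 is returned
dropped = simplify(p3, dept, force=True)  # None, i.e. just Scan(Dept)
```

With force off, the merged project #3 survives and the names X and Y are kept; with force on, only the scan remains and the output fields revert to deptno and name, which is exactly the case where anything depending on field names could break.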
