On Fri, Oct 14, 2016 at 5:16 PM, Julian Hyde <jh...@apache.org> wrote:

> Note the important expression:
>
>   convert(sort.getInput(), traits)
>
> This creates a new relational expression (or returns an existing one) that
> is “input" sorted by the required fields. It doesn’t change “input”.
>
>
I dug into getOrCreateSubset/addAbstractConverters but I still don't see
how this could work.


> By the way, the Volcano paper talks about physical properties (what we
> call traits) and enforcers. An enforcer is a relational operator that
> doesn’t change the logical content of the data, just changes its physical
> properties.
>
>
Isn't the sort node acting as the enforcer here? I don't see a reason to
remove it and then add it back via convert(). I'm specifically looking at
the case where the sort is necessary (i.e. should not be removed by this
optimization).


> Every physical property has a corresponding enforcer: Sort is the enforcer
> of the Collation trait; Shuffle is the enforcer of the Distribution trait;
> Convert is the enforcer of the CallingConvention trait.
>
> The short piece of code you quoted may seem almost trivial, but it is
> stating something profound: the relationship between Sort and Collation.
>
> The trick, of course, is to find a way of achieving a sort order without
> using a sort (e.g. finding a copy of the table that contains the columns
> you want and is sorted in the order you want, i.e. an index). The whole
> reason we have the concept of physical properties is to make that kind of
> optimization (sometimes) possible.
>
>
I created a simple test:

  @Test public void testTwoSortDontRemove() throws Exception {
    runDuplicateSortCheck("select empid+deptno from ( "
                          + "select empid, deptno "
                          + "from emps "
                          + "order by empid) "
                          + "order by deptno",
                          "EnumerableSort(sort0=[$1], dir0=[ASC])\n"
                          + "  EnumerableProject(EXPR$0=[+($0, $1)],
deptno=[$1])\n"
                          + "    EnumerableSort(sort0=[$0], dir0=[ASC])\n"
                          + "      EnumerableProject(empid=[$0],
deptno=[$1])\n"
                          + "        EnumerableTableScan(table=[[hr,
emps]])\n");
  }

When I run this, I get an error because the sort is removed:

EnumerableProject(EXPR$0=[+($0, $1)], deptno=[$1])
  EnumerableSort(sort0=[$0], dir0=[ASC])
    EnumerableProject(empid=[$0], deptno=[$1])
      EnumerableTableScan(table=[[hr, emps]])

I changed the original method in question to only remove the sort if the
requested collation matches that of it's input:

    if
(sort.getInput().getTraitSet().getTrait(RelCollationTraitDef.INSTANCE).equals(collation))
{
      final RelTraitSet traits =
sort.getInput().getTraitSet().replace(collation);
      call.transformTo(convert(sort.getInput(), traits));
    }

This will cause the test to pass. Isn't this a bug?

Jess


> Julian
>
>
> > On Oct 14, 2016, at 2:33 PM, Jess Balint <jbal...@gmail.com> wrote:
> >
> > Just doing a bit of code reading here. Looking at the SortRemoveRule
> > implementation:
> >
> >    // Express the "sortedness" requirement in terms of a collation trait
> > and
> >    // we can get rid of the sort. This allows us to use rels that just
> > happen
> >    // to be sorted but get the same effect.
> >    final RelCollation collation = sort.getCollation();
> >    assert collation == sort.getTraitSet()
> >        .getTrait(RelCollationTraitDef.INSTANCE);
> >    final RelTraitSet traits =
> > sort.getInput().getTraitSet().replace(collation);
> >    call.transformTo(convert(sort.getInput(), traits));
> >
> > I don't understand how this is correct without checking that the input
> > collation is the same as the collation requested by the sort. It looks to
> > me like it just replaces the input's collation with that of the sort. Any
> > thoughts?
> >
> > Thanks.
> >
> > Jess
>
>

Reply via email to