HI Julian:
 thanks for your response, since I send the last mail from my phone ,I can
not describe the problem very well. sorry for that.

error sql:

select stddev_samp(cast(employee_id as int)) as col
,sum(cast(employee_id as int)) as col2 from cp.`employee.json`


error stack:

org.apache.drill.common.exceptions.UserException: SYSTEM ERROR:
AssertionError: Type mismatch:
rel rowtype:
RecordType(INTEGER $f0, INTEGER $f1, BIGINT NOT NULL $f2, INTEGER $f3) NOT NULL
equivRel rowtype:
RecordType(INTEGER $f0, INTEGER $f1, BIGINT NOT NULL $f2, BIGINT $f3) NOT NULL


[Error Id: 043be52f-dff2-4839-aa8b-fc1bf2c49698 on localhost:31010]
        at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586)
~[classes/:na]
        at 
org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:788)
[classes/:na]


The generated sub functions and its data type varied through the aggregate
reduce phase :

initial  :                                 stddev_samp($0) double
                             ,            sum($0) bigint
                                                           |
                                                                  |
                                                         \ | /
                                                                 \|/
first rule match :        sum($1) integer ,sum($0) integer , count($0)
bigint          ,             sum0($0)  bigint
                                                            |
                                                                    |
                                                           \|/
                                                                  \|/
second rule mach:   sum0($1) integer, sum0($0) integer, count($0) bigint
    ,              sum0($0)  bigint

As you can see from the last graph, there's two same function sum0($0) ,but
different data types, The RexBuildler.addAggCall method now does consider
the AggregateCall's data type with the existing ones in the AggregateCall
--> RexNode map. So drill will mistake the wrong RexInputRef.

There's no doubt that RexNode is strict regarding types. I have also
examined Calcite's AggregateReduceFunctionsRule.It seems  Calcite will not
occur this error through my experiment. The reason is Drill has type
inference effect while Calcite does not. The initial input field $0 is
integer , to Drill ,the sum(cast (employee_id as int)) is bigint , the
reduced sum0($0) is bigint through the inference which takes the sum($0)'s
data type as its data type , to Calcite , the corresponding sum0($0) will
be integer as its data type is taken from the Aggregate's input field's
data type which is integer.

Though it seems Drill's type inference effect caused this error. But as a
common util method , it's better to refactor the implementation to keep the
interface promise ,not dependent on outside's rule.


The refactor content maybe like below:

public RexNode addAggCall(AggregateCall aggCall, int groupCount,
    boolean indicator, List<AggregateCall> aggCalls,
    Map<AggregateCall, RexNode> aggCallMapping,
    final List<RelDataType> aggArgTypes) {
  if (aggCall.getAggregation() instanceof SqlCountAggFunction
      && !aggCall.isDistinct()) {
    final List<Integer> args = aggCall.getArgList();
    final List<Integer> nullableArgs = nullableArgs(args, aggArgTypes);
    if (!nullableArgs.equals(args)) {
      aggCall = aggCall.copy(nullableArgs, aggCall.filterArg);
    }
  }

  for (AggregateCall aggregateCall : aggCallMapping.keySet()) {
    if (aggregateCall.equals(aggCall) &&
!aggregateCall.getType().equals(aggCall.getType())) {
      int index = aggCalls.size() + groupCount * (indicator ? 2 : 1);
      aggCalls.add(aggCall);
      RexNode rex = makeInputRef(aggCall.getType(), index);
      return rex;
    }
  }

  RexNode rex = aggCallMapping.get(aggCall);
  if (rex == null) {
    int index = aggCalls.size() + groupCount * (indicator ? 2 : 1);
    aggCalls.add(aggCall);
    rex = makeInputRef(aggCall.getType(), index);
    aggCallMapping.put(aggCall, rex);
  }
  return rex;
}






On Tue, Oct 31, 2017 at 1:06 AM, Julian Hyde <[email protected]> wrote:

> Maybe I don’t fully understand the problem, but a RexInputRef must always
> have the same type as the field it references. If field 1 is an INTEGER,
> then RefInputRef(1) must be an INTEGER. If one consumer wants that field as
> a BIGINT then they must apply a cast (RexCall to the CAST operator).
>
> If you are creating a RexInputRef with the type that you want it to be,
> then bad things will happen.
>
> Sorry if this is confusing/unexpected. RexNode is very strict regarding
> types.
>
> Julian
>
>
> > On Oct 30, 2017, at 8:41 AM, weijie tong <[email protected]>
> wrote:
> >
> > Hi devs:
> >  At drill,stddev_pop and sum functions once issued together will occur
> > error.The reason is the DrillReduceAggregatesRule leverages Calcite's
> > RexBuilder.addAggCall method to reduce the duplicated functions.But this
> > method only cares about function name and input references ignoring the
> > parameter data type.
> > More details please see https://github.com/apache/drill/pull/1016
> >
> > But I think it's better  to move my judgement logic codes to the
> > RexBuilder's addAggCall method.
> > So what do you think about it ? If it sounds ok ,I will be pleasure to do
> > that.
>
>

Reply via email to