REGR_SYY do not work

Sergey Nuyanzin (JIRA) Tue, 10 Jul 2018 08:49:32 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538821#comment-16538821
 ]


Sergey Nuyanzin commented on CALCITE-2402:
------------------------------------------

Ok I see thank you

while working on it I faced with some issues/questions, could you please 
comment them?
# Are there any concerns to have SqlNodes (only in private) for expanding 
variance while in public there is also RexNode? Some more details related to 
the question. All not biased fail (denominator  {{count - 1}}) in boundary 
cases for window aggregated, e.g. {code}0: 
jdbc:calcite:model=target/test-classes/mod> select var_samp(age) over(partition 
by age) from emps;
+------------+
|   EXPR$0   |
+------------+
java.lang.ArithmeticException: / by zero
        at Baz$4$1.current(Unknown Source)
{code} It happens only while window aggregation because in regular aggregate 
there is CASE statement for that, i.e.
While trying to do the similar in case of window for the code {code}final 
SqlNumericLiteral one = SqlLiteral.createExactNumeric("1", pos);
        final SqlLiteral nullLiteral = SqlLiteral.createNull(SqlParserPos.ZERO);
        denominator = new SqlCase(SqlParserPos.ZERO,
            count,
            SqlNodeList.of(
                SqlStdOperatorTable.EQUALS.createCall(pos,
                    count, SqlLiteral.createExactNumeric("1", 
SqlParserPos.ZERO))
            ),
            SqlNodeList.of(
                getCastedSqlNode(nullLiteral, varType, pos, null)
            ),
            SqlStdOperatorTable.MINUS.createCall(pos, count, one));{code} it 
fails like {noformat}Caused by: java.lang.UnsupportedOperationException: class 
org.apache.calcite.sql.SqlLiteral: NULL
        at org.apache.calcite.util.Util.needToImplement(Util.java:921)
        at 
org.apache.calcite.sql.validate.SqlValidatorImpl.getValidatedNodeType(SqlValidatorImpl.java:1551)
        at 
org.apache.calcite.sql2rel.SqlNodeToRexConverterImpl.convertLiteral(SqlNodeToRexConverterImpl.java:93)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4659)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:3977)
        at org.apache.calcite.sql.SqlLiteral.accept(SqlLiteral.java:532)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4541)
        at 
org.apache.calcite.sql2rel.StandardConvertletTable.convertCase(StandardConvertletTable.java:379)
        ... 37 more{noformat} At the same time if I use RexBuilder and RexNode 
instead of SqlNodes - everything works fine. 
 # For COVAR* and REGR* implementation I need a function to eliminate all rows 
where any of two arguments is null. Just double check if there is an existing 
function for that or no? Before I thought that {{SqlStdOperatorTable#FILTER}} 
could help however it fails that it does not have implementer.
# VARIANCE and VAR_SAMP are implemented as the same (in different rdbms they 
implemented a little different e.g. 
[Oracle|https://docs.oracle.com/en/database/oracle/oracle-database/12.2/sqlrf/VARIANCE.html#GUID-EC33717A-2509-402D-B3BB-7EECB2E4ED8B],
 
[PostgreSQL|https://www.postgresql.org/docs/8.2/static/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE]).
 Double check if Calcite's goes PostegreSQL way of not?

> COVAR_POP/COVAR_SAMP/REGR_SXX/REGR_SYY  do not work
> ---------------------------------------------------
>
>                 Key: CALCITE-2402
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2402
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Sergey Nuyanzin
>            Assignee: Julian Hyde
>            Priority: Major
>
> Any query from {code:sql}
> select covar_samp(empno, deptno) from emps;
> select covar_pop(empno, deptno) from emps;
> select regr_sxx(empno, deptno) from emps;
> select regr_syy(empno, deptno) from emps;
> {code} 
> fails (the trace is below) 
> As I understand the reason is not fully implementation (did not find any 
> convertlet for them e.g.). 
> From my point of view I could fix this issue however I have a question: 
> How these function should be handled? As reducible functions (like 
> STDDEV_POP, STDDEV_SAMP, VAR_POP, VAR_SAMP) or as Rex operators in 
> RexImpTable? 
> Please give some advice here
> {noformat}0: jdbc:calcite:model=target/test-classes/mod> select 
> covar_samp(empno, deptno) from emps;
> Error: Error while executing SQL "select covar_samp(empno, deptno) from 
> emps": Node [rel#123:Subset#2.ENUMERABLE.[]] could not be implemented; 
> planner state:
> Root: rel#123:Subset#2.ENUMERABLE.[]
> Original rel:
> LogicalAggregate(subset=[rel#123:Subset#2.ENUMERABLE.[]], group=[{}], 
> EXPR$0=[COVAR_SAMP($0, $1)]): rowcount = 10.0, cumulative cost = {11.25 rows, 
> 0.0 cpu, 0.0 io}, id = 119
>   LogicalProject(subset=[rel#118:Subset#1.NONE.[]], EMPNO=[$0], DEPTNO=[$2]): 
> rowcount = 100.0, cumulative cost = {100.0 rows, 200.0 cpu, 0.0 io}, id = 117
>     LogicalTableScan(subset=[rel#116:Subset#0.NONE.[]], table=[[SALES, 
> EMPS]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, 
> id = 103
> Sets:
> Set#0, type: RecordType(INTEGER EMPNO, VARCHAR NAME, INTEGER DEPTNO, VARCHAR 
> GENDER, VARCHAR CITY, INTEGER EMPID, INTEGER AGE, BOOLEAN SLACKER, BOOLEAN 
> MANAGER, DATE JOINEDAT)
>         rel#116:Subset#0.NONE.[], best=null, importance=0.7290000000000001
>                 rel#103:LogicalTableScan.NONE.[](table=[SALES, EMPS]), 
> rowcount=100.0, cumulative cost={inf}
>         rel#128:Subset#0.ENUMERABLE.[], best=rel#135, 
> importance=0.36450000000000005
>                 rel#133:EnumerableTableScan.ENUMERABLE.[](table=[SALES, 
> EMPS]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
>                 
> rel#135:EnumerableInterpreter.ENUMERABLE.[](input=rel#132:Subset#0.BINDABLE.[]),
>  rowcount=100.0, cumulative cost={51.0 rows, 51.01 cpu, 0.0 io}
>         rel#132:Subset#0.BINDABLE.[], best=rel#131, 
> importance=0.36450000000000005
>                 rel#131:BindableTableScan.BINDABLE.[](table=[SALES, EMPS]), 
> rowcount=100.0, cumulative cost={1.0 rows, 1.01 cpu, 0.0 io}
> Set#1, type: RecordType(INTEGER EMPNO, INTEGER DEPTNO)
>         rel#118:Subset#1.NONE.[], best=null, importance=0.81
>                 
> rel#117:LogicalProject.NONE.[](input=rel#116:Subset#0.NONE.[],EMPNO=$0,DEPTNO=$2),
>  rowcount=100.0, cumulative cost={inf}
>         rel#125:Subset#1.ENUMERABLE.[], best=rel#134, importance=0.405
>                 
> rel#134:EnumerableProject.ENUMERABLE.[](input=rel#128:Subset#0.ENUMERABLE.[],EMPNO=$0,DEPTNO=$2),
>  rowcount=100.0, cumulative cost={151.0 rows, 251.01 cpu, 0.0 io}
> Set#2, type: RecordType(INTEGER EXPR$0)
>         rel#120:Subset#2.NONE.[], best=null, importance=0.9
>                 
> rel#119:LogicalAggregate.NONE.[](input=rel#118:Subset#1.NONE.[],group={},EXPR$0=COVAR_SAMP($0,
>  $1)), rowcount=10.0, cumulative cost={inf}
>                 
> rel#127:LogicalAggregate.NONE.[](input=rel#116:Subset#0.NONE.[],group={},EXPR$0=COVAR_SAMP($0,
>  $2)), rowcount=10.0, cumulative cost={inf}
>         rel#123:Subset#2.ENUMERABLE.[], best=null, importance=1.0
>                 
> rel#124:AbstractConverter.ENUMERABLE.[](input=rel#120:Subset#2.NONE.[],convention=ENUMERABLE,sort=[]),
>  rowcount=10.0, cumulative cost={inf} (state=,code=0)
> java.sql.SQLException: Error while executing SQL "select covar_samp(empno, 
> deptno) from emps": Node [rel#123:Subset#2.ENUMERABLE.[]] could not be 
> implemented; planner state:
> Root: rel#123:Subset#2.ENUMERABLE.[]
> Original rel:
> LogicalAggregate(subset=[rel#123:Subset#2.ENUMERABLE.[]], group=[{}], 
> EXPR$0=[COVAR_SAMP($0, $1)]): rowcount = 10.0, cumulative cost = {11.25 rows, 
> 0.0 cpu, 0.0 io}, id = 119
>   LogicalProject(subset=[rel#118:Subset#1.NONE.[]], EMPNO=[$0], DEPTNO=[$2]): 
> rowcount = 100.0, cumulative cost = {100.0 rows, 200.0 cpu, 0.0 io}, id = 117
>     LogicalTableScan(subset=[rel#116:Subset#0.NONE.[]], table=[[SALES, 
> EMPS]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, 
> id = 103
> Sets:
> Set#0, type: RecordType(INTEGER EMPNO, VARCHAR NAME, INTEGER DEPTNO, VARCHAR 
> GENDER, VARCHAR CITY, INTEGER EMPID, INTEGER AGE, BOOLEAN SLACKER, BOOLEAN 
> MANAGER, DATE JOINEDAT)
>         rel#116:Subset#0.NONE.[], best=null, importance=0.7290000000000001
>                 rel#103:LogicalTableScan.NONE.[](table=[SALES, EMPS]), 
> rowcount=100.0, cumulative cost={inf}
>         rel#128:Subset#0.ENUMERABLE.[], best=rel#135, 
> importance=0.36450000000000005
>                 rel#133:EnumerableTableScan.ENUMERABLE.[](table=[SALES, 
> EMPS]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
>                 
> rel#135:EnumerableInterpreter.ENUMERABLE.[](input=rel#132:Subset#0.BINDABLE.[]),
>  rowcount=100.0, cumulative cost={51.0 rows, 51.01 cpu, 0.0 io}
>         rel#132:Subset#0.BINDABLE.[], best=rel#131, 
> importance=0.36450000000000005
>                 rel#131:BindableTableScan.BINDABLE.[](table=[SALES, EMPS]), 
> rowcount=100.0, cumulative cost={1.0 rows, 1.01 cpu, 0.0 io}
> Set#1, type: RecordType(INTEGER EMPNO, INTEGER DEPTNO)
>         rel#118:Subset#1.NONE.[], best=null, importance=0.81
>                 
> rel#117:LogicalProject.NONE.[](input=rel#116:Subset#0.NONE.[],EMPNO=$0,DEPTNO=$2),
>  rowcount=100.0, cumulative cost={inf}
>         rel#125:Subset#1.ENUMERABLE.[], best=rel#134, importance=0.405
>                 
> rel#134:EnumerableProject.ENUMERABLE.[](input=rel#128:Subset#0.ENUMERABLE.[],EMPNO=$0,DEPTNO=$2),
>  rowcount=100.0, cumulative cost={151.0 rows, 251.01 cpu, 0.0 io}
> Set#2, type: RecordType(INTEGER EXPR$0)
>         rel#120:Subset#2.NONE.[], best=null, importance=0.9
>                 
> rel#119:LogicalAggregate.NONE.[](input=rel#118:Subset#1.NONE.[],group={},EXPR$0=COVAR_SAMP($0,
>  $1)), rowcount=10.0, cumulative cost={inf}
>                 
> rel#127:LogicalAggregate.NONE.[](input=rel#116:Subset#0.NONE.[],group={},EXPR$0=COVAR_SAMP($0,
>  $2)), rowcount=10.0, cumulative cost={inf}
>         rel#123:Subset#2.ENUMERABLE.[], best=null, importance=1.0
>                 
> rel#124:AbstractConverter.ENUMERABLE.[](input=rel#120:Subset#2.NONE.[],convention=ENUMERABLE,sort=[]),
>  rowcount=10.0, cumulative cost={inf}
>         at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
>         at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
>         at 
> org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:163)
>         at 
> org.apache.calcite.avatica.AvaticaStatement.execute(AvaticaStatement.java:217)
>         at sqlline.Commands.execute(Commands.java:823)
>         at sqlline.Commands.sql(Commands.java:733)
>         at sqlline.SqlLine.dispatch(SqlLine.java:795)
>         at sqlline.SqlLine.begin(SqlLine.java:668)
>         at sqlline.SqlLine.start(SqlLine.java:373)
>         at sqlline.SqlLine.main(SqlLine.java:265)
> Caused by: org.apache.calcite.plan.RelOptPlanner$CannotPlanException: Node 
> [rel#123:Subset#2.ENUMERABLE.[]] could not be implemented; planner state:
> Root: rel#123:Subset#2.ENUMERABLE.[]
> Original rel:
> LogicalAggregate(subset=[rel#123:Subset#2.ENUMERABLE.[]], group=[{}], 
> EXPR$0=[COVAR_SAMP($0, $1)]): rowcount = 10.0, cumulative cost = {11.25 rows, 
> 0.0 cpu, 0.0 io}, id = 119
>   LogicalProject(subset=[rel#118:Subset#1.NONE.[]], EMPNO=[$0], DEPTNO=[$2]): 
> rowcount = 100.0, cumulative cost = {100.0 rows, 200.0 cpu, 0.0 io}, id = 117
>     LogicalTableScan(subset=[rel#116:Subset#0.NONE.[]], table=[[SALES, 
> EMPS]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, 
> id = 103
> Sets:
> Set#0, type: RecordType(INTEGER EMPNO, VARCHAR NAME, INTEGER DEPTNO, VARCHAR 
> GENDER, VARCHAR CITY, INTEGER EMPID, INTEGER AGE, BOOLEAN SLACKER, BOOLEAN 
> MANAGER, DATE JOINEDAT)
>         rel#116:Subset#0.NONE.[], best=null, importance=0.7290000000000001
>                 rel#103:LogicalTableScan.NONE.[](table=[SALES, EMPS]), 
> rowcount=100.0, cumulative cost={inf}
>         rel#128:Subset#0.ENUMERABLE.[], best=rel#135, 
> importance=0.36450000000000005
>                 rel#133:EnumerableTableScan.ENUMERABLE.[](table=[SALES, 
> EMPS]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
>                 
> rel#135:EnumerableInterpreter.ENUMERABLE.[](input=rel#132:Subset#0.BINDABLE.[]),
>  rowcount=100.0, cumulative cost={51.0 rows, 51.01 cpu, 0.0 io}
>         rel#132:Subset#0.BINDABLE.[], best=rel#131, 
> importance=0.36450000000000005
>                 rel#131:BindableTableScan.BINDABLE.[](table=[SALES, EMPS]), 
> rowcount=100.0, cumulative cost={1.0 rows, 1.01 cpu, 0.0 io}
> Set#1, type: RecordType(INTEGER EMPNO, INTEGER DEPTNO)
>         rel#118:Subset#1.NONE.[], best=null, importance=0.81
>                 
> rel#117:LogicalProject.NONE.[](input=rel#116:Subset#0.NONE.[],EMPNO=$0,DEPTNO=$2),
>  rowcount=100.0, cumulative cost={inf}
>         rel#125:Subset#1.ENUMERABLE.[], best=rel#134, importance=0.405
>                 
> rel#134:EnumerableProject.ENUMERABLE.[](input=rel#128:Subset#0.ENUMERABLE.[],EMPNO=$0,DEPTNO=$2),
>  rowcount=100.0, cumulative cost={151.0 rows, 251.01 cpu, 0.0 io}
> Set#2, type: RecordType(INTEGER EXPR$0)
>         rel#120:Subset#2.NONE.[], best=null, importance=0.9
>                 
> rel#119:LogicalAggregate.NONE.[](input=rel#118:Subset#1.NONE.[],group={},EXPR$0=COVAR_SAMP($0,
>  $1)), rowcount=10.0, cumulative cost={inf}
>                 
> rel#127:LogicalAggregate.NONE.[](input=rel#116:Subset#0.NONE.[],group={},EXPR$0=COVAR_SAMP($0,
>  $2)), rowcount=10.0, cumulative cost={inf}
>         rel#123:Subset#2.ENUMERABLE.[], best=null, importance=1.0
>                 
> rel#124:AbstractConverter.ENUMERABLE.[](input=rel#120:Subset#2.NONE.[],convention=ENUMERABLE,sort=[]),
>  rowcount=10.0, cumulative cost={inf}
>         at 
> org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:448)
>         at 
> org.apache.calcite.plan.volcano.RelSubset.buildCheapestPlan(RelSubset.java:298)
>         at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:666)
>         at org.apache.calcite.tools.Programs$5.run(Programs.java:326)
>         at 
> org.apache.calcite.tools.Programs$SequenceProgram.run(Programs.java:387)
>         at org.apache.calcite.prepare.Prepare.optimize(Prepare.java:188)
>         at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:319)
>         at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:230)
>         at 
> org.apache.calcite.prepare.CalcitePrepareImpl.prepare2_(CalcitePrepareImpl.java:783)
>         at 
> org.apache.calcite.prepare.CalcitePrepareImpl.prepare_(CalcitePrepareImpl.java:642)
>         at 
> org.apache.calcite.prepare.CalcitePrepareImpl.prepareSql(CalcitePrepareImpl.java:612)
>         at 
> org.apache.calcite.jdbc.CalciteConnectionImpl.parseQuery(CalciteConnectionImpl.java:232)
>         at 
> org.apache.calcite.jdbc.CalciteMetaImpl.prepareAndExecute(CalciteMetaImpl.java:609)
>         at 
> org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:675)
>         at 
> org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156)
>         ... 7 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CALCITE-2402) COVAR_POP/COVAR_SAMP/REGR_SXX/REGR_SYY do not work

Reply via email to