Thanks for the feedback.
The SUBSTR function is not a SQL standard function, so it requires some
specific configuration (specifically the "fun" parameter).
Please check
https://calcite.apache.org/docs/reference.html#dialect-specific-operators
for more info (and notice how SUBSTR is listed in the table in that
section).
The alternative would be modifying the queries to use the standard
SUBSTRING function.
I am trying to reproduce the partition query problem within Calcite, but so
far I have not succeeded.
I tried adding your query in a test in TpcdsTest.java [1]. SqlToRel
conversion seems to run fine (then the query fails at execution time with
"CalciteException: Cannot convert null to long", but that seems a different
problem than yours).
Could you please try to reproduce the exception with a unit test in Calcite?
Best,
Ruben
[1]
class TpcdsTest {
...
@Test void testYXZ() {
with()
.query("select i_item_id ...")
.runs();
}
...
}
On Fri, Feb 19, 2021 at 7:13 PM Priyendra Deshwal <[email protected]>
wrote:
> Thanks for confirming the decorrelation bugfix.
>
> Here's the info for the SUBSTR error. As noted, this query fails in
> SqlValidator. I have verified that this fails in both 1.25 and 1.26. Have
> not been able to test master yet.
>
> select ca_zip
> ,sum(cs_sales_price)
> from catalog_sales
> ,customer
> ,customer_address
> ,date_dim
> where cs_bill_customer_sk = c_customer_sk
> and c_current_addr_sk = ca_address_sk
> and ( substr(ca_zip,1,5) in ('85669', '86197','88274','83405','86475',
> '85392', '85460', '80348', '81792')
> or ca_state in ('CA','WA','GA')
> or cs_sales_price > 500)
> and cs_sold_date_sk = d_date_sk
> and d_qoy = 2 and d_year = 2001
> group by ca_zip
> order by ca_zip
> limit 100
>
> Exception in thread "main"
> org.apache.calcite.runtime.CalciteContextException: From line 9, column 9
> to line 9, column 26: No match found for function signature
> SUBSTR(<CHARACTER>, <NUMERIC>, <NUMERIC>)
> at
>
> java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
> at
>
> java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at
>
> java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at
> java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
> at
> org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:467)
> at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:867)
> at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:852)
> at
>
> org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError(SqlValidatorImpl.java:5003)
> at
>
> org.apache.calcite.sql.validate.SqlValidatorImpl.handleUnresolvedFunction(SqlValidatorImpl.java:1835)
> at org.apache.calcite.sql.SqlFunction.deriveType(SqlFunction.java:313)
> at org.apache.calcite.sql.SqlFunction.deriveType(SqlFunction.java:219)
> at
>
> org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:5766)
> at
>
> org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:5753)
> at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:139)
> at
>
> org.apache.calcite.sql.validate.SqlValidatorImpl.deriveTypeImpl(SqlValidatorImpl.java:1753)
> at
>
> org.apache.calcite.sql.validate.SqlValidatorImpl.deriveType(SqlValidatorImpl.java:1738)
> at
> org.apache.calcite.sql.type.InferTypes.lambda$static$0(InferTypes.java:47)
> at
>
> org.apache.calcite.sql.validate.SqlValidatorImpl.inferUnknownTypes(SqlValidatorImpl.java:1933)
> at
>
> org.apache.calcite.sql.validate.SqlValidatorImpl.inferUnknownTypes(SqlValidatorImpl.java:1941)
> at
>
> org.apache.calcite.sql.validate.SqlValidatorImpl.inferUnknownTypes(SqlValidatorImpl.java:1941)
> at
>
> org.apache.calcite.sql.validate.SqlValidatorImpl.inferUnknownTypes(SqlValidatorImpl.java:1941)
> at
>
> org.apache.calcite.sql.validate.SqlValidatorImpl.inferUnknownTypes(SqlValidatorImpl.java:1941)
> at
>
> org.apache.calcite.sql.validate.SqlValidatorImpl.inferUnknownTypes(SqlValidatorImpl.java:1941)
> at
>
> org.apache.calcite.sql.validate.SqlValidatorImpl.inferUnknownTypes(SqlValidatorImpl.java:1941)
> at
>
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateWhereOrOn(SqlValidatorImpl.java:4093)
> at
>
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateWhereClause(SqlValidatorImpl.java:4085)
> at
>
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3424)
> at
>
> org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60)
> at
>
> org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
> at
>
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:1064)
> at
>
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:1038)
> at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:232)
> at
>
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:1013)
> at
>
> org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:721)
> at
>
> io.netspring.playground.calcite.CalcitePoc.processQuery(CalcitePoc.java:621)
> at io.netspring.playground.calcite.CalcitePoc.main(CalcitePoc.java:705)
> Caused by: org.apache.calcite.sql.validate.SqlValidatorException: No match
> found for function signature SUBSTR(<CHARACTER>, <NUMERIC>, <NUMERIC>)
> at
>
> java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
> at
>
> java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at
>
> java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at
> java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
> at
> org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:467)
> at org.apache.calcite.runtime.Resources$ExInst.ex(Resources.java:560)
> ... 31 more
>
> Here's the info for the partition over query. As noted, this query fails in
> SqlToRelConverter. I have verified that this query fails in both 1.25 and
> 1.26. Have not been able to test master yet.
>
> select i_item_id
> ,i_item_desc
> ,i_category
> ,i_class
> ,i_current_price
> ,sum(ws_ext_sales_price) as itemrevenue
> ,sum(ws_ext_sales_price)*100/sum(sum(ws_ext_sales_price)) over
> (partition by i_class) as revenueratio
> from
> web_sales
> ,item
> ,date_dim
> where
> ws_item_sk = i_item_sk
> and i_category in ('Sports', 'Books', 'Home')
> and ws_sold_date_sk = d_date_sk
> and d_date between cast('1999-02-22' as date)
> and (cast('1999-02-22' as date) + INTERVAL '30' day)
> group by
> i_item_id
> ,i_item_desc
> ,i_category
> ,i_class
> ,i_current_price
> order by
> i_category
> ,i_class
> ,i_item_id
> ,i_item_desc
> ,revenueratio
> limit 100
>
> Exception in thread "main" java.lang.RuntimeException: while converting
> SUM(`WS_EXT_SALES_PRICE`) * 100 / (SUM(SUM(`WS_EXT_SALES_PRICE`)) OVER
> (PARTITION BY `I_CLASS`))
> at
>
> org.apache.calcite.sql2rel.ReflectiveConvertletTable.lambda$registerNodeTypeMethod$0(ReflectiveConvertletTable.java:86)
> at
>
> org.apache.calcite.sql2rel.SqlNodeToRexConverterImpl.convertCall(SqlNodeToRexConverterImpl.java:62)
> at
>
> org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:5001)
> at
>
> org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4297)
> at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:139)
> at
>
> org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4864)
> at
>
> org.apache.calcite.sql2rel.StandardConvertletTable.lambda$new$9(StandardConvertletTable.java:207)
> at
>
> org.apache.calcite.sql2rel.SqlNodeToRexConverterImpl.convertCall(SqlNodeToRexConverterImpl.java:62)
> at
>
> org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:5001)
> at
>
> org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4297)
> at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:139)
> at
>
> org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4864)
> at
>
> org.apache.calcite.sql2rel.SqlToRelConverter.createAggImpl(SqlToRelConverter.java:3052)
> at
>
> org.apache.calcite.sql2rel.SqlToRelConverter.convertAgg(SqlToRelConverter.java:2896)
> at
>
> org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:674)
> at
>
> org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:636)
> at
>
> org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3380)
> at
>
> org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:562)
> at
>
> io.netspring.playground.calcite.CalcitePoc.processQuery(CalcitePoc.java:643)
> at io.netspring.playground.calcite.CalcitePoc.main(CalcitePoc.java:705)
> Caused by: java.lang.reflect.InvocationTargetException
> at jdk.internal.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> at
>
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at
>
> org.apache.calcite.sql2rel.ReflectiveConvertletTable.lambda$registerNodeTypeMethod$0(ReflectiveConvertletTable.java:83)
> ... 19 more
> Caused by: java.lang.UnsupportedOperationException: class
> org.apache.calcite.sql.SqlBasicCall: SUM(SUM(`WS_EXT_SALES_PRICE`)) OVER
> (PARTITION BY `I_CLASS`)
> at org.apache.calcite.util.Util.needToImplement(Util.java:975)
> at
>
> org.apache.calcite.sql.validate.SqlValidatorImpl.getValidatedNodeType(SqlValidatorImpl.java:1655)
> at
>
> org.apache.calcite.sql2rel.SqlToRelConverter.convertOver(SqlToRelConverter.java:2005)
> at
>
> org.apache.calcite.sql2rel.SqlToRelConverter.access$1900(SqlToRelConverter.java:219)
> at
>
> org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4857)
> at
>
> org.apache.calcite.sql2rel.StandardConvertletTable.convertExpressionList(StandardConvertletTable.java:839)
> at
>
> org.apache.calcite.sql2rel.StandardConvertletTable.convertCall(StandardConvertletTable.java:815)
> at
>
> org.apache.calcite.sql2rel.StandardConvertletTable.convertCall(StandardConvertletTable.java:802)
> ... 23 more
>
> On Fri, Feb 19, 2021 at 10:28 AM Ruben Q L <[email protected]> wrote:
>
> > Hello Priyendra,
> >
> > Indeed, your decorrelator issue (4) seems to be solved in master thanks
> to
> > https://issues.apache.org/jira/browse/CALCITE-4333
> >
> > Could you please provide the query and the stack-trace from substr issue
> > (2) ?
> >
> > What is the actual query that originates partition issue (3)?
> >
> > Best,
> > Ruben
> >
> >
> > On Fri, Feb 19, 2021 at 5:40 PM Priyendra Deshwal <[email protected]>
> > wrote:
> >
> > > FYI - I was running my tests on the v1.26 release. When I ran the same
> > test
> > > on master, it seems the query does get decorrelated. Have there been
> some
> > > recent bugs or bugfixes in that area?
> > >
> > > with customer_total_return as
> > > (select sr_customer_sk as ctr_customer_sk
> > > ,sr_store_sk as ctr_store_sk
> > > ,sum(sr_return_amt) as ctr_total_return
> > > from store_returns
> > > ,date_dim
> > > where sr_returned_date_sk = d_date_sk
> > > and d_year = 2000
> > > group by sr_customer_sk
> > > ,sr_store_sk)
> > > select c_customer_id
> > > from customer_total_return ctr1
> > > ,store
> > > ,customer
> > > where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2
> > > from customer_total_return ctr2
> > > where ctr1.ctr_store_sk = ctr2.ctr_store_sk)
> > > and s_store_sk = ctr1.ctr_store_sk
> > > and s_state = 'TN'
> > > and ctr1.ctr_customer_sk = c_customer_sk
> > > order by c_customer_id
> > > limit 100
> > >
> > >
> > > *[Initial Logical Plan]*LogicalSort(sort0=[$0], dir0=[ASC],
> fetch=[100])
> > > LogicalProject(C_CUSTOMER_ID=[$33])
> > > LogicalFilter(condition=[AND(>($2, $50), =($3, $1), =($27, 'TN'),
> > =($0,
> > > $32))])
> > > LogicalCorrelate(correlation=[$cor0], joinType=[left],
> > > requiredColumns=[{1}])
> > > LogicalJoin(condition=[true], joinType=[inner])
> > > LogicalJoin(condition=[true], joinType=[inner])
> > > LogicalProject(CTR_CUSTOMER_SK=[$0], CTR_STORE_SK=[$1],
> > > CTR_TOTAL_RETURN=[$2])
> > > LogicalAggregate(group=[{0, 1}],
> > CTR_TOTAL_RETURN=[SUM($2)])
> > > LogicalProject(sr_customer_sk=[$3], sr_store_sk=[$7],
> > > sr_return_amt=[$11])
> > > LogicalFilter(condition=[AND(=($0, $20), =($26,
> > 2000))])
> > > LogicalJoin(condition=[true], joinType=[inner])
> > > LogicalTableScan(table=[[store_returns]])
> > > LogicalTableScan(table=[[date_dim]])
> > > LogicalTableScan(table=[[store]])
> > > LogicalTableScan(table=[[customer]])
> > > LogicalProject(EXPR$0=[*($0, 1.2:DECIMAL(2, 1))])
> > > LogicalAggregate(group=[{}], agg#0=[AVG($0)])
> > > LogicalProject(CTR_TOTAL_RETURN=[$2])
> > > LogicalFilter(condition=[=($cor0.CTR_STORE_SK, $1)])
> > > LogicalProject(CTR_CUSTOMER_SK=[$0], CTR_STORE_SK=[$1],
> > > CTR_TOTAL_RETURN=[$2])
> > > LogicalAggregate(group=[{0, 1}],
> > > CTR_TOTAL_RETURN=[SUM($2)])
> > > LogicalProject(sr_customer_sk=[$3],
> sr_store_sk=[$7],
> > > sr_return_amt=[$11])
> > > LogicalFilter(condition=[AND(=($0, $20), =($26,
> > > 2000))])
> > > LogicalJoin(condition=[true], joinType=[inner])
> > > LogicalTableScan(table=[[store_returns]])
> > > LogicalTableScan(table=[[date_dim]])
> > >
> > >
> > > *[Decorrelated Logical Plan]*LogicalSort(sort0=[$0], dir0=[ASC],
> > > fetch=[100])
> > > LogicalProject(C_CUSTOMER_ID=[$33])
> > > LogicalProject(CTR_CUSTOMER_SK=[$0], CTR_STORE_SK=[$1],
> > > CTR_TOTAL_RETURN=[$2], s_store_sk=[$3], s_store_id=[$4],
> > > s_rec_start_date=[$5], s_rec_end_date=[$6], s_closed_date_sk=[$7],
> > > s_store_name=[$8], s_number_employees=[$9], s_floor_space=[$10],
> > > s_hours=[$11], s_manager=[$12], s_market_id=[$13],
> > s_geography_class=[$14],
> > > s_market_desc=[$15], s_market_manager=[$16], s_division_id=[$17],
> > > s_division_name=[$18], s_company_id=[$19], s_company_name=[$20],
> > > s_street_number=[$21], s_street_name=[$22], s_street_type=[$23],
> > > s_suite_number=[$24], s_city=[$25], s_county=[$26], s_state=[$27],
> > > s_zip=[$28], s_country=[$29], s_gmt_offset=[$30],
> s_tax_precentage=[$31],
> > > c_customer_sk=[$32], c_customer_id=[$33], c_current_cdemo_sk=[$34],
> > > c_current_hdemo_sk=[$35], c_current_addr_sk=[$36],
> > > c_first_shipto_date_sk=[$37], c_first_sales_date_sk=[$38],
> > > c_salutation=[$39], c_first_name=[$40], c_last_name=[$41],
> > > c_preferred_cust_flag=[$42], c_birth_day=[$43], c_birth_month=[$44],
> > > c_birth_year=[$45], c_birth_country=[$46], c_login=[$47],
> > > c_email_address=[$48], c_last_review_date_sk=[$49],
> > > CTR_STORE_SK0=[CAST($50):BIGINT], $f1=[CAST($51):DOUBLE])
> > > LogicalJoin(condition=[AND(=($1, $50), >($2, *($51,
> 1.2:DECIMAL(2,
> > > 1))))], joinType=[inner])
> > > LogicalJoin(condition=[=($0, $32)], joinType=[inner])
> > > LogicalJoin(condition=[=($3, $1)], joinType=[inner])
> > > LogicalProject(CTR_CUSTOMER_SK=[$0], CTR_STORE_SK=[$1],
> > > CTR_TOTAL_RETURN=[$2])
> > > LogicalAggregate(group=[{0, 1}],
> > CTR_TOTAL_RETURN=[SUM($2)])
> > > LogicalProject(sr_customer_sk=[$3], sr_store_sk=[$7],
> > > sr_return_amt=[$11])
> > > LogicalJoin(condition=[=($0, $20)], joinType=[inner])
> > > LogicalTableScan(table=[[store_returns]])
> > > LogicalFilter(condition=[=($6, 2000)])
> > > LogicalTableScan(table=[[date_dim]])
> > > LogicalFilter(condition=[=($24, 'TN')])
> > > LogicalTableScan(table=[[store]])
> > > LogicalTableScan(table=[[customer]])
> > > LogicalAggregate(group=[{0}], agg#0=[AVG($1)])
> > > LogicalProject(CTR_STORE_SK=[$1], CTR_TOTAL_RETURN=[$2])
> > > LogicalAggregate(group=[{0, 1}],
> CTR_TOTAL_RETURN=[SUM($2)])
> > > LogicalProject(sr_customer_sk=[$3], sr_store_sk=[$7],
> > > sr_return_amt=[$11])
> > > LogicalJoin(condition=[=($0, $20)], joinType=[inner])
> > > LogicalTableScan(table=[[store_returns]])
> > > LogicalFilter(condition=[=($6, 2000)])
> > > LogicalTableScan(table=[[date_dim]])
> > >
> >
>