[ https://issues.apache.org/jira/browse/CALCITE-3138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Igor Guzenko updated CALCITE-3138: ---------------------------------- Description: ROW type is not supported to restructure fields after flattening (see RelStructuredTypeFlattener.restructureFields), which may cause wrong result. Attached patch file can be used to reproduce the issue. UPDATE: Some tests require changes in expected results. ---------------------------------------------------------------------------------- TEST CASE: SqlToRelConverterTest.testNestedColumnType() ORIGIN RESULTS: select empa.home_address.zip from sales.emp_address empa where empa.home_address.city = 'abc' {code} LogicalProject(ZIP=[$4]) LogicalFilter(condition=[=($3, 'abc':VARCHAR(20))]) LogicalProject(EMPNO=[$0], STREET=[$1.STREET], CITY=[$1.CITY], ZIP=[$1.ZIP], STATE=[$1.STATE], STREET5=[$2.STREET], CITY6=[$2.CITY], ZIP7=[$2.ZIP], STATE8=[$2.STATE]) LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]]) {code} COMMENT: 1. Above in logical filter, condition references to field $3 which is ZIP=[$1.ZIP] field from previous project, however in original query filtering should be done by CITY field. 2. Also the top level project references to $4 field, which is STATE=[$1.STATE] field from project, but original query requested ZIP field. Updated expected result shows that both problems are fixed now. UPDATED RESULT: {code} LogicalProject(ZIP=[$3]) LogicalFilter(condition=[=($2, 'abc')]) LogicalProject(EMPNO=[$0], STREET=[$1.STREET], CITY=[$1.CITY], ZIP=[$1.ZIP], STATE=[$1.STATE], STREET5=[$2.STREET], CITY6=[$2.CITY], ZIP7=[$2.ZIP], STATE8=[$2.STATE]) LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]]) {code} ---------------------------------------------------------------------------------- TEST CASE: SqlToRelConverterTest.testStructTypeAlias() ORIGIN RESULTS: select t.r AS myRow from (select row(row(1)) r from dept) t {code} LogicalProject(MYROW$$0$$0=[1]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) {code} COMMENT: 1. Inside the subselect of row(row(1)) type of returned column is RecordType(RecordType(INTEGER EXPR$0) EXPR$0), but the top level project uses flattened expression and returns to user literal 1 with type RecordType(INTEGER MYROW$$0$$0), although the type doesn't match type returned by row(row(1)) expression. 2. Also it's suspicious that caller expects returned column to have name 'myRow' but gets 'MYROW$$0$$0'. Updated result collects back flattened expression using row constructor. UPDATED RESULT: {code} LogicalProject(MYROW=[ROW(ROW($0))]) LogicalProject(MYROW$$0$$0=[1]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) {code} ---------------------------------------------------------------------------------- TEST CASE: SqlToRelConverterTest.testFlattenRecords() ORIGIN RESULTS: select employees[1] from dept_nested {code} LogicalProject(EXPR$0=[$0]) LogicalProject(EXPR$0$0=[ITEM($3, 1).EMPNO], EXPR$0$1=[ITEM($3, 1).ENAME], EXPR$0$2=[ITEM($3, 1).DETAIL]) LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]]) {code} COMMENT: 1. Given selection of element by index from array, which by definition contain elements of type: {code} STRUCT<EMPNO:INT, ENAME:VARCHAR(10), DETAIL:STRUCT<SKILLS:ARRAY<STRUCT<TYPE:VARCHAR(10), DESC:VARCHAR(20), OTHERS:STRUCT<A:VARCHAR(10),B:VARCHAR(10)>>>>> {code} First problem is that the intermediate project on top of the scan expected to return only flattened types. But last projection expression returns `DETAIL` field of array element which has type STRUCT and contain array field named 'SKILLS'. 2. The top level project returns incorrect flattened EMPNO:INT field from retrieved array element instead of whole array element of type struct. UPDATED RESULT: ELEMENT_TYPE = RecordType(INTEGER EMPNO, VARCHAR(10) ENAME, RecordType(RecordType(VARCHAR(10) NOT NULL TYPE, VARCHAR(20) NOT NULL DESC, RecordType(VARCHAR(10) NOT NULL A, VARCHAR(10) NOT NULL B) NOT NULL OTHERS) NOT NULL ARRAY SKILLS) DETAIL) SKILL_TYPE = RecordType(RecordType(VARCHAR(10) NOT NULL TYPE, VARCHAR(20) NOT NULL DESC, RecordType(VARCHAR(10) NOT NULL A, VARCHAR(10) NOT NULL B) NOT NULL OTHERS) NOT NULL ARRAY SKILLS) {code} LogicalProject(EXPR$0=[CASE($0, null:ELEMENT_TYPE, ROW($1, $2, CASE($3, null:SKILL_TYPE, ROW($4))))]) LogicalProject(IS_NULL_ITEM$6_1=[AS(IS NULL(ITEM($3, 1)), 'IS_NULL_ITEM$6_1')], EXPR$0$0=[ITEM($3, 1).EMPNO], EXPR$0$1=[ITEM($3, 1).ENAME], IS_NULL_ITEM$6_1DETAIL=[AS(IS NULL(ITEM($3, 1).DETAIL), 'IS_NULL_ITEM$6_1DETAIL')], EXPR$0$2$0=[ITEM($3, 1).DETAIL.SKILLS]) LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]]) {code} was: ROW type is not supported to restructure fields after flattening (see RelStructuredTypeFlattener.restructureFields), which may cause wrong result. Attached patch file can be used to reproduce the issue. > Restructuring ROW type fields is not supported > ---------------------------------------------- > > Key: CALCITE-3138 > URL: https://issues.apache.org/jira/browse/CALCITE-3138 > Project: Calcite > Issue Type: Bug > Components: core > Reporter: Haisheng Yuan > Assignee: Igor Guzenko > Priority: Major > Attachments: ROW_repro.patch > > > ROW type is not supported to restructure fields after flattening (see > RelStructuredTypeFlattener.restructureFields), which may cause wrong result. > Attached patch file can be used to reproduce the issue. > UPDATE: Some tests require changes in expected results. > ---------------------------------------------------------------------------------- > TEST CASE: SqlToRelConverterTest.testNestedColumnType() > ORIGIN RESULTS: > select empa.home_address.zip from sales.emp_address empa where > empa.home_address.city = 'abc' > {code} > LogicalProject(ZIP=[$4]) > LogicalFilter(condition=[=($3, 'abc':VARCHAR(20))]) > LogicalProject(EMPNO=[$0], STREET=[$1.STREET], CITY=[$1.CITY], > ZIP=[$1.ZIP], STATE=[$1.STATE], STREET5=[$2.STREET], CITY6=[$2.CITY], > ZIP7=[$2.ZIP], STATE8=[$2.STATE]) > LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]]) > {code} > COMMENT: > 1. Above in logical filter, condition references to field $3 which is > ZIP=[$1.ZIP] field from previous project, > however in original query filtering should be done by CITY field. > 2. Also the top level project references to $4 field, which is > STATE=[$1.STATE] field from project, but original > query requested ZIP field. > > Updated expected result shows that both problems are fixed now. > UPDATED RESULT: > {code} > LogicalProject(ZIP=[$3]) > LogicalFilter(condition=[=($2, 'abc')]) > LogicalProject(EMPNO=[$0], STREET=[$1.STREET], CITY=[$1.CITY], > ZIP=[$1.ZIP], STATE=[$1.STATE], STREET5=[$2.STREET], CITY6=[$2.CITY], > ZIP7=[$2.ZIP], STATE8=[$2.STATE]) > LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]]) > {code} > ---------------------------------------------------------------------------------- > TEST CASE: SqlToRelConverterTest.testStructTypeAlias() > ORIGIN RESULTS: > select t.r AS myRow from (select row(row(1)) r from dept) t > {code} > LogicalProject(MYROW$$0$$0=[1]) > LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) > {code} > COMMENT: > 1. Inside the subselect of row(row(1)) type of returned column is > RecordType(RecordType(INTEGER EXPR$0) EXPR$0), > but the top level project uses flattened expression and returns to user > literal 1 with type RecordType(INTEGER MYROW$$0$$0), > although the type doesn't match type returned by row(row(1)) expression. > 2. Also it's suspicious that caller expects returned column to have name > 'myRow' but gets 'MYROW$$0$$0'. > Updated result collects back flattened expression using row constructor. > > UPDATED RESULT: > {code} > LogicalProject(MYROW=[ROW(ROW($0))]) > LogicalProject(MYROW$$0$$0=[1]) > LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) > {code} > ---------------------------------------------------------------------------------- > TEST CASE: SqlToRelConverterTest.testFlattenRecords() > ORIGIN RESULTS: > select employees[1] from dept_nested > {code} > LogicalProject(EXPR$0=[$0]) > LogicalProject(EXPR$0$0=[ITEM($3, 1).EMPNO], EXPR$0$1=[ITEM($3, 1).ENAME], > EXPR$0$2=[ITEM($3, 1).DETAIL]) > LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]]) > {code} > COMMENT: > 1. Given selection of element by index from array, which by definition > contain elements of type: > {code} > STRUCT<EMPNO:INT, > ENAME:VARCHAR(10), > DETAIL:STRUCT<SKILLS:ARRAY<STRUCT<TYPE:VARCHAR(10), > DESC:VARCHAR(20), > > OTHERS:STRUCT<A:VARCHAR(10),B:VARCHAR(10)>>>>> > {code} > First problem is that the intermediate project on top of the scan expected to > return only flattened types. > But last projection expression returns `DETAIL` field of array element which > has type STRUCT and contain array > field named 'SKILLS'. > 2. The top level project returns incorrect flattened EMPNO:INT field from > retrieved array element instead > of whole array element of type struct. > > UPDATED RESULT: > ELEMENT_TYPE = RecordType(INTEGER EMPNO, VARCHAR(10) ENAME, > RecordType(RecordType(VARCHAR(10) NOT NULL TYPE, VARCHAR(20) NOT NULL DESC, > RecordType(VARCHAR(10) NOT NULL A, VARCHAR(10) NOT NULL B) NOT > NULL OTHERS) NOT NULL ARRAY SKILLS) DETAIL) > SKILL_TYPE = RecordType(RecordType(VARCHAR(10) NOT NULL TYPE, VARCHAR(20) NOT > NULL DESC, RecordType(VARCHAR(10) NOT NULL A, VARCHAR(10) NOT NULL B) NOT > NULL OTHERS) NOT NULL ARRAY SKILLS) > {code} > LogicalProject(EXPR$0=[CASE($0, null:ELEMENT_TYPE, ROW($1, $2, CASE($3, > null:SKILL_TYPE, ROW($4))))]) > LogicalProject(IS_NULL_ITEM$6_1=[AS(IS NULL(ITEM($3, 1)), > 'IS_NULL_ITEM$6_1')], EXPR$0$0=[ITEM($3, 1).EMPNO], EXPR$0$1=[ITEM($3, > 1).ENAME], IS_NULL_ITEM$6_1DETAIL=[AS(IS NULL(ITEM($3, 1).DETAIL), > 'IS_NULL_ITEM$6_1DETAIL')], EXPR$0$2$0=[ITEM($3, 1).DETAIL.SKILLS]) > LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]]) > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)