[ https://issues.apache.org/jira/browse/CALCITE-3138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Igor Guzenko updated CALCITE-3138: ---------------------------------- Description: 1) RelStructuredTypeFlattener.restructureFields(structType) doesn't support ROW type. However, ROW type is flattened by RelStructuredTypeFlattener just like struct. So when user queries one column with complex type ROW, after flattening and restructuring top level project returns reference to first inner primitive field of the requested column. 2) Another finding is related to ITEM expression applied to array of structs column. For example, let's imagine a table with column of type ARRAY<STRUCT<a:INT, b:STRUCT<x:INT,y:INT>>>. When user requests is SQL array_column[1], Calcite generates ITEM($0, 1) , where $0 is ref to array column from Scan and 1 is index literal. Current flattener generates two field acess expressions ITEM($0, 1).a, ITEM($0, 1).b but dont take into account that ITEM($0, 1).b returns struct which also should be flattened. 3) In some cases applying of ITEM after flattenning is no longer possible. For example, consider column with type STRUCT<a:INT, b:STRUCT<x:INT,y:INT>>. User requests column['b'] in query and Calcite creates ITEM($0,'b'). After flattenning Scan is covered by LogicalProject($0.a, $0.b.x, $0.b.y) and the old projection ITEM($0,'b') can't be applied anymore. So now it should be converted to refer only subset of fields ($1,$2) from flattening project. UPDATES IN EXPECTED TEST RESULTS: ---------------------------------------------------------------------------------- TEST CASE: SqlToRelConverterTest.testNestedColumnType() {code} select empa.home_address.zip from sales.emp_address empa where empa.home_address.city = 'abc' {code} OLD RESULT: {code} LogicalProject(ZIP=[$4]) LogicalFilter(condition=[=($3, 'abc':VARCHAR(20))]) LogicalProject(EMPNO=[$0], STREET=[$1.STREET], CITY=[$1.CITY], ZIP=[$1.ZIP], STATE=[$1.STATE], STREET5=[$2.STREET], CITY6=[$2.CITY], ZIP7=[$2.ZIP], STATE8=[$2.STATE]) LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]]) {code} 1. Above in logical filter, condition references to field $3 which is ZIP=[$1.ZIP] field from previous project, however in original query filtering should be done by CITY field. 2. Also the top level project references to $4 field, which is STATE=[$1.STATE] field from project, but original query requested ZIP field. UPDATED RESULT: {code} LogicalProject(ZIP=[$3]) LogicalFilter(condition=[=($2, 'abc')]) LogicalProject(EMPNO=[$0], STREET=[$1.STREET], CITY=[$1.CITY], ZIP=[$1.ZIP], STATE=[$1.STATE], STREET5=[$2.STREET], CITY6=[$2.CITY], ZIP7=[$2.ZIP], STATE8=[$2.STATE]) LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]]) {code} ---------------------------------------------------------------------------------- TEST CASE: SqlToRelConverterTest.testStructTypeAlias() {code} select t.r AS myRow from (select row(row(1)) r from dept) t {code} OLD RESULT: {code} LogicalProject(MYROW$$0$$0=[1]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) {code} 1. Inside the subselect of row(row(1)) type of returned column is RecordType(RecordType(INTEGER EXPR$0) EXPR$0), but the top level project uses flattened expression and returns to user literal 1 with type RecordType(INTEGER MYROW$$0$$0), although the type doesn't match type returned by row(row(1)) expression. 2. Also it's suspicious that caller expects returned column to have name 'myRow' but gets 'MYROW$$0$$0'. UPDATED RESULT: {code} LogicalProject(MYROW=[ROW(ROW($0))]) LogicalProject(MYROW$$0$$0=[1]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) {code} ---------------------------------------------------------------------------------- TEST CASE: SqlToRelConverterTest.testFlattenRecords() {code} select employees[1] from dept_nested {code} OLD RESULT: {code} LogicalProject(EXPR$0=[$0]) LogicalProject(EXPR$0$0=[ITEM($3, 1).EMPNO], EXPR$0$1=[ITEM($3, 1).ENAME], EXPR$0$2=[ITEM($3, 1).DETAIL]) LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]]) {code} 1. Given selection of element by index from array, which by definition contain elements of type: {code} STRUCT<EMPNO:INT, ENAME:VARCHAR(10), DETAIL:STRUCT<SKILLS:ARRAY<STRUCT<TYPE:VARCHAR(10), DESC:VARCHAR(20), OTHERS:STRUCT<A:VARCHAR(10),B:VARCHAR(10)>>>>> {code} First problem is that the intermediate project on top of the scan expected to return only flattened types. But last projection expression returns `DETAIL` field of array element which has type STRUCT and contain array field named 'SKILLS'. 2. The top level project returns incorrect flattened EMPNO:INT field from retrieved array element instead of whole array element of type struct. UPDATED RESULT: {code} LogicalProject(EXPR$0=[ROW($0, $1, ROW($2))]) LogicalProject(EXPR$0$0=[ITEM($3, 1).EMPNO], EXPR$0$1=[ITEM($3, 1).ENAME], EXPR$0$2$0=[ITEM($3, 1).DETAIL.SKILLS]) LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]]) {code} Other few updates in RelToSqlConverterStructsTest.java and PlannerTest.java are more explicit and visible in PR. was: ROW type is not supported to restructure fields after flattening (see RelStructuredTypeFlattener.restructureFields), which may cause wrong result. Attached patch file can be used to reproduce the issue. UPDATE: Some tests require changes in expected results. ---------------------------------------------------------------------------------- TEST CASE: SqlToRelConverterTest.testNestedColumnType() ORIGIN RESULTS: select empa.home_address.zip from sales.emp_address empa where empa.home_address.city = 'abc' {code} LogicalProject(ZIP=[$4]) LogicalFilter(condition=[=($3, 'abc':VARCHAR(20))]) LogicalProject(EMPNO=[$0], STREET=[$1.STREET], CITY=[$1.CITY], ZIP=[$1.ZIP], STATE=[$1.STATE], STREET5=[$2.STREET], CITY6=[$2.CITY], ZIP7=[$2.ZIP], STATE8=[$2.STATE]) LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]]) {code} COMMENT: 1. Above in logical filter, condition references to field $3 which is ZIP=[$1.ZIP] field from previous project, however in original query filtering should be done by CITY field. 2. Also the top level project references to $4 field, which is STATE=[$1.STATE] field from project, but original query requested ZIP field. Updated expected result shows that both problems are fixed now. UPDATED RESULT: {code} LogicalProject(ZIP=[$3]) LogicalFilter(condition=[=($2, 'abc')]) LogicalProject(EMPNO=[$0], STREET=[$1.STREET], CITY=[$1.CITY], ZIP=[$1.ZIP], STATE=[$1.STATE], STREET5=[$2.STREET], CITY6=[$2.CITY], ZIP7=[$2.ZIP], STATE8=[$2.STATE]) LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]]) {code} ---------------------------------------------------------------------------------- TEST CASE: SqlToRelConverterTest.testStructTypeAlias() ORIGIN RESULTS: select t.r AS myRow from (select row(row(1)) r from dept) t {code} LogicalProject(MYROW$$0$$0=[1]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) {code} COMMENT: 1. Inside the subselect of row(row(1)) type of returned column is RecordType(RecordType(INTEGER EXPR$0) EXPR$0), but the top level project uses flattened expression and returns to user literal 1 with type RecordType(INTEGER MYROW$$0$$0), although the type doesn't match type returned by row(row(1)) expression. 2. Also it's suspicious that caller expects returned column to have name 'myRow' but gets 'MYROW$$0$$0'. Updated result collects back flattened expression using row constructor. UPDATED RESULT: {code} LogicalProject(MYROW=[ROW(ROW($0))]) LogicalProject(MYROW$$0$$0=[1]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) {code} ---------------------------------------------------------------------------------- TEST CASE: SqlToRelConverterTest.testFlattenRecords() ORIGIN RESULTS: select employees[1] from dept_nested {code} LogicalProject(EXPR$0=[$0]) LogicalProject(EXPR$0$0=[ITEM($3, 1).EMPNO], EXPR$0$1=[ITEM($3, 1).ENAME], EXPR$0$2=[ITEM($3, 1).DETAIL]) LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]]) {code} COMMENT: 1. Given selection of element by index from array, which by definition contain elements of type: {code} STRUCT<EMPNO:INT, ENAME:VARCHAR(10), DETAIL:STRUCT<SKILLS:ARRAY<STRUCT<TYPE:VARCHAR(10), DESC:VARCHAR(20), OTHERS:STRUCT<A:VARCHAR(10),B:VARCHAR(10)>>>>> {code} First problem is that the intermediate project on top of the scan expected to return only flattened types. But last projection expression returns `DETAIL` field of array element which has type STRUCT and contain array field named 'SKILLS'. 2. The top level project returns incorrect flattened EMPNO:INT field from retrieved array element instead of whole array element of type struct. UPDATED RESULT: ELEMENT_TYPE = RecordType(INTEGER EMPNO, VARCHAR(10) ENAME, RecordType(RecordType(VARCHAR(10) NOT NULL TYPE, VARCHAR(20) NOT NULL DESC, RecordType(VARCHAR(10) NOT NULL A, VARCHAR(10) NOT NULL B) NOT NULL OTHERS) NOT NULL ARRAY SKILLS) DETAIL) SKILL_TYPE = RecordType(RecordType(VARCHAR(10) NOT NULL TYPE, VARCHAR(20) NOT NULL DESC, RecordType(VARCHAR(10) NOT NULL A, VARCHAR(10) NOT NULL B) NOT NULL OTHERS) NOT NULL ARRAY SKILLS) {code} LogicalProject(EXPR$0=[CASE($0, null:ELEMENT_TYPE, ROW($1, $2, CASE($3, null:SKILL_TYPE, ROW($4))))]) LogicalProject(IS_NULL_ITEM$6_1=[AS(IS NULL(ITEM($3, 1)), 'IS_NULL_ITEM$6_1')], EXPR$0$0=[ITEM($3, 1).EMPNO], EXPR$0$1=[ITEM($3, 1).ENAME], IS_NULL_ITEM$6_1DETAIL=[AS(IS NULL(ITEM($3, 1).DETAIL), 'IS_NULL_ITEM$6_1DETAIL')], EXPR$0$2$0=[ITEM($3, 1).DETAIL.SKILLS]) LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]]) {code} > Restructuring ROW type fields is not supported > ---------------------------------------------- > > Key: CALCITE-3138 > URL: https://issues.apache.org/jira/browse/CALCITE-3138 > Project: Calcite > Issue Type: Bug > Components: core > Reporter: Haisheng Yuan > Assignee: Igor Guzenko > Priority: Major > Labels: pull-request-available > Attachments: ROW_repro.patch > > Time Spent: 20m > Remaining Estimate: 0h > > 1) RelStructuredTypeFlattener.restructureFields(structType) doesn't support > ROW type. However, ROW type is flattened by > RelStructuredTypeFlattener just like struct. So when user queries one column > with complex type ROW, after flattening and restructuring > top level project returns reference to first inner primitive field of the > requested column. > 2) Another finding is related to ITEM expression applied to array of structs > column. For example, let's imagine a table with column of > type ARRAY<STRUCT<a:INT, b:STRUCT<x:INT,y:INT>>>. When user requests is SQL > array_column[1], Calcite generates ITEM($0, 1) , where $0 is ref to array > column > from Scan and 1 is index literal. Current flattener generates two field acess > expressions ITEM($0, 1).a, ITEM($0, 1).b but dont take into account > that ITEM($0, 1).b returns struct which also should be flattened. > 3) In some cases applying of ITEM after flattenning is no longer possible. > For example, consider column with type > STRUCT<a:INT, b:STRUCT<x:INT,y:INT>>. User requests column['b'] in query and > Calcite creates ITEM($0,'b'). > After flattenning Scan is covered by LogicalProject($0.a, $0.b.x, $0.b.y) and > the old projection ITEM($0,'b') can't > be applied anymore. So now it should be converted to refer only subset of > fields ($1,$2) from flattening project. > UPDATES IN EXPECTED TEST RESULTS: > ---------------------------------------------------------------------------------- > TEST CASE: SqlToRelConverterTest.testNestedColumnType() > {code} > select empa.home_address.zip from sales.emp_address empa where > empa.home_address.city = 'abc' > {code} > OLD RESULT: > {code} > LogicalProject(ZIP=[$4]) > LogicalFilter(condition=[=($3, 'abc':VARCHAR(20))]) > LogicalProject(EMPNO=[$0], STREET=[$1.STREET], CITY=[$1.CITY], > ZIP=[$1.ZIP], STATE=[$1.STATE], STREET5=[$2.STREET], CITY6=[$2.CITY], > ZIP7=[$2.ZIP], STATE8=[$2.STATE]) > LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]]) > {code} > 1. Above in logical filter, condition references to field $3 which is > ZIP=[$1.ZIP] field from previous project, > however in original query filtering should be done by CITY field. > 2. Also the top level project references to $4 field, which is > STATE=[$1.STATE] field from project, but original > query requested ZIP field. > > UPDATED RESULT: > {code} > LogicalProject(ZIP=[$3]) > LogicalFilter(condition=[=($2, 'abc')]) > LogicalProject(EMPNO=[$0], STREET=[$1.STREET], CITY=[$1.CITY], > ZIP=[$1.ZIP], STATE=[$1.STATE], STREET5=[$2.STREET], CITY6=[$2.CITY], > ZIP7=[$2.ZIP], STATE8=[$2.STATE]) > LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]]) > {code} > ---------------------------------------------------------------------------------- > TEST CASE: SqlToRelConverterTest.testStructTypeAlias() > {code} > select t.r AS myRow from (select row(row(1)) r from dept) t > {code} > OLD RESULT: > {code} > LogicalProject(MYROW$$0$$0=[1]) > LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) > {code} > 1. Inside the subselect of row(row(1)) type of returned column is > RecordType(RecordType(INTEGER EXPR$0) EXPR$0), > but the top level project uses flattened expression and returns to user > literal 1 with type RecordType(INTEGER MYROW$$0$$0), > although the type doesn't match type returned by row(row(1)) expression. > 2. Also it's suspicious that caller expects returned column to have name > 'myRow' but gets 'MYROW$$0$$0'. > > UPDATED RESULT: > {code} > LogicalProject(MYROW=[ROW(ROW($0))]) > LogicalProject(MYROW$$0$$0=[1]) > LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) > {code} > ---------------------------------------------------------------------------------- > TEST CASE: SqlToRelConverterTest.testFlattenRecords() > {code} > select employees[1] from dept_nested > {code} > OLD RESULT: > {code} > LogicalProject(EXPR$0=[$0]) > LogicalProject(EXPR$0$0=[ITEM($3, 1).EMPNO], EXPR$0$1=[ITEM($3, 1).ENAME], > EXPR$0$2=[ITEM($3, 1).DETAIL]) > LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]]) > {code} > 1. Given selection of element by index from array, which by definition > contain elements of type: > {code} > STRUCT<EMPNO:INT, > ENAME:VARCHAR(10), > DETAIL:STRUCT<SKILLS:ARRAY<STRUCT<TYPE:VARCHAR(10), > DESC:VARCHAR(20), > > OTHERS:STRUCT<A:VARCHAR(10),B:VARCHAR(10)>>>>> > {code} > First problem is that the intermediate project on top of the scan expected to > return only flattened types. > But last projection expression returns `DETAIL` field of array element which > has type STRUCT and contain array > field named 'SKILLS'. > 2. The top level project returns incorrect flattened EMPNO:INT field from > retrieved array element instead > of whole array element of type struct. > > UPDATED RESULT: > {code} > LogicalProject(EXPR$0=[ROW($0, $1, ROW($2))]) > LogicalProject(EXPR$0$0=[ITEM($3, 1).EMPNO], EXPR$0$1=[ITEM($3, 1).ENAME], > EXPR$0$2$0=[ITEM($3, 1).DETAIL.SKILLS]) > LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]]) > {code} > Other few updates in RelToSqlConverterStructsTest.java and PlannerTest.java > are more explicit and visible in PR. -- This message was sent by Atlassian JIRA (v7.6.14#76016)