[jira] [Updated] (CALCITE-3138) Restructuring ROW type fields is not supported

Igor Guzenko (JIRA) Fri, 05 Jul 2019 04:33:39 -0700


     [ 
https://issues.apache.org/jira/browse/CALCITE-3138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Igor Guzenko updated CALCITE-3138:
----------------------------------
    Description: 
ROW type is not supported to restructure fields after flattening (see 
RelStructuredTypeFlattener.restructureFields), which may cause wrong result.

Attached patch file can be used to reproduce the issue.

UPDATE: Some tests require changes in expected results. 

----------------------------------------------------------------------------------
TEST CASE: SqlToRelConverterTest.testNestedColumnType()

ORIGIN RESULTS: 

select empa.home_address.zip from sales.emp_address empa where 
empa.home_address.city = 'abc'

{code}
LogicalProject(ZIP=[$4])
  LogicalFilter(condition=[=($3, 'abc':VARCHAR(20))])
    LogicalProject(EMPNO=[$0], STREET=[$1.STREET], CITY=[$1.CITY], 
ZIP=[$1.ZIP], STATE=[$1.STATE], STREET5=[$2.STREET], CITY6=[$2.CITY], 
ZIP7=[$2.ZIP], STATE8=[$2.STATE])
      LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
{code}

COMMENT: 
1. Above in logical filter, condition references to field $3 which is 
ZIP=[$1.ZIP] field from previous project, 
   however in original query filtering should be done by CITY field. 
2. Also the top level project references to $4 field, which is STATE=[$1.STATE] 
field from project, but original 
   query requested ZIP field.
 
Updated expected result shows that both problems are fixed now.  

UPDATED RESULT:
{code}
LogicalProject(ZIP=[$3])
  LogicalFilter(condition=[=($2, 'abc')])
    LogicalProject(EMPNO=[$0], STREET=[$1.STREET], CITY=[$1.CITY], 
ZIP=[$1.ZIP], STATE=[$1.STATE], STREET5=[$2.STREET], CITY6=[$2.CITY], 
ZIP7=[$2.ZIP], STATE8=[$2.STATE])
      LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
{code}
----------------------------------------------------------------------------------

TEST CASE: SqlToRelConverterTest.testStructTypeAlias()

ORIGIN RESULTS:

select t.r AS myRow from (select row(row(1)) r from dept) t  
{code}
LogicalProject(MYROW$$0$$0=[1])
  LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{code}
COMMENT:
1. Inside the subselect of row(row(1)) type of returned column is 
RecordType(RecordType(INTEGER EXPR$0) EXPR$0),
   but the top level project uses flattened expression and returns to user 
literal 1 with type RecordType(INTEGER MYROW$$0$$0), 
   although the type doesn't match type returned by row(row(1)) expression. 
2. Also it's suspicious that caller expects returned column to have name 
'myRow' but gets 'MYROW$$0$$0'. 

Updated result collects back flattened expression using row constructor.
  
UPDATED RESULT:
{code}
LogicalProject(MYROW=[ROW(ROW($0))])
  LogicalProject(MYROW$$0$$0=[1])
    LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{code}

----------------------------------------------------------------------------------

TEST CASE: SqlToRelConverterTest.testFlattenRecords()

ORIGIN RESULTS:

select employees[1] from dept_nested 
{code}
LogicalProject(EXPR$0=[$0])
  LogicalProject(EXPR$0$0=[ITEM($3, 1).EMPNO], EXPR$0$1=[ITEM($3, 1).ENAME], 
EXPR$0$2=[ITEM($3, 1).DETAIL])
    LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]])
{code}
COMMENT:
1. Given selection of element by index from array, which by definition contain 
elements of type: 
{code}
STRUCT<EMPNO:INT,
       ENAME:VARCHAR(10),
       DETAIL:STRUCT<SKILLS:ARRAY<STRUCT<TYPE:VARCHAR(10),
                                         DESC:VARCHAR(20),
                                         
OTHERS:STRUCT<A:VARCHAR(10),B:VARCHAR(10)>>>>>
{code} 

First problem is that the intermediate project on top of the scan expected to 
return only flattened types. 
But last projection expression returns `DETAIL` field of array element which 
has type STRUCT and contain array 
field named 'SKILLS'. 

2. The top level project returns incorrect flattened EMPNO:INT field from 
retrieved array element instead 
of whole array element of type struct. 

  
UPDATED RESULT:

ELEMENT_TYPE = RecordType(INTEGER EMPNO, VARCHAR(10) ENAME, 
RecordType(RecordType(VARCHAR(10) NOT NULL TYPE, VARCHAR(20) NOT NULL DESC, 
               RecordType(VARCHAR(10) NOT NULL A, VARCHAR(10) NOT NULL B) NOT 
NULL OTHERS) NOT NULL ARRAY SKILLS) DETAIL)
SKILL_TYPE = RecordType(RecordType(VARCHAR(10) NOT NULL TYPE, VARCHAR(20) NOT 
NULL DESC, RecordType(VARCHAR(10) NOT NULL A, VARCHAR(10) NOT NULL B) NOT NULL 
OTHERS) NOT NULL ARRAY SKILLS)
{code}
LogicalProject(EXPR$0=[CASE($0, null:ELEMENT_TYPE, ROW($1, $2, CASE($3, 
null:SKILL_TYPE, ROW($4))))])
  LogicalProject(IS_NULL_ITEM$6_1=[AS(IS NULL(ITEM($3, 1)), 
'IS_NULL_ITEM$6_1')], EXPR$0$0=[ITEM($3, 1).EMPNO], EXPR$0$1=[ITEM($3, 
1).ENAME], IS_NULL_ITEM$6_1DETAIL=[AS(IS NULL(ITEM($3, 1).DETAIL), 
'IS_NULL_ITEM$6_1DETAIL')], EXPR$0$2$0=[ITEM($3, 1).DETAIL.SKILLS])
    LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]])
{code}
 

  was:
ROW type is not supported to restructure fields after flattening (see 
RelStructuredTypeFlattener.restructureFields), which may cause wrong result. 

Attached patch file can be used to reproduce the issue.


> Restructuring ROW type fields is not supported
> ----------------------------------------------
>
>                 Key: CALCITE-3138
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3138
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Haisheng Yuan
>            Assignee: Igor Guzenko
>            Priority: Major
>         Attachments: ROW_repro.patch
>
>
> ROW type is not supported to restructure fields after flattening (see 
> RelStructuredTypeFlattener.restructureFields), which may cause wrong result.
> Attached patch file can be used to reproduce the issue.
> UPDATE: Some tests require changes in expected results. 
> ----------------------------------------------------------------------------------
> TEST CASE: SqlToRelConverterTest.testNestedColumnType()
> ORIGIN RESULTS: 
> select empa.home_address.zip from sales.emp_address empa where 
> empa.home_address.city = 'abc'
> {code}
> LogicalProject(ZIP=[$4])
>   LogicalFilter(condition=[=($3, 'abc':VARCHAR(20))])
>     LogicalProject(EMPNO=[$0], STREET=[$1.STREET], CITY=[$1.CITY], 
> ZIP=[$1.ZIP], STATE=[$1.STATE], STREET5=[$2.STREET], CITY6=[$2.CITY], 
> ZIP7=[$2.ZIP], STATE8=[$2.STATE])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
> {code}
> COMMENT: 
> 1. Above in logical filter, condition references to field $3 which is 
> ZIP=[$1.ZIP] field from previous project, 
>    however in original query filtering should be done by CITY field. 
> 2. Also the top level project references to $4 field, which is 
> STATE=[$1.STATE] field from project, but original 
>    query requested ZIP field.
>  
> Updated expected result shows that both problems are fixed now.  
> UPDATED RESULT:
> {code}
> LogicalProject(ZIP=[$3])
>   LogicalFilter(condition=[=($2, 'abc')])
>     LogicalProject(EMPNO=[$0], STREET=[$1.STREET], CITY=[$1.CITY], 
> ZIP=[$1.ZIP], STATE=[$1.STATE], STREET5=[$2.STREET], CITY6=[$2.CITY], 
> ZIP7=[$2.ZIP], STATE8=[$2.STATE])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
> {code}
> ----------------------------------------------------------------------------------
> TEST CASE: SqlToRelConverterTest.testStructTypeAlias()
> ORIGIN RESULTS:
> select t.r AS myRow from (select row(row(1)) r from dept) t  
> {code}
> LogicalProject(MYROW$$0$$0=[1])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {code}
> COMMENT:
> 1. Inside the subselect of row(row(1)) type of returned column is 
> RecordType(RecordType(INTEGER EXPR$0) EXPR$0),
>    but the top level project uses flattened expression and returns to user 
> literal 1 with type RecordType(INTEGER MYROW$$0$$0), 
>    although the type doesn't match type returned by row(row(1)) expression. 
> 2. Also it's suspicious that caller expects returned column to have name 
> 'myRow' but gets 'MYROW$$0$$0'. 
> Updated result collects back flattened expression using row constructor.
>   
> UPDATED RESULT:
> {code}
> LogicalProject(MYROW=[ROW(ROW($0))])
>   LogicalProject(MYROW$$0$$0=[1])
>     LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {code}
> ----------------------------------------------------------------------------------
> TEST CASE: SqlToRelConverterTest.testFlattenRecords()
> ORIGIN RESULTS:
> select employees[1] from dept_nested 
> {code}
> LogicalProject(EXPR$0=[$0])
>   LogicalProject(EXPR$0$0=[ITEM($3, 1).EMPNO], EXPR$0$1=[ITEM($3, 1).ENAME], 
> EXPR$0$2=[ITEM($3, 1).DETAIL])
>     LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]])
> {code}
> COMMENT:
> 1. Given selection of element by index from array, which by definition 
> contain elements of type: 
> {code}
> STRUCT<EMPNO:INT,
>        ENAME:VARCHAR(10),
>        DETAIL:STRUCT<SKILLS:ARRAY<STRUCT<TYPE:VARCHAR(10),
>                                          DESC:VARCHAR(20),
>                                        
> OTHERS:STRUCT<A:VARCHAR(10),B:VARCHAR(10)>>>>>
> {code} 
> First problem is that the intermediate project on top of the scan expected to 
> return only flattened types. 
> But last projection expression returns `DETAIL` field of array element which 
> has type STRUCT and contain array 
> field named 'SKILLS'. 
> 2. The top level project returns incorrect flattened EMPNO:INT field from 
> retrieved array element instead 
> of whole array element of type struct. 
>   
> UPDATED RESULT:
> ELEMENT_TYPE = RecordType(INTEGER EMPNO, VARCHAR(10) ENAME, 
> RecordType(RecordType(VARCHAR(10) NOT NULL TYPE, VARCHAR(20) NOT NULL DESC, 
>                RecordType(VARCHAR(10) NOT NULL A, VARCHAR(10) NOT NULL B) NOT 
> NULL OTHERS) NOT NULL ARRAY SKILLS) DETAIL)
> SKILL_TYPE = RecordType(RecordType(VARCHAR(10) NOT NULL TYPE, VARCHAR(20) NOT 
> NULL DESC, RecordType(VARCHAR(10) NOT NULL A, VARCHAR(10) NOT NULL B) NOT 
> NULL OTHERS) NOT NULL ARRAY SKILLS)
> {code}
> LogicalProject(EXPR$0=[CASE($0, null:ELEMENT_TYPE, ROW($1, $2, CASE($3, 
> null:SKILL_TYPE, ROW($4))))])
>   LogicalProject(IS_NULL_ITEM$6_1=[AS(IS NULL(ITEM($3, 1)), 
> 'IS_NULL_ITEM$6_1')], EXPR$0$0=[ITEM($3, 1).EMPNO], EXPR$0$1=[ITEM($3, 
> 1).ENAME], IS_NULL_ITEM$6_1DETAIL=[AS(IS NULL(ITEM($3, 1).DETAIL), 
> 'IS_NULL_ITEM$6_1DETAIL')], EXPR$0$2$0=[ITEM($3, 1).DETAIL.SKILLS])
>     LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]])
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (CALCITE-3138) Restructuring ROW type fields is not supported

Reply via email to