[jira] [Updated] (CALCITE-3138) Restructuring ROW type fields is not supported

Igor Guzenko (JIRA) Wed, 14 Aug 2019 02:13:29 -0700


     [ 
https://issues.apache.org/jira/browse/CALCITE-3138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Igor Guzenko updated CALCITE-3138:
----------------------------------
    Description: 

1) RelStructuredTypeFlattener.restructureFields(structType) doesn't support ROW 
type. However, ROW type is flattened by 
RelStructuredTypeFlattener just like struct. So when user queries one column 
with complex type ROW, after flattening and restructuring 
top level project returns reference to first inner primitive field of the 
requested column. 

2) Another finding is related to ITEM expression applied to array of structs 
column. For example, let's imagine a table with column of 
type ARRAY<STRUCT<a:INT, b:STRUCT<x:INT,y:INT>>>. When user requests is SQL 
array_column[1], Calcite generates ITEM($0, 1) , where $0 is ref to array 
column 
from Scan and 1 is index literal. Current flattener generates two field acess 
expressions ITEM($0, 1).a, ITEM($0, 1).b but dont take into account 
that ITEM($0, 1).b returns struct which also should be flattened. 

3) In some cases applying of ITEM after flattenning is no longer possible. For 
example, consider column with type 
STRUCT<a:INT, b:STRUCT<x:INT,y:INT>>. User requests column['b'] in query and 
Calcite creates ITEM($0,'b'). 
After flattenning Scan is covered by LogicalProject($0.a, $0.b.x, $0.b.y) and 
the old projection ITEM($0,'b') can't 
be applied anymore. So now it should be converted to refer only subset of 
fields ($1,$2) from flattening project.

UPDATES IN EXPECTED TEST RESULTS:


----------------------------------------------------------------------------------
TEST CASE: SqlToRelConverterTest.testNestedColumnType()
{code}
select empa.home_address.zip from sales.emp_address empa where 
empa.home_address.city = 'abc'
{code}

OLD RESULT: 

{code}
LogicalProject(ZIP=[$4])
  LogicalFilter(condition=[=($3, 'abc':VARCHAR(20))])
    LogicalProject(EMPNO=[$0], STREET=[$1.STREET], CITY=[$1.CITY], 
ZIP=[$1.ZIP], STATE=[$1.STATE], STREET5=[$2.STREET], CITY6=[$2.CITY], 
ZIP7=[$2.ZIP], STATE8=[$2.STATE])
      LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
{code}

1. Above in logical filter, condition references to field $3 which is 
ZIP=[$1.ZIP] field from previous project, 
   however in original query filtering should be done by CITY field. 
2. Also the top level project references to $4 field, which is STATE=[$1.STATE] 
field from project, but original 
   query requested ZIP field.
 
UPDATED RESULT:
{code}
LogicalProject(ZIP=[$3])
  LogicalFilter(condition=[=($2, 'abc')])
    LogicalProject(EMPNO=[$0], STREET=[$1.STREET], CITY=[$1.CITY], 
ZIP=[$1.ZIP], STATE=[$1.STATE], STREET5=[$2.STREET], CITY6=[$2.CITY], 
ZIP7=[$2.ZIP], STATE8=[$2.STATE])
      LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
{code}
----------------------------------------------------------------------------------
TEST CASE: SqlToRelConverterTest.testStructTypeAlias()
{code}
select t.r AS myRow from (select row(row(1)) r from dept) t 
{code}

OLD RESULT: 

{code}
LogicalProject(MYROW$$0$$0=[1])
  LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{code}

1. Inside the subselect of row(row(1)) type of returned column is 
RecordType(RecordType(INTEGER EXPR$0) EXPR$0),
   but the top level project uses flattened expression and returns to user 
literal 1 with type RecordType(INTEGER MYROW$$0$$0), 
   although the type doesn't match type returned by row(row(1)) expression. 
2. Also it's suspicious that caller expects returned column to have name 
'myRow' but gets 'MYROW$$0$$0'. 
  
UPDATED RESULT:
{code}
LogicalProject(MYROW=[ROW(ROW($0))])
  LogicalProject(MYROW$$0$$0=[1])
    LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{code}

----------------------------------------------------------------------------------

TEST CASE: SqlToRelConverterTest.testFlattenRecords()
{code}
select employees[1] from dept_nested 
{code}

OLD RESULT:

{code}
LogicalProject(EXPR$0=[$0])
  LogicalProject(EXPR$0$0=[ITEM($3, 1).EMPNO], EXPR$0$1=[ITEM($3, 1).ENAME], 
EXPR$0$2=[ITEM($3, 1).DETAIL])
    LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]])
{code}

1. Given selection of element by index from array, which by definition contain 
elements of type: 
{code}
STRUCT<EMPNO:INT,
       ENAME:VARCHAR(10),
       DETAIL:STRUCT<SKILLS:ARRAY<STRUCT<TYPE:VARCHAR(10),
                                         DESC:VARCHAR(20),
                                         
OTHERS:STRUCT<A:VARCHAR(10),B:VARCHAR(10)>>>>>
{code} 

First problem is that the intermediate project on top of the scan expected to 
return only flattened types. 
But last projection expression returns `DETAIL` field of array element which 
has type STRUCT and contain array 
field named 'SKILLS'. 

2. The top level project returns incorrect flattened EMPNO:INT field from 
retrieved array element instead 
of whole array element of type struct. 

  
UPDATED RESULT:

{code}
LogicalProject(EXPR$0=[ROW($0, $1, ROW($2))])
  LogicalProject(EXPR$0$0=[ITEM($3, 1).EMPNO], EXPR$0$1=[ITEM($3, 1).ENAME], 
EXPR$0$2$0=[ITEM($3, 1).DETAIL.SKILLS])
    LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]])
{code}


Other few updates in RelToSqlConverterStructsTest.java and PlannerTest.java are 
more explicit and visible in PR. 



  was:
ROW type is not supported to restructure fields after flattening (see 
RelStructuredTypeFlattener.restructureFields), which may cause wrong result.

Attached patch file can be used to reproduce the issue.

UPDATE: Some tests require changes in expected results. 

----------------------------------------------------------------------------------
TEST CASE: SqlToRelConverterTest.testNestedColumnType()

ORIGIN RESULTS: 

select empa.home_address.zip from sales.emp_address empa where 
empa.home_address.city = 'abc'

{code}
LogicalProject(ZIP=[$4])
  LogicalFilter(condition=[=($3, 'abc':VARCHAR(20))])
    LogicalProject(EMPNO=[$0], STREET=[$1.STREET], CITY=[$1.CITY], 
ZIP=[$1.ZIP], STATE=[$1.STATE], STREET5=[$2.STREET], CITY6=[$2.CITY], 
ZIP7=[$2.ZIP], STATE8=[$2.STATE])
      LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
{code}

COMMENT: 
1. Above in logical filter, condition references to field $3 which is 
ZIP=[$1.ZIP] field from previous project, 
   however in original query filtering should be done by CITY field. 
2. Also the top level project references to $4 field, which is STATE=[$1.STATE] 
field from project, but original 
   query requested ZIP field.
 
Updated expected result shows that both problems are fixed now.  

UPDATED RESULT:
{code}
LogicalProject(ZIP=[$3])
  LogicalFilter(condition=[=($2, 'abc')])
    LogicalProject(EMPNO=[$0], STREET=[$1.STREET], CITY=[$1.CITY], 
ZIP=[$1.ZIP], STATE=[$1.STATE], STREET5=[$2.STREET], CITY6=[$2.CITY], 
ZIP7=[$2.ZIP], STATE8=[$2.STATE])
      LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
{code}
----------------------------------------------------------------------------------

TEST CASE: SqlToRelConverterTest.testStructTypeAlias()

ORIGIN RESULTS:

select t.r AS myRow from (select row(row(1)) r from dept) t  
{code}
LogicalProject(MYROW$$0$$0=[1])
  LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{code}
COMMENT:
1. Inside the subselect of row(row(1)) type of returned column is 
RecordType(RecordType(INTEGER EXPR$0) EXPR$0),
   but the top level project uses flattened expression and returns to user 
literal 1 with type RecordType(INTEGER MYROW$$0$$0), 
   although the type doesn't match type returned by row(row(1)) expression. 
2. Also it's suspicious that caller expects returned column to have name 
'myRow' but gets 'MYROW$$0$$0'. 

Updated result collects back flattened expression using row constructor.
  
UPDATED RESULT:
{code}
LogicalProject(MYROW=[ROW(ROW($0))])
  LogicalProject(MYROW$$0$$0=[1])
    LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{code}

----------------------------------------------------------------------------------

TEST CASE: SqlToRelConverterTest.testFlattenRecords()

ORIGIN RESULTS:

select employees[1] from dept_nested 
{code}
LogicalProject(EXPR$0=[$0])
  LogicalProject(EXPR$0$0=[ITEM($3, 1).EMPNO], EXPR$0$1=[ITEM($3, 1).ENAME], 
EXPR$0$2=[ITEM($3, 1).DETAIL])
    LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]])
{code}
COMMENT:
1. Given selection of element by index from array, which by definition contain 
elements of type: 
{code}
STRUCT<EMPNO:INT,
       ENAME:VARCHAR(10),
       DETAIL:STRUCT<SKILLS:ARRAY<STRUCT<TYPE:VARCHAR(10),
                                         DESC:VARCHAR(20),
                                         
OTHERS:STRUCT<A:VARCHAR(10),B:VARCHAR(10)>>>>>
{code} 

First problem is that the intermediate project on top of the scan expected to 
return only flattened types. 
But last projection expression returns `DETAIL` field of array element which 
has type STRUCT and contain array 
field named 'SKILLS'. 

2. The top level project returns incorrect flattened EMPNO:INT field from 
retrieved array element instead 
of whole array element of type struct. 

  
UPDATED RESULT:

ELEMENT_TYPE = RecordType(INTEGER EMPNO, VARCHAR(10) ENAME, 
RecordType(RecordType(VARCHAR(10) NOT NULL TYPE, VARCHAR(20) NOT NULL DESC, 
               RecordType(VARCHAR(10) NOT NULL A, VARCHAR(10) NOT NULL B) NOT 
NULL OTHERS) NOT NULL ARRAY SKILLS) DETAIL)
SKILL_TYPE = RecordType(RecordType(VARCHAR(10) NOT NULL TYPE, VARCHAR(20) NOT 
NULL DESC, RecordType(VARCHAR(10) NOT NULL A, VARCHAR(10) NOT NULL B) NOT NULL 
OTHERS) NOT NULL ARRAY SKILLS)
{code}
LogicalProject(EXPR$0=[CASE($0, null:ELEMENT_TYPE, ROW($1, $2, CASE($3, 
null:SKILL_TYPE, ROW($4))))])
  LogicalProject(IS_NULL_ITEM$6_1=[AS(IS NULL(ITEM($3, 1)), 
'IS_NULL_ITEM$6_1')], EXPR$0$0=[ITEM($3, 1).EMPNO], EXPR$0$1=[ITEM($3, 
1).ENAME], IS_NULL_ITEM$6_1DETAIL=[AS(IS NULL(ITEM($3, 1).DETAIL), 
'IS_NULL_ITEM$6_1DETAIL')], EXPR$0$2$0=[ITEM($3, 1).DETAIL.SKILLS])
    LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]])
{code}
 


> Restructuring ROW type fields is not supported
> ----------------------------------------------
>
>                 Key: CALCITE-3138
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3138
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Haisheng Yuan
>            Assignee: Igor Guzenko
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: ROW_repro.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> 1) RelStructuredTypeFlattener.restructureFields(structType) doesn't support 
> ROW type. However, ROW type is flattened by 
> RelStructuredTypeFlattener just like struct. So when user queries one column 
> with complex type ROW, after flattening and restructuring 
> top level project returns reference to first inner primitive field of the 
> requested column. 
> 2) Another finding is related to ITEM expression applied to array of structs 
> column. For example, let's imagine a table with column of 
> type ARRAY<STRUCT<a:INT, b:STRUCT<x:INT,y:INT>>>. When user requests is SQL 
> array_column[1], Calcite generates ITEM($0, 1) , where $0 is ref to array 
> column 
> from Scan and 1 is index literal. Current flattener generates two field acess 
> expressions ITEM($0, 1).a, ITEM($0, 1).b but dont take into account 
> that ITEM($0, 1).b returns struct which also should be flattened. 
> 3) In some cases applying of ITEM after flattenning is no longer possible. 
> For example, consider column with type 
> STRUCT<a:INT, b:STRUCT<x:INT,y:INT>>. User requests column['b'] in query and 
> Calcite creates ITEM($0,'b'). 
> After flattenning Scan is covered by LogicalProject($0.a, $0.b.x, $0.b.y) and 
> the old projection ITEM($0,'b') can't 
> be applied anymore. So now it should be converted to refer only subset of 
> fields ($1,$2) from flattening project.
> UPDATES IN EXPECTED TEST RESULTS:
> ----------------------------------------------------------------------------------
> TEST CASE: SqlToRelConverterTest.testNestedColumnType()
> {code}
> select empa.home_address.zip from sales.emp_address empa where 
> empa.home_address.city = 'abc'
> {code}
> OLD RESULT: 
> {code}
> LogicalProject(ZIP=[$4])
>   LogicalFilter(condition=[=($3, 'abc':VARCHAR(20))])
>     LogicalProject(EMPNO=[$0], STREET=[$1.STREET], CITY=[$1.CITY], 
> ZIP=[$1.ZIP], STATE=[$1.STATE], STREET5=[$2.STREET], CITY6=[$2.CITY], 
> ZIP7=[$2.ZIP], STATE8=[$2.STATE])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
> {code}
> 1. Above in logical filter, condition references to field $3 which is 
> ZIP=[$1.ZIP] field from previous project, 
>    however in original query filtering should be done by CITY field. 
> 2. Also the top level project references to $4 field, which is 
> STATE=[$1.STATE] field from project, but original 
>    query requested ZIP field.
>  
> UPDATED RESULT:
> {code}
> LogicalProject(ZIP=[$3])
>   LogicalFilter(condition=[=($2, 'abc')])
>     LogicalProject(EMPNO=[$0], STREET=[$1.STREET], CITY=[$1.CITY], 
> ZIP=[$1.ZIP], STATE=[$1.STATE], STREET5=[$2.STREET], CITY6=[$2.CITY], 
> ZIP7=[$2.ZIP], STATE8=[$2.STATE])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
> {code}
> ----------------------------------------------------------------------------------
> TEST CASE: SqlToRelConverterTest.testStructTypeAlias()
> {code}
> select t.r AS myRow from (select row(row(1)) r from dept) t 
> {code}
> OLD RESULT: 
> {code}
> LogicalProject(MYROW$$0$$0=[1])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {code}
> 1. Inside the subselect of row(row(1)) type of returned column is 
> RecordType(RecordType(INTEGER EXPR$0) EXPR$0),
>    but the top level project uses flattened expression and returns to user 
> literal 1 with type RecordType(INTEGER MYROW$$0$$0), 
>    although the type doesn't match type returned by row(row(1)) expression. 
> 2. Also it's suspicious that caller expects returned column to have name 
> 'myRow' but gets 'MYROW$$0$$0'. 
>   
> UPDATED RESULT:
> {code}
> LogicalProject(MYROW=[ROW(ROW($0))])
>   LogicalProject(MYROW$$0$$0=[1])
>     LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {code}
> ----------------------------------------------------------------------------------
> TEST CASE: SqlToRelConverterTest.testFlattenRecords()
> {code}
> select employees[1] from dept_nested 
> {code}
> OLD RESULT:
> {code}
> LogicalProject(EXPR$0=[$0])
>   LogicalProject(EXPR$0$0=[ITEM($3, 1).EMPNO], EXPR$0$1=[ITEM($3, 1).ENAME], 
> EXPR$0$2=[ITEM($3, 1).DETAIL])
>     LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]])
> {code}
> 1. Given selection of element by index from array, which by definition 
> contain elements of type: 
> {code}
> STRUCT<EMPNO:INT,
>        ENAME:VARCHAR(10),
>        DETAIL:STRUCT<SKILLS:ARRAY<STRUCT<TYPE:VARCHAR(10),
>                                          DESC:VARCHAR(20),
>                                        
> OTHERS:STRUCT<A:VARCHAR(10),B:VARCHAR(10)>>>>>
> {code} 
> First problem is that the intermediate project on top of the scan expected to 
> return only flattened types. 
> But last projection expression returns `DETAIL` field of array element which 
> has type STRUCT and contain array 
> field named 'SKILLS'. 
> 2. The top level project returns incorrect flattened EMPNO:INT field from 
> retrieved array element instead 
> of whole array element of type struct. 
>   
> UPDATED RESULT:
> {code}
> LogicalProject(EXPR$0=[ROW($0, $1, ROW($2))])
>   LogicalProject(EXPR$0$0=[ITEM($3, 1).EMPNO], EXPR$0$1=[ITEM($3, 1).ENAME], 
> EXPR$0$2$0=[ITEM($3, 1).DETAIL.SKILLS])
>     LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]])
> {code}
> Other few updates in RelToSqlConverterStructsTest.java and PlannerTest.java 
> are more explicit and visible in PR. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

[jira] [Updated] (CALCITE-3138) Restructuring ROW type fields is not supported

Reply via email to