[jira] [Commented] (CALCITE-1208) Improve two-level column structure handling

Julian Hyde (JIRA) Thu, 28 Jul 2016 22:55:39 -0700

    [ 
https://issues.apache.org/jira/browse/CALCITE-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15398757#comment-15398757
 ]


Julian Hyde commented on CALCITE-1208:
--------------------------------------

Responding to [~jni]'s review comments:

bq. Within one struct, do we allow multiple fields to have PEEK_FIELD_DEFAULT? 
It seems the current code does not enforce the requirement of only one field with 
PEEK_FIELD_DEFAULT when fields are added to a struct.

In Phoenix there would be only one field with PEEK_FIELD_DEFAULT (i.e. the default 
column group). But in principle there could be more. If two record fields in 
table t were labeled PEEK_FIELD_DEFAULT, and each had a column called c, then 
t.c would be ambiguous because both matches have equal rank.
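A minimal sketch of that tie-break in Java (16+, for records). It is a toy model, not Calcite's validator code: `TieBreak`, `Candidate` and `pick` are hypothetical names, and the rank is assumed to be (path length, number of implicit steps through non-default peek fields). Two default fields, hypothetically named D1 and D2, that each contain `c` yield candidates of equal rank, so resolving `c` fails as ambiguous.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Toy tie-breaker for column resolution (hypothetical; not Calcite's API). */
class TieBreak {
  /** A candidate match: its full path, and how many implicit steps went through non-default peek fields. */
  record Candidate(List<String> path, int nonDefaultPeeks) {}

  /** Assumed rank: shorter paths first; among equal lengths, fewer non-default peeks first. */
  static final Comparator<Candidate> RANK =
      Comparator.<Candidate>comparingInt(c -> c.path().size())
          .thenComparingInt(Candidate::nonDefaultPeeks);

  /** Returns the best candidate, or throws if none exists or several share the best rank. */
  static Candidate pick(String name, List<Candidate> candidates) {
    if (candidates.isEmpty()) {
      throw new IllegalArgumentException("Column '" + name + "' not found");
    }
    Candidate best = candidates.stream().min(RANK).get();
    List<Candidate> tied = candidates.stream()
        .filter(c -> RANK.compare(c, best) == 0)
        .collect(Collectors.toList());
    if (tied.size() > 1) {
      throw new IllegalArgumentException("Column '" + name + "' is ambiguous: " + tied);
    }
    return best;
  }

  public static void main(String[] args) {
    // Table t with two default record fields, D1 and D2, each holding a column c.
    List<Candidate> twoDefaults = List.of(
        new Candidate(List.of("D1", "c"), 0),
        new Candidate(List.of("D2", "c"), 0));
    try {
      pick("c", twoDefaults);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // reports the ambiguity
    }
    // With one default field D1 and an ordinary peek field P, D1.c outranks P.c.
    List<Candidate> oneDefault = List.of(
        new Candidate(List.of("D1", "c"), 0),
        new Candidate(List.of("P", "c"), 1));
    System.out.println(pick("c", oneDefault).path()); // prints [D1, c]
  }
}
```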

bq. This query is the same as the one on line 7823?

Oops, yes; fixed.

bq. Probably we had better add the StructType (peek_fields) to this comment. 
Otherwise, people have to go to MockCatalogReader to see each record type's 
struct type to understand these test cases.

Agreed.

bq. I understand the tests cover second-level columns and Phoenix may only need 
two-level columns. But looking at the definition of RelRecordType, the code 
will allow columns at any level, right? Is this desirable?

You are correct that Phoenix has a maximum depth of 2, but the mechanism would 
work at greater depths, and that would be useful. Suppose you have a table with 
an XML (or JSON) schema that, unlike in Drill, is known when the query is 
validated. In the Orders table, you could allow customer.address.zipcode to be 
abbreviated as zipcode.

{noformat}
 T1 has
    F0  -- peek_fields_default
         C0
         C1
         F0  -- peek_fields
             C0
             C1
{noformat}

bq. {{Select F0.C0 from T1}}: will it resolve to C0 at the second level or the 
third level?

If there are two matches, we choose the one with the shorter path. It will 
resolve to C0 at the second level, i.e. F0.C0.

{noformat}
 T1 has
    F0  -- peek_fields_default
         C0
         C1
         F0  -- peek_fields_default
             C0
             C1
{noformat}

bq. {{Select C0 from t1}}: will this resolve to the second-level C0 or the 
third-level C0?

Again, we choose the shorter path, so it resolves to F0.C0, at the second level.
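The shorter-path rule can be sketched as a recursive search over the record structure, in Java (16+, for records). This is a toy model under stated assumptions, not Calcite's implementation: `NestedResolver`, `Field` and `Kind` are hypothetical, implicit descent ("peeking") is assumed to happen only before the first name part is matched, and a match must end at a leaf column. Both T1 structures above resolve to the second-level F0.C0.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of nested column resolution (hypothetical names; not Calcite's validator). */
class NestedResolver {
  enum Kind { FULLY_QUALIFIED, PEEK_FIELDS, PEEK_FIELDS_DEFAULT }

  /** A column; a non-empty children list makes it a record-typed column. */
  record Field(String name, Kind kind, List<Field> children) {
    static Field leaf(String name) { return new Field(name, Kind.FULLY_QUALIFIED, List.of()); }
    boolean isRecord() { return !children.isEmpty(); }
  }

  /** Resolves a (possibly abbreviated) name to the full path of a leaf column. */
  static List<String> resolve(List<Field> table, List<String> names) {
    Map<List<String>, Integer> matches = new LinkedHashMap<>();
    search(table, names, 0, new ArrayList<>(), 0, matches);
    if (matches.isEmpty()) {
      throw new IllegalArgumentException("Column " + names + " not found");
    }
    // Rank: shorter path first, then fewer implicit steps through non-default records.
    int bestLen = Integer.MAX_VALUE, bestPeeks = Integer.MAX_VALUE;
    List<String> best = null;
    boolean tie = false;
    for (Map.Entry<List<String>, Integer> e : matches.entrySet()) {
      int len = e.getKey().size(), peeks = e.getValue();
      if (len < bestLen || (len == bestLen && peeks < bestPeeks)) {
        bestLen = len; bestPeeks = peeks; best = e.getKey(); tie = false;
      } else if (len == bestLen && peeks == bestPeeks) {
        tie = true;
      }
    }
    if (tie) {
      throw new IllegalArgumentException("Column " + names + " is ambiguous");
    }
    return best;
  }

  private static void search(List<Field> fields, List<String> names, int i,
      List<String> path, int peeks, Map<List<String>, Integer> out) {
    for (Field f : fields) {
      List<String> next = new ArrayList<>(path);
      next.add(f.name());
      if (f.name().equals(names.get(i))) {
        if (i == names.size() - 1) {
          if (!f.isRecord()) {
            out.merge(List.copyOf(next), peeks, Integer::min); // leaf reached
          }
        } else if (f.isRecord()) {
          search(f.children(), names, i + 1, next, peeks, out); // explicit step
        }
      }
      // Assumption: implicit steps (peeking) only happen before the first name part matches.
      if (i == 0 && f.isRecord() && f.kind() != Kind.FULLY_QUALIFIED) {
        int extra = f.kind() == Kind.PEEK_FIELDS_DEFAULT ? 0 : 1;
        search(f.children(), names, 0, next, peeks + extra, out);
      }
    }
  }

  public static void main(String[] args) {
    // First T1: outer F0 is PEEK_FIELDS_DEFAULT, inner F0 is PEEK_FIELDS.
    Field inner1 = new Field("F0", Kind.PEEK_FIELDS, List.of(Field.leaf("C0"), Field.leaf("C1")));
    List<Field> t1a = List.of(new Field("F0", Kind.PEEK_FIELDS_DEFAULT,
        List.of(Field.leaf("C0"), Field.leaf("C1"), inner1)));
    System.out.println(resolve(t1a, List.of("F0", "C0"))); // prints [F0, C0]

    // Second T1: both levels are PEEK_FIELDS_DEFAULT.
    Field inner2 = new Field("F0", Kind.PEEK_FIELDS_DEFAULT, List.of(Field.leaf("C0"), Field.leaf("C1")));
    List<Field> t1b = List.of(new Field("F0", Kind.PEEK_FIELDS_DEFAULT,
        List.of(Field.leaf("C0"), Field.leaf("C1"), inner2)));
    System.out.println(resolve(t1b, List.of("C0"))); // prints [F0, C0]
  }
}
```

Because candidates are ranked by path length first, in this model a third-level match wins only when no shorter match exists, which is what would let zipcode stand for customer.address.zipcode when nothing shorter is named zipcode.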

> Improve two-level column structure handling
> -------------------------------------------
>
>                 Key: CALCITE-1208
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1208
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.7.0
>            Reporter: Maryann Xue
>            Assignee: Julian Hyde
>              Labels: phoenix
>             Fix For: 1.9.0
>
>
> Calcite now supports nested column structures in parsing and 
> validation, by representing the inner-level columns as a RexFieldAccess based 
> on a RexInputRef. However, it does not flatten the inner-level structure in 
> wildcard expansion, which then causes an UnsupportedOperationException 
> in Avatica.
>  
> The idea is to take this nested structure into account in column resolution, 
> but to flatten the structure when translating to RelNode/RexNode.
> For example, if the table structure is defined as
> {code}VARCHAR K0,
> VARCHAR C1,
> RecordType(INTEGER C0, INTEGER C1) F0,
> RecordType(INTEGER C0, INTEGER C2) F1{code}
> it should be viewed as a flat type like
> {code}VARCHAR K0,
> VARCHAR C1,
> INTEGER F0.C0,
> INTEGER F0.C1,
> INTEGER F1.C0,
> INTEGER F1.C2{code}
> so that:
> 1) Column reference "K0" is translated as {{$0}}
> 2) Column reference "F0.C1" is translated as {{$3}}
> 3) Wildcard "*" is translated as {{$0, $1, $2, $3, $4, $5}}
> 4) Complex-column wildcard "F1.*" is translated as {{$4, $5}}
> We would like to resolve columns based on the following rules (here we 
> only consider the "suffix" part of the qualified names, which means that table 
> resolution has already been done by this time):
> a) A two-part column name is matched against both its first-level column name 
> and its second-level column name. For example, "F1.C0" corresponds to $4; "F1.X" 
> will throw a column not found error.
> b) A single-part column name is matched against non-nested columns first, and 
> if there is no match, it is then matched against the second-level column names. 
> For example, "C1" will be matched as "$1" instead of "$3", since non-nested 
> columns have a higher priority; "C2" will be matched as "$5"; "C0" will lead 
> to an ambiguous column error, since it exists under both "F0" and "F1".
> c) We would also like a way of defining a "default first-level column" 
> that takes precedence in column resolution over other first-level 
> columns. For example, if "F0" is defined as the default, "C0" will not cause an 
> ambiguous column error, but will instead be matched as "$2".
> d) A reference to a first-level column alone, without a wildcard, is not 
> allowed, e.g., "F1".
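The flattening and rules a) to d) can be sketched for the two-level case in Java (16+, for records). This is an illustration only, not Calcite's SqlToRelConverter or validator code; `FlattenSketch`, `Column`, `flatten`, `expandStar` and `resolve` are hypothetical names. Ordinals follow the declaration order of the flat type, so in this layout "F1.*" covers $4 and $5.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Sketch of flattening a two-level row type into ordinals (hypothetical; not Calcite code). */
class FlattenSketch {
  /** A top-level column; a non-empty subColumns list makes it a column group such as F0. */
  record Column(String name, List<String> subColumns, boolean isDefault) {
    static Column simple(String name) { return new Column(name, List.of(), false); }
  }

  /** Maps each flattened name ("K0", "F0.C1", ...) to its ordinal $n, in declaration order. */
  static Map<String, Integer> flatten(List<Column> row) {
    Map<String, Integer> ordinals = new LinkedHashMap<>();
    for (Column c : row) {
      if (c.subColumns().isEmpty()) {
        ordinals.put(c.name(), ordinals.size());
      } else {
        for (String sub : c.subColumns()) {
          ordinals.put(c.name() + "." + sub, ordinals.size());
        }
      }
    }
    return ordinals;
  }

  /** Expands "*" (empty prefix) or "F1.*" (prefix "F1") to a list of ordinals. */
  static List<Integer> expandStar(Map<String, Integer> ordinals, String prefix) {
    return ordinals.entrySet().stream()
        .filter(e -> prefix.isEmpty() || e.getKey().startsWith(prefix + "."))
        .map(Map.Entry::getValue)
        .collect(Collectors.toList());
  }

  /** Resolves a one- or two-part column name to an ordinal following rules a) to d). */
  static int resolve(List<Column> row, Map<String, Integer> ordinals, String name) {
    if (name.contains(".")) { // rule a): exact two-part match
      Integer ord = ordinals.get(name);
      if (ord == null) {
        throw new IllegalArgumentException("Column '" + name + "' not found");
      }
      return ord;
    }
    if (ordinals.containsKey(name)) {
      return ordinals.get(name); // rule b): non-nested columns first
    }
    List<String> defaults = new ArrayList<>(), others = new ArrayList<>();
    for (Column c : row) {
      if (c.subColumns().contains(name)) {
        (c.isDefault() ? defaults : others).add(c.name() + "." + name);
      }
    }
    List<String> hits = defaults.isEmpty() ? others : defaults; // rule c): default group wins
    if (hits.isEmpty()) {
      throw new IllegalArgumentException("Column '" + name + "' not found");
    }
    if (hits.size() > 1) {
      throw new IllegalArgumentException("Column '" + name + "' is ambiguous");
    }
    return ordinals.get(hits.get(0));
  }

  public static void main(String[] args) {
    List<Column> row = List.of(
        Column.simple("K0"),
        Column.simple("C1"),
        new Column("F0", List.of("C0", "C1"), false),
        new Column("F1", List.of("C0", "C2"), false));
    Map<String, Integer> ord = flatten(row);
    System.out.println(ord);                        // {K0=0, C1=1, F0.C0=2, F0.C1=3, F1.C0=4, F1.C2=5}
    System.out.println(expandStar(ord, ""));        // [0, 1, 2, 3, 4, 5]
    System.out.println(expandStar(ord, "F1"));      // [4, 5]
    System.out.println(resolve(row, ord, "F0.C1")); // 3
    System.out.println(resolve(row, ord, "C1"));    // 1
    System.out.println(resolve(row, ord, "C2"));    // 5
    // Same row, but with F0 marked as the default first-level column.
    List<Column> withDefault = List.of(row.get(0), row.get(1),
        new Column("F0", List.of("C0", "C1"), true), row.get(3));
    System.out.println(resolve(withDefault, flatten(withDefault), "C0")); // 2
  }
}
```

Rule d) falls out of this model: a bare group name such as "F1" is never a key in the flattened map and is not a sub-column of any group, so it is rejected as not found.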



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
