Re: Scoping Question

Julian Hyde Fri, 22 Jan 2016 09:16:10 -0800

The scoping rule is very simple - a relational expression can only see its 
input relational expression(s). It can’t see the input of its input, or its 
sibling. So $n is the n’th column of the input.

For instance, the DEPTNO field might be $3 coming out of a Scan of the EMP 
table, would still be $3 after passing through a Filter, and might be $1 after 
passing through a Project.

A Join has two inputs, so you imagine the input columns laid side by side. If 
the left has 3 and the right has 4, then the valid references are $0, $1, $2 
reading from the left input and $3, $4, $5, $6 reading from the right.

You need to know how each operator maps the fields in the input row(s) to an 
output row.
* Most operators (Filter, Sort, Exchange, SemiJoin), output a row that has the 
same fields as the input.
* The set operators (Union, Intersect, Minus) output a row that matches their 
first input (in fact they require that all inputs have the same row type).
* Join outputs the fields from the left input followed by the fields from the 
right input. Some fields become nullable if it is a left, right or full outer 
join. Correlate is similar to Join.
* The leaf operators, Scan and Values, don’t have an input but they just output 
their data.
* Project outputs one column for each expression.
* Aggregate outputs the group keys, followed optionally by the indicators, 
followed by the aggregate functions.

Expressions are also allowed to read correlating variables, called $cor0, $cor1 
etc. which are assigned by a relational expression elsewhere in the graph. But 
they are an advanced topic that don’t crop up in many queries.

Julian

> On Jan 21, 2016, at 7:30 PM, Homer <[email protected]> wrote:
> 
> Hi,
> 
> When using calcite to generate a plan I am trying to understand scope of the 
> $n variables
> 
> I have a simple table
> 
> tab 3 table is a super simple table
> 
> i1 integer
> i2 integer
> t3 text
> t4 text
> 
> if I do
> 
> select i1 as fred from tab3;
> 
> I get
> 
> LogicalProject(fred=[$0])
>  LogicalTableScan(table=[[CATALOG, mapd, tab3]])
> 
> if I look at my catalog layout I see that i1 is the first column in the 
> catalog for this table thus the $0. make sense.
> 
> If I give calcite a kind of silly query for testing and trying to understand 
> the scoping of the result it expects me to execute it gets a little more 
> confusing
> 
> select count(*) from tab3 as alias1 where (select max(alias2.i1) from tab3 
> alias2, tab3 as alias3 where alias2.i1 = alias3.i1) > alias1.i1;
> 
> the execution plan I get back looks like this:
> 
> LogicalAggregate(group=[{}], fred=[COUNT()])
>  LogicalProject($f0=[0])
>    LogicalFilter(condition=[>($5, $0)])
>      LogicalJoin(condition=[true], joinType=[left])
>        LogicalTableScan(table=[[CATALOG, mapd, tab3]])
>        LogicalAggregate(group=[{}], EXPR$0=[MAX($0)])
>          LogicalProject(i1=[$0])
>            LogicalFilter(condition=[=($0, $5)])
>              LogicalJoin(condition=[true], joinType=[inner])
>                LogicalTableScan(table=[[CATALOG, mapd, tab3]])
>                LogicalTableScan(table=[[CATALOG, mapd, tab3]])
> 
> My question is about understanding the mapping and scoping of the $n 
> variables in the plan
> 
> The plan uses $0 and $5 in various different scopes here, initially in this 
> part
> 
>          LogicalProject(i1=[$0])
>            LogicalFilter(condition=[=($0, $5)])
>              LogicalJoin(condition=[true], joinType=[inner])
>                LogicalTableScan(table=[[CATALOG, mapd, tab3]])
>                LogicalTableScan(table=[[CATALOG, mapd, tab3]])
> 
> I understand it as the two scans are of the same table, so $0 for alias2.i1 
> $5 for the second table alias3.i1.
> 
> we then move into my shakey ground
> 
> LogicalAggregate(group=[{}], fred=[COUNT()])
>  LogicalProject($f0=[0])
>    LogicalFilter(condition=[>($5, $0)])
>      LogicalJoin(condition=[true], joinType=[left])
>        LogicalTableScan(table=[[CATALOG, mapd, tab3]])
>        LogicalAggregate(group=[{}], EXPR$0=[MAX($0)])
> 
> 
> We see another join here referring to $5 and $0 but these are different $5 
> and $0.  In this scope I think $5 is referring to the MAX() projection from 
> below and $0 is referring to alias1.i1
> 
> I am wondering what is the best way to consider the scoping of these $n 
> placeholders.  What is the best way to think of this model?  At what point 
> does $0 and $5 move to the new meaning?
> 
> thanks
> Homer
> 
> 
> 
> 
> 
> 
>

Re: Scoping Question

Reply via email to