Re: Expression/LogicalPlan dichotomy in Spark SQL Catalyst

Michael Armbrust Mon, 21 Dec 2015 15:53:20 -0800

>
> Why was the choice made in Catalyst to make LogicalPlan/QueryPlan and
> Expression separate subclasses of TreeNode, instead of e.g. also make
> QueryPlan inherit from Expression?
>
I think this is a pretty common way to model things (glancing at Postgres,
it looks similar).  Expressions and plans are pretty different concepts.  An
expression can be evaluated on a single input row and returns a single
value.  In contrast, a query plan operates on a relation and has a schema
with many different atomic values.
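
To make the distinction concrete, here is a minimal toy sketch in plain Scala.
It is not Catalyst's actual code: the object name ExprVsPlan, the Row alias,
and every class in it are hypothetical stand-ins, chosen only to show "one row
in, one value out" versus "a relation with a schema".

```scala
// Toy model only; none of these definitions are Spark's real classes.
object ExprVsPlan {
  // Hypothetical row representation: column name -> value.
  type Row = Map[String, Any]

  // An expression is evaluated against a single input row and yields one value.
  sealed trait Expr { def eval(row: Row): Any }
  case class Attr(name: String) extends Expr {
    def eval(row: Row): Any = row(name)
  }
  case class Lit(value: Any) extends Expr {
    def eval(row: Row): Any = value
  }
  case class GreaterThan(left: Expr, right: Expr) extends Expr {
    // Assumes both sides evaluate to Int; a real engine would type-check first.
    def eval(row: Row): Any =
      left.eval(row).asInstanceOf[Int] > right.eval(row).asInstanceOf[Int]
  }

  // A plan operates on a whole relation and exposes a schema.
  sealed trait Plan {
    def schema: Seq[String]
    def execute(): Seq[Row]
  }
  case class LocalRelation(schema: Seq[String], rows: Seq[Row]) extends Plan {
    def execute(): Seq[Row] = rows
  }
  case class Filter(condition: Expr, child: Plan) extends Plan {
    // A filter keeps its child's schema and drops rows failing the condition.
    def schema: Seq[String] = child.schema
    def execute(): Seq[Row] =
      child.execute().filter(r => condition.eval(r).asInstanceOf[Boolean])
  }
}
```

Note that Filter holds an Expr, but the Expr never needs to know about
schemas or relations, which is roughly why the two hierarchies stay separate.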



> The code also contains duplicate functionality, like
> LeafNode/LeafExpression, UnaryNode/UnaryExpression and
> BinaryNode/BinaryExpression.


These traits actually have different semantics for expressions vs. plans
(e.g. a UnaryExpression's nullability is based on its child's nullability,
whereas this would not make sense for a UnaryNode, which does not have a
concept of nullability).
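
A small sketch of that difference, again as a toy model rather than the real
Catalyst traits: UnaryExpr, UnaryPlan, Column, Negate, Table and Limit are all
hypothetical names. The point is only that the expression-side trait carries a
nullability default, while the plan-side trait has nothing comparable to
inherit.

```scala
// Toy model only; these are not Spark's UnaryExpression / UnaryNode.
object UnaryTraits {
  sealed trait Expr { def nullable: Boolean }

  trait UnaryExpr extends Expr {
    def child: Expr
    // Default for a one-child expression: nullable exactly when the child is.
    def nullable: Boolean = child.nullable
  }

  case class Column(name: String, nullable: Boolean) extends Expr
  case class Negate(child: Expr) extends UnaryExpr

  // Plans describe relations; they expose output columns, not a nullable flag.
  sealed trait Plan { def output: Seq[Column] }

  trait UnaryPlan extends Plan { def child: Plan }

  case class Table(output: Seq[Column]) extends Plan
  case class Limit(n: Int, child: Plan) extends UnaryPlan {
    // A limit passes its child's columns through unchanged.
    def output: Seq[Column] = child.output
  }
}
```

Merging the two "unary" traits would force one of them to carry members that
are meaningless for it, so the apparent duplication is mostly in the names.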


> this makes whole-tree transformations really cumbersome since we've got to
> deal with 'pivot points' for these 2 types of TreeNodes, where a recursive
> transformation can only be done on 1 specific type of children, and then
> has to be dealt with again within the same PartialFunction for the other
> type in which the matching case(s) can be nested.


It is not clear to me that you actually want these transformations to
happen seamlessly.  For example, the resolution rules for subqueries are
different from those for normal plans because you have to reason about
correlation.  That said, it seems like you should be able to do some magic
in RuleExecutor to make sure that things like the optimizer descend
seamlessly into nested query plans.
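
One way such a descent could look, sketched on a toy tree: this is not how
RuleExecutor works today, and transformAllPlans, ScalarSubquery, Noop and the
other names are hypothetical. The helper applies a plan rule bottom-up and
also follows plans that are wrapped inside expressions, so the rule author
writes a single PartialFunction over plans.

```scala
// Toy model only; a sketch of the idea, not Spark's RuleExecutor.
object NestedTransform {
  sealed trait Expr
  case class Lit(value: Int) extends Expr
  // An expression that wraps a nested query plan.
  case class ScalarSubquery(plan: Plan) extends Expr

  sealed trait Plan
  case class Relation(name: String) extends Plan
  case class Filter(condition: Expr, child: Plan) extends Plan
  // Placeholder operator that a rule might want to eliminate.
  case class Noop(child: Plan) extends Plan

  // Applies `rule` bottom-up to every plan node, including plans that are
  // nested inside expressions.
  def transformAllPlans(plan: Plan)(rule: PartialFunction[Plan, Plan]): Plan = {
    // Rewrite plans embedded in an expression; leave other expressions alone.
    def inExpr(e: Expr): Expr = e match {
      case ScalarSubquery(p) => ScalarSubquery(transformAllPlans(p)(rule))
      case other             => other
    }
    // Rebuild the node with transformed children first (bottom-up order).
    val withNewChildren: Plan = plan match {
      case Filter(c, child) => Filter(inExpr(c), transformAllPlans(child)(rule))
      case Noop(child)      => Noop(transformAllPlans(child)(rule))
      case leaf             => leaf
    }
    // Then apply the rule to the rebuilt node if it matches.
    rule.applyOrElse(withNewChildren, (p: Plan) => p)
  }
}
```

For example, calling transformAllPlans(plan) { case Noop(c) => c } would strip
Noop nodes both from the main plan and from any subquery plan.  Rules that
must treat subqueries differently (e.g. because of correlation) would still
want the non-descending traversal.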
