[jira] [Commented] (CALCITE-481) Add "Spool" operator, to allow re-use of relational expressions

JIRA Wed, 26 Nov 2014 03:50:09 -0800

    [ 
https://issues.apache.org/jira/browse/CALCITE-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226075#comment-14226075
 ]


Jesús Camacho Rodríguez commented on CALCITE-481:
-------------------------------------------------

Thanks for opening this issue, [~julianhyde]. I think a _spool_ operator would 
be an important addition to Calcite.

A couple of pointers on how the _spool_ operator can be used to accelerate 
query execution: 
[here|http://www.dbis.informatik.hu-berlin.de/fileadmin/lectures/SS2008/Seminar_MatViews/p533-zhou.pdf]
 and 
[here|http://research.microsoft.com/en-us/um/people/jrzhou/pub/scope-vldbj.pdf].
 I also found [this blog 
post|http://sqlblog.com/blogs/rob_farley/archive/2013/06/11/spooling-in-sql-execution-plans.aspx]
 talking about the integration of the _spool_ operator within MS SQL Server.

I think those links give a good idea of how the _spool_ operator could be 
implemented, both logically and physically, to bring benefits to query 
execution.

One aspect that we could discuss is whether we need two versions of the 
operator at the logical level, as those references do (_eager_ and _lazy_), or 
a single one. IMO, eager versus lazy seems to be a physical aspect, so a single 
logical operator would probably be enough. What do you think?
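
To make the eager/lazy question concrete, here is a minimal sketch, not Calcite code: the class name SpoolSketch and its methods are hypothetical, and the sketch assumes single-threaded consumers. It shows one spool abstraction whose eager or lazy behaviour is only a construction-time (physical) choice, while every reader sees the same rows either way.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch (not Calcite API): buffers one input and replays it to many readers. */
final class SpoolSketch<T> {
  private final Iterator<T> source;              // the single input that populates the spool
  private final List<T> buffer = new ArrayList<>();

  /** eager = true drains the input up front; eager = false pulls rows on demand. */
  SpoolSketch(Iterator<T> source, boolean eager) {
    this.source = source;
    if (eager) {
      while (source.hasNext()) {
        buffer.add(source.next());
      }
    }
  }

  /** Each call returns an independent reader over the same spooled rows. */
  Iterator<T> reader() {
    return new Iterator<T>() {
      private int pos = 0;

      @Override public boolean hasNext() {
        return pos < buffer.size() || source.hasNext();
      }

      @Override public T next() {
        if (pos == buffer.size()) {
          // Lazy path: this reader is ahead of the buffer, so pull one more row.
          if (!source.hasNext()) {
            throw new NoSuchElementException();
          }
          buffer.add(source.next());
        }
        return buffer.get(pos++);
      }
    };
  }

  /** Number of rows materialized so far (for illustration only). */
  int bufferedRows() {
    return buffer.size();
  }
}
```

With eager = false, a reader that stops early leaves the rest of the input unread; with eager = true, the whole input is materialized before the first row is returned. The readers cannot tell the difference, which is the argument for a single logical operator.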

> Add "Spool" operator, to allow re-use of relational expressions
> ---------------------------------------------------------------
>
>                 Key: CALCITE-481
>                 URL: https://issues.apache.org/jira/browse/CALCITE-481
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Assignee: Julian Hyde
>
> If a sub-tree occurs more than once in a query, an efficient plan would 
> probably evaluate it once and have two readers read the same data. We propose 
> a "Spool" relational expression for this purpose.
> Spool would have one input, the expression that populates it.
> In the VolcanoPlanner, any RelNode can already have multiple consumers (each 
> of which sees the same row type and the same data) but an optimal plan does 
> not typically include multiple uses of the same node, so most implementors 
> (e.g. EnumerableRelImplementor) would just not notice, and generate the same 
> code twice. Having an explicit Spool would alert the implementor to re-use 
> the result.
> We do not prescribe a mechanism for implementing Spool as a physical 
> operator. A job that populates a temporary table is one possible mechanism.
> As part of this case, we should implement Spool in Enumerable convention, and 
> use it to evaluate some test queries.
> The other reason to implement Spool is costing. The cost of a Spool with N 
> consumers is typically something like A + B * N. A, the fixed cost, is 
> significantly larger than B, the re-play cost.
> Volcano's dynamic programming model does not make it easy to account for 
> re-use. There are approaches in academia based on integer linear programming; 
> see e.g. http://www.slideshare.net/INRIA-OAK/plreuse 
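
The cost formula in the quoted description (a fixed cost A plus a re-play cost B per consumer) can be worked through with a small sketch. This is an illustration only: the class SpoolCostSketch is hypothetical and not part of Calcite, and the comparison against re-evaluating the sub-tree at a cost C per consumer is an assumption added here, not something the issue states.

```java
/** Hypothetical sketch (not Calcite API) of the spool cost model from the description. */
final class SpoolCostSketch {
  private SpoolCostSketch() {}

  /** Cost of a spool with n consumers: fixed cost a plus re-play cost b per consumer. */
  static double spoolCost(double a, double b, int n) {
    return a + b * n;
  }

  /** Assumption: without a spool, each of the n consumers re-evaluates the sub-tree at cost c. */
  static double recomputeCost(double c, int n) {
    return c * n;
  }

  /** True when spooling is cheaper than re-evaluating the sub-tree for every consumer. */
  static boolean spoolIsCheaper(double a, double b, double c, int n) {
    return spoolCost(a, b, n) < recomputeCost(c, n);
  }
}
```

For example, with A = 100, B = 1 and an assumed C = 60, SpoolCostSketch.spoolIsCheaper(100, 1, 60, 1) is false (101 versus 60) while SpoolCostSketch.spoolIsCheaper(100, 1, 60, 2) is true (102 versus 120). The answer depends on the number of consumers, which is the kind of re-use that the description says Volcano's dynamic programming model does not account for easily.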



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
