[ 
https://issues.apache.org/jira/browse/CALCITE-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15279248#comment-15279248
 ] 

Michael Mior commented on CALCITE-1234:
---------------------------------------

Here are a few references you may want to take a look at:

SOFA: An Extensible Logical Optimizer for UDF-heavy Dataflows
http://arxiv.org/abs/1311.6335
This is in the context of map-reduce code, but SOFA makes heavy use of 
annotations and some of what they're doing may apply in a relational setting.

ANNOTATIONS FOR PARALLELIZATION OF USER-DEFINED FUNCTIONS WITH FLEXIBLE 
PARTITIONING
http://www.freepatentsonline.com/y2015/0379076.html
This is a patent application and IANAL, so I'm not sure whether anything here 
is usable. I'm also not really sure whether Calcite takes advantage of any 
opportunities for intra-query parallelism.

Query Optimization in the Presence of Foreign Functions 
http://www.vldb.org/conf/1993/P529.PDF
This is an older paper and may be a bit too high-level to be useful, but it 
discusses query optimization in the context of user-specified rewrite rules 
for UDFs.

> Annotate table functions to allow pushing down project, filter
> --------------------------------------------------------------
>
>                 Key: CALCITE-1234
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1234
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Assignee: Julian Hyde
>
> In general it is not possible to push relational operators through table 
> functions but many table functions have properties that will allow push down: 
> for instance, they might preserve fields, preserve row count, return rows in 
> the same order. If we annotate a table function with these properties, we can 
> automatically push down a Filter, and so forth.
> Some ideas:
> * {{PreservesFieldNames}}: If an output field has the same name as an input 
> field, it is assumed to be the same field.
> * {{PreservesRows}}: Each input row causes exactly one output row, in the 
> same order.
> * {{FiltersRows}}: Each input row causes at most one output row, in the same 
> order.
> * {{PreservesFieldPositions}}: The leading N columns of the output are 
> equivalent to the leading N columns of the input.
> If {{(PreservesFieldNames or PreservesFieldPositions) and (PreservesRows or 
> FiltersRows)}}, it is safe to push down a Filter if all of the fields of the 
> predicate exist in the input.
> Similarly pushing down a Project; and we can push down an Aggregate if 
> {{PreservesRows}}.
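
To make the push-down condition in the quoted description concrete, here is a 
minimal Java sketch. The four annotation names come from the issue; everything 
else (the class PushDownSketch, the two example functions, and the helpers 
canPushFilter and canPushAggregate) is hypothetical and is not Calcite API. 
It also simplifies field matching to names only, so the "leading N columns" 
meaning of PreservesFieldPositions is not modeled.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; none of these types exist in Calcite. */
public class PushDownSketch {
  // Marker annotations named after the ideas in the issue description.
  @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
  @interface PreservesFieldNames {}

  @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
  @interface PreservesFieldPositions {}

  @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
  @interface PreservesRows {}

  @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
  @interface FiltersRows {}

  /** Example table function (assumed): keeps field names, may drop rows. */
  @PreservesFieldNames @FiltersRows
  static class ExampleFilteringFunction {}

  /** Example table function with no annotations: nothing may be pushed. */
  static class OpaqueFunction {}

  private static boolean has(Class<?> fn, Class<? extends Annotation> a) {
    return fn.isAnnotationPresent(a);
  }

  /**
   * (PreservesFieldNames or PreservesFieldPositions)
   * and (PreservesRows or FiltersRows)
   * and every field used by the predicate exists in the input.
   */
  static boolean canPushFilter(Class<?> fn, Set<String> predicateFields,
      List<String> inputFields) {
    boolean fieldsOk = has(fn, PreservesFieldNames.class)
        || has(fn, PreservesFieldPositions.class);
    boolean rowsOk = has(fn, PreservesRows.class)
        || has(fn, FiltersRows.class);
    return fieldsOk && rowsOk && inputFields.containsAll(predicateFields);
  }

  /** As stated in the issue: an Aggregate needs PreservesRows. */
  static boolean canPushAggregate(Class<?> fn) {
    return has(fn, PreservesRows.class);
  }

  public static void main(String[] args) {
    List<String> input = List.of("a", "b");
    System.out.println(canPushFilter(
        ExampleFilteringFunction.class, Set.of("a"), input));
    System.out.println(canPushFilter(
        OpaqueFunction.class, Set.of("a"), input));
    System.out.println(canPushAggregate(ExampleFilteringFunction.class));
  }
}
```

In this sketch a planner rule would consult canPushFilter before moving a 
Filter below the table function. A function that only has FiltersRows passes 
the Filter check but fails the Aggregate check, since dropping rows changes 
aggregate results.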



