[ https://issues.apache.org/jira/browse/CALCITE-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15279248#comment-15279248 ]
Michael Mior commented on CALCITE-1234:
---------------------------------------

Here are a few references you may want to take a look at:

SOFA: An Extensible Logical Optimizer for UDF-heavy Dataflows
http://arxiv.org/abs/1311.6335
This is in the context of map-reduce code, but SOFA makes heavy use of annotations and some of what they're doing may apply in a relational setting.

ANNOTATIONS FOR PARALLELIZATION OF USER-DEFINED FUNCTIONS WITH FLEXIBLE PARTITIONING
http://www.freepatentsonline.com/y2015/0379076.html
This is a patent application and IANAL, so I'm not sure if anything is usable here. I'm also not really sure if Calcite takes advantage of any opportunities for intra-query parallelism.

Query Optimization in the Presence of Foreign Functions
http://www.vldb.org/conf/1993/P529.PDF
This is an old one and maybe a bit too high-level to be useful, but it talks about query optimization in the context of user-specified rewrite rules for UDFs.

> Annotate table functions to allow pushing down project, filter
> --------------------------------------------------------------
>
>                 Key: CALCITE-1234
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1234
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Assignee: Julian Hyde
>
> In general it is not possible to push relational operators through table
> functions but many table functions have properties that will allow push down:
> for instance, they might preserve fields, preserve row count, return rows in
> the same order. If we annotate a table function with these properties, we can
> automatically push down a Filter, and so forth.
> Some ideas:
> * {{PreservesFieldNames}}: If an output field has the same name as an input
> field, it is assumed to be the same field.
> * {{PreservesRows}}: Each input row causes exactly one output row, in the
> same order.
> * {{FiltersRows}}: Each input row causes at most one output row, in the same
> order.
> * {{PreservesFieldPositions}}: The leading N columns of the output are
> equivalent to the leading N columns of the input.
> If {{(PreservesFieldNames or PreservesFieldPositions) and (PreservesRows or
> FiltersRows)}}, it is safe to push down a Filter if all of the fields of the
> predicate exist in the input.
> Similarly pushing down a Project; and we can push down an Aggregate if
> {{PreservesRows}}.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
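
The Filter push-down condition in the quoted issue description can be sketched as a small predicate. This is only an illustration of the rule as worded there, not Calcite code: the `TableFunctionProperty` flags and the `can_push_filter` helper are hypothetical names, and fields are compared by name only, which glosses over the positional reading of {{PreservesFieldPositions}}.

```python
from enum import Flag, auto


class TableFunctionProperty(Flag):
    """Hypothetical flags mirroring the annotations proposed in the issue."""
    NONE = 0
    PRESERVES_FIELD_NAMES = auto()
    PRESERVES_ROWS = auto()
    FILTERS_ROWS = auto()
    PRESERVES_FIELD_POSITIONS = auto()


P = TableFunctionProperty


def can_push_filter(props, predicate_fields, input_fields):
    """Return True if a Filter may be pushed below the table function.

    Encodes the rule from the issue description:
    (PreservesFieldNames or PreservesFieldPositions)
    and (PreservesRows or FiltersRows),
    and every field used by the predicate exists in the input.
    Assumption: fields are matched by name only.
    """
    preserves_fields = bool(
        props & (P.PRESERVES_FIELD_NAMES | P.PRESERVES_FIELD_POSITIONS))
    preserves_or_filters_rows = bool(
        props & (P.PRESERVES_ROWS | P.FILTERS_ROWS))
    fields_available = set(predicate_fields) <= set(input_fields)
    return preserves_fields and preserves_or_filters_rows and fields_available


# Usage: a function that keeps field names and drops some rows.
props = P.PRESERVES_FIELD_NAMES | P.FILTERS_ROWS
print(can_push_filter(props, ["deptno"], ["empno", "deptno"]))
print(can_push_filter(P.PRESERVES_FIELD_NAMES, ["deptno"], ["empno", "deptno"]))
```

The first call satisfies both halves of the condition; the second lacks any row-preservation property, so the Filter has to stay above the table function.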