[ https://issues.apache.org/jira/browse/FLINK-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902058#comment-15902058 ]

ASF GitHub Bot commented on FLINK-3850:
---------------------------------------

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3040#discussion_r104723484
  
    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetCorrelate.scala ---
    @@ -97,18 +103,41 @@ class DataSetCorrelate(
         val sqlFunction = rexCall.getOperator.asInstanceOf[TableSqlFunction]
         val pojoFieldMapping = sqlFunction.getPojoFieldMapping
         val udtfTypeInfo = sqlFunction.getRowTypeInfo.asInstanceOf[TypeInformation[Any]]
    +    val returnType = FlinkTypeFactory.toInternalRowTypeInfo(getRowType)
     
         val mapFunc = correlateMapFunction(
           config,
           inputDS.getType,
           udtfTypeInfo,
    +      returnType,
           getRowType,
           joinType,
           rexCall,
           condition,
           Some(pojoFieldMapping),
           ruleDescription)
     
    -    inputDS.flatMap(mapFunc).name(correlateOpName(rexCall, sqlFunction, relRowType))
    +    def getIndices = {
    --- End diff --
    
    A correlate forwards all fields of the input, followed by all fields of the
    table function: for an input `[in1, in2, in3]` and a table function
    `[tf1, tf2]`, the output is `[in1, in2, in3, tf1, tf2]`. So we can do a simple
    position-based mapping of the input fields to the first fields of the output
    type (field names might change, but positions do not). This is basically
    similar to what you are doing for the single-row join.
    
    We do not need to look at the table function or the condition.
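A minimal sketch of the position-based mapping described above, in Scala like the surrounding code. The object `CorrelateForwardedFields` and its method are hypothetical (not part of Flink), and rendering the pairs as `src->dst` joined by `;` is an assumption about the forwarded-fields syntax the DataSet API accepts.

```scala
// Hypothetical helper (not Flink code): derives forwarded fields for a
// correlate, whose output row is [input fields..., table function fields...].
object CorrelateForwardedFields {

  // Input field i always lands at output position i, so the mapping is the
  // identity over the input arity. Field names are ignored on purpose,
  // because the correlate may rename fields while keeping their positions.
  def forwardedFields(inputFields: Seq[String], outputFields: Seq[String]): String = {
    require(
      inputFields.size <= outputFields.size,
      "a correlate output must contain every input field")
    (0 until inputFields.size)
      .map(i => s"$i->$i") // assumed "src->dst" forwarded-fields syntax
      .mkString(";")
  }
}
```

For input fields `[a, b, c]` and output fields `[a, b0, c, x, y]` this yields `0->0;1->1;2->2`. The renamed `b0` does not matter because only positions are compared, and neither the table function nor the condition is inspected.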


> Add forward field annotations to DataSet operators generated by the Table API
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-3850
>                 URL: https://issues.apache.org/jira/browse/FLINK-3850
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>            Reporter: Fabian Hueske
>            Assignee: Nikolay Vasilishin
>
> The DataSet API features semantic annotations [1] to hint to the optimizer which 
> input fields an operator copies. This information is valuable for the 
> optimizer because it can infer that certain physical properties such as 
> partitioning or sorting are not destroyed by user functions and thus generate 
> more efficient execution plans.
> The Table API is built on top of the DataSet API and generates DataSet 
> programs and code for user-defined functions. Hence, it knows exactly which 
> fields are modified and which are not. We should use this information to 
> automatically generate forward field annotations and attach them to the 
> operators. This can help to significantly improve the performance of certain 
> jobs.
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/batch/index.html#semantic-annotations
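The idea in the issue, that generated code already knows which output fields are plain copies of input fields, can be sketched for a generic projection. This is an illustrative model only: `OutExpr`, `InputRef`, `Computed` and `ProjectionForwardedFields` are hypothetical names, not Flink's internal representation, and the `src->dst` string format is an assumption.

```scala
// Hypothetical model of one generated output field: either an unchanged
// copy of an input field, or a value the generated code computes.
sealed trait OutExpr
final case class InputRef(index: Int) extends OutExpr
final case class Computed(description: String) extends OutExpr

object ProjectionForwardedFields {

  // Output position `dst` forwards input field `src` only if the generated
  // code copies it unchanged; computed fields are left out of the annotation.
  def forwardedFields(outputs: Seq[OutExpr]): String =
    outputs.zipWithIndex
      .collect { case (InputRef(src), dst) => s"$src->$dst" }
      .mkString(";")
}
```

`forwardedFields(Seq(InputRef(2), Computed("a + b"), InputRef(0)))` returns `2->0;0->2`: the computed field at position 1 is omitted because the optimizer cannot assume anything about it.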



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
