[ 
https://issues.apache.org/jira/browse/FLINK-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15917356#comment-15917356
 ] 

ASF GitHub Bot commented on FLINK-3850:
---------------------------------------

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3040#discussion_r105731937
  
    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetJoin.scala ---
    @@ -199,10 +200,21 @@ class DataSetJoin(
     
         val joinOpName = s"where: ($joinConditionToString), join: ($joinSelectionToString)"
     
    +    //consider all fields not which are not keys are forwarded
    +    val leftIndices = (0 until left.getRowType.getFieldCount).diff(leftKeys)
    --- End diff --
    
    A Calcite join forwards all fields of both sides. If the left input is 
`(l1, l2, l3)` and the right input is `(r1, r2)`, then the result of the join 
is `(l1, l2, l3, r1, r2)` for every pair of left and right rows that satisfies 
the join condition. It does not matter whether a field is a key field: if the 
join condition is `l1 == r2`, both `l1` and `r2` are forwarded. Since DataSet 
joins organize the input data sets based on the key attributes (partition and 
sort), these attributes are especially interesting for forward field annotations.
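
To make the field layout concrete, here is a minimal sketch in plain Scala. The object `JoinFieldPositions` and its method `outputPositions` are hypothetical names used only for illustration; they are not part of Flink, Calcite, or this pull request. The sketch assumes only what is described above: the join output is the left fields followed by the right fields.

```scala
// Hypothetical helper for illustration; not part of Flink or the pull request.
object JoinFieldPositions {

  /** Maps each input field index to its index in the join output,
    * assuming the output row is the left fields followed by the right fields.
    * Returns (left mapping, right mapping). */
  def outputPositions(leftCount: Int, rightCount: Int): (Map[Int, Int], Map[Int, Int]) = {
    // Left fields keep their position.
    val left = (0 until leftCount).map(i => i -> i).toMap
    // Right fields are shifted by the number of left fields.
    val right = (0 until rightCount).map(j => j -> (leftCount + j)).toMap
    (left, right)
  }
}
```

For the example above, `JoinFieldPositions.outputPositions(3, 2)` maps the left fields to positions 0, 1, 2 and the right fields to positions 3, 4. Key fields are treated like any other field.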
    
    Actually, I just noticed that we have to distinguish the type of the join 
(inner, left, right, full). We can only forward the fields of a side that is 
never padded (both sides for an inner join, the left side for a left join, the 
right side for a right join, neither side for a full outer join), because the 
fields of the other side might have been padded with `null` values. 
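
The join-type rule can be sketched as follows. This is plain Scala under stated assumptions, not the code of the pull request: `JoinKind`, `forwardable`, and `annotations` are hypothetical names, and the `fN` field names assume that the rows expose positional field names of that form. The generated strings follow the `source->target` syntax of the DataSet forward field annotations, so they could presumably be handed to `withForwardedFieldsFirst` and `withForwardedFieldsSecond`.

```scala
// Hypothetical sketch for illustration; not part of Flink or the pull request.
object ForwardableSides {

  sealed trait JoinKind
  case object Inner extends JoinKind
  case object LeftOuter extends JoinKind
  case object RightOuter extends JoinKind
  case object FullOuter extends JoinKind

  /** Returns (forwardLeft, forwardRight): a side may be forwarded only
    * if the join type never pads it with null values. */
  def forwardable(kind: JoinKind): (Boolean, Boolean) = kind match {
    case Inner      => (true, true)
    case LeftOuter  => (true, false)  // right side may be null-padded
    case RightOuter => (false, true)  // left side may be null-padded
    case FullOuter  => (false, false) // both sides may be null-padded
  }

  /** Builds "fI->fJ" strings for the left and right input.
    * Assumption: fields are named f0, f1, ... by position, and the output
    * row is the left fields followed by the right fields. */
  def annotations(kind: JoinKind, leftCount: Int, rightCount: Int): (Seq[String], Seq[String]) = {
    val (fwdLeft, fwdRight) = forwardable(kind)
    val left: Seq[String] =
      if (fwdLeft) (0 until leftCount).map(i => s"f$i->f$i") else Seq.empty
    val right: Seq[String] =
      if (fwdRight) (0 until rightCount).map(j => s"f$j->f${leftCount + j}") else Seq.empty
    (left, right)
  }
}
```

For a left join of `(l1, l2, l3)` with `(r1, r2)`, `ForwardableSides.annotations(ForwardableSides.LeftOuter, 3, 2)` yields annotations for the three left fields and none for the right side.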


> Add forward field annotations to DataSet operators generated by the Table API
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-3850
>                 URL: https://issues.apache.org/jira/browse/FLINK-3850
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>            Reporter: Fabian Hueske
>            Assignee: Nikolay Vasilishin
>
> The DataSet API features semantic annotations [1] that tell the optimizer 
> which input fields an operator copies. This information is valuable for the 
> optimizer because it can infer that certain physical properties such as 
> partitioning or sorting are not destroyed by user functions and thus generate 
> more efficient execution plans.
> The Table API is built on top of the DataSet API and generates DataSet 
> programs and code for user-defined functions. Hence, it knows exactly which 
> fields are modified and which are not. We should use this information to 
> automatically generate forward field annotations and attach them to the 
> operators. This can significantly improve the performance of certain jobs.
> [1] 
> https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/batch/index.html#semantic-annotations



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
