[
https://issues.apache.org/jira/browse/FLINK-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15772984#comment-15772984
]
ASF GitHub Bot commented on FLINK-3850:
---------------------------------------
GitHub user NickolayVasilishin opened a pull request:
https://github.com/apache/flink/pull/3040
[FLINK-3850] Add forward field annotations to DataSet
Add forward field annotations to DataSet operators generated by the Table
API
- Added field forwarding to most `DataSetRel` implementations.
- `SemanticPropUtil.java` now allows the forwarded-fields string to be
empty.
- Moved the wrapper for type-based indices into the object
`FieldForwardingUtils`.
- In most cases, forwarding is done only for the conversion.
`BatchScan`: forwarding at conversion
`DataSetAggregate`: forwarding at conversion
`DataSetCalc`: forwarding based on operands left unmodified by `RexCalls`
`DataSetCorrelate`: forwarding based on operands left unmodified by `RexCalls`
`DataSetIntersect`: forwarding at conversion
`DataSetJoin`: forwarding based on fields that are not keys
`DataSetMinus`: forwarding at conversion
`DataSetSingleRowJoin`: all fields of the multi-row dataset are forwarded;
the single row is used via broadcast
`DataSetSort`: all fields forwarded + conversion
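To illustrate the `DataSetCalc` case, here is a minimal Java sketch of how copied fields could be separated from computed ones. It is not the code from this pull request: the class `CalcForwardingSketch`, the `COMPUTED` marker, and the encoding of a projection as an `int[]` are all hypothetical and stand in for the real Calcite expression tree.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Flink or of this pull request.
final class CalcForwardingSketch {

    // Marks an output field that is produced by an expression rather than copied.
    static final int COMPUTED = -1;

    /**
     * projection[i] is the index of the input field that output field i copies
     * unchanged, or COMPUTED if output field i is the result of an expression.
     * Returns one {inputIndex, outputIndex} pair per copied field.
     */
    static List<int[]> forwardedPairs(int[] projection) {
        List<int[]> pairs = new ArrayList<>();
        for (int out = 0; out < projection.length; out++) {
            if (projection[out] != COMPUTED) {
                pairs.add(new int[] {projection[out], out});
            }
        }
        return pairs;
    }
}
```

For a projection that copies input field 2 to output 0, computes output 1, and copies input field 0 to output 2, `forwardedPairs(new int[] {2, COMPUTED, 0})` yields the pairs (2, 0) and (0, 2).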
I hope I have understood the meaning of forwarded fields correctly: fields
that are not used in computations. I therefore assumed that these are the
fields not used in `RexCalls` or as join keys. I also forwarded fields in
type conversions.
The most complex part was determining the correct input and output field
names.
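The name-mapping step can be sketched as follows. This is an assumption-laden illustration, not the pull request's `FieldForwardingUtils`: the class `ForwardedFieldNamesSketch` and its method are hypothetical. The only Flink-specific detail relied on is the documented forwarded-fields syntax, where `in->out` entries are separated by semicolons and a bare field name means the field keeps its position.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Flink or of this pull request.
final class ForwardedFieldNamesSketch {

    /**
     * Builds a forwarded-fields string from {inputIndex, outputIndex} pairs.
     * inputNames and outputNames hold the field names of the input and output
     * types, for example "f0" for a Java tuple or "_1" for a Scala tuple.
     */
    static String forwardedFields(String[] inputNames, String[] outputNames, int[][] pairs) {
        List<String> parts = new ArrayList<>();
        for (int[] pair : pairs) {
            String in = inputNames[pair[0]];
            String out = outputNames[pair[1]];
            // A field that keeps its name is written without an arrow.
            parts.add(in.equals(out) ? in : in + "->" + out);
        }
        // An empty pair list gives an empty string: nothing is forwarded.
        return String.join(";", parts);
    }
}
```

With input names `f0, f1, f2`, output names `_1, _2, f2`, and the pairs (0, 1) and (2, 2), the result is `f0->_2;f2`. An operator that forwards nothing produces an empty string.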
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/NickolayVasilishin/flink FLINK-3850
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3040.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3040
----
commit 25cc1f022eb399bade37ef7b0fd0b87a9e509d67
Author: nikolay_vasilishin <[email protected]>
Date: 2016-12-23T10:50:46Z
[FLINK-3850] Add forward field annotations to DataSet operators generated
by the Table API
Conflicts:
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/BatchScan.scala
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetAggregate.scala
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetCalc.scala
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetCorrelate.scala
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetIntersect.scala
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetJoin.scala
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetMinus.scala
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetSingleRowJoin.scala
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetSort.scala
----
> Add forward field annotations to DataSet operators generated by the Table API
> -----------------------------------------------------------------------------
>
> Key: FLINK-3850
> URL: https://issues.apache.org/jira/browse/FLINK-3850
> Project: Flink
> Issue Type: Improvement
> Components: Table API & SQL
> Reporter: Fabian Hueske
> Assignee: Nikolay Vasilishin
>
> The DataSet API features semantic annotations [1] that tell the optimizer which
> input fields an operator copies. This information is valuable for the
> optimizer because it can infer that certain physical properties such as
> partitioning or sorting are not destroyed by user functions and thus generate
> more efficient execution plans.
> The Table API is built on top of the DataSet API and generates DataSet
> programs and code for user-defined functions. Hence, it knows exactly which
> fields are modified and which are not. We should use this information to
> automatically generate forward field annotations and attach them to the
> operators. This can help to significantly improve the performance of certain
> jobs.
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/batch/index.html#semantic-annotations
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)