[
https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562512#comment-14562512
]
ASF GitHub Bot commented on FLINK-1319:
---------------------------------------
Github user StephanEwen commented on the pull request:
https://github.com/apache/flink/pull/729#issuecomment-106232545
I second Ufuk's comments.
Merging it and deactivating it by default. I can see a 0.9.1 or 0.10.0
release coming very soon afterwards, because we still have a big set of issues
in the pipeline.
Initially activating hinting in the local environment (which people use
during debugging anyway) and keeping it deactivated in the "production"
environments (remote and context).
Other comments:
- How about printing the hints to sysout? I can see them getting lost
among the logging statements. Also, people often do not have logging activated
in the IDE.
- Package-based exclusions never worked; they were always an issue with the
quickstarts. I assume you want the exclusion to make sure you do not analyze
the built-in default join function, for example? What you can do is add an
annotation that says "DoNotAnalyze" to those functions, and then simply analyze
everything.
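A minimal sketch of the annotation idea, for illustration only. The annotation name comes from the suggestion above; the class `AnalysisFilter`, the method `shouldAnalyze`, and the two stand-in function classes are hypothetical and not part of Flink or of the pull request.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class AnalysisFilter {

    /** Hypothetical marker annotation; must be retained at runtime to be visible via reflection. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface DoNotAnalyze {}

    /** Stand-in for a built-in function, e.g. a default join function (assumption). */
    @DoNotAnalyze
    public static class BuiltInDefaultJoin {}

    /** Stand-in for an ordinary user-defined function (assumption). */
    public static class UserMapper {}

    /** Analyze every class except those explicitly marked with the annotation. */
    public static boolean shouldAnalyze(Class<?> udfClass) {
        return !udfClass.isAnnotationPresent(DoNotAnalyze.class);
    }

    public static void main(String[] args) {
        System.out.println(shouldAnalyze(UserMapper.class));         // prints "true"
        System.out.println(shouldAnalyze(BuiltInDefaultJoin.class)); // prints "false"
    }
}
```

Because the marker is not declared with `@Inherited`, a subclass of an annotated function would still be analyzed in this sketch; whether that is desirable is a design choice.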
> Add static code analysis for UDFs
> ---------------------------------
>
> Key: FLINK-1319
> URL: https://issues.apache.org/jira/browse/FLINK-1319
> Project: Flink
> Issue Type: New Feature
> Components: Java API, Scala API
> Reporter: Stephan Ewen
> Assignee: Timo Walther
> Priority: Minor
>
> Flink's Optimizer uses information about which fields of the input
> elements a UDF accesses, modifies, or forwards/copies. This
> information frequently helps to reuse partitionings, sorts, etc. It may speed
> up programs significantly, as it can frequently eliminate sorts and shuffles,
> which are costly.
> Right now, users can add lightweight annotations to UDFs to provide this
> information (such as adding {{@ConstantFields("0->3, 1, 2->1")}}).
> We worked with static code analysis of UDFs before, to determine this
> information automatically. This is an incredible feature, as it "magically"
> makes programs faster.
> For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this
> works surprisingly well in many cases. We used the "Soot" toolkit for the
> static code analysis. Unfortunately, Soot is LGPL licensed and thus we did
> not include any of the code so far.
> I propose to add this functionality to Flink, in the form of a drop-in
> addition, to work around the LGPL incompatibility with ASL 2.0. Users could
> simply download a special "flink-code-analysis.jar" and drop it into the
> "lib" folder to enable this functionality. We may even add a script to
> "tools" that downloads that library automatically into the lib folder. This
> should be legally fine, since we do not redistribute LGPL code and only
> dynamically link it (the incompatibility with ASL 2.0 is mainly in the
> patentability, if I remember correctly).
> Prior work on this has been done by [~aljoscha] and [~skunert], which could
> provide a code base to start with.
> *Appendix*
> Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/
> Papers on static analysis for optimization:
> http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and
> http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf
> Quick introduction to the Optimizer:
> http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf
> (Section 6)
> Optimizer for Iterations:
> http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf
> (Sections 4.3 and 5.3)