GitHub user twalthr opened a pull request:
https://github.com/apache/flink/pull/729
[FLINK-1319][core] Add static code analysis for UDFs
This PR implements a Static Code Analyzer (SCA) that uses the ASM framework
to interpret the Java bytecode of Flink UDFs. The analyzer is built on top of
ASM's `BasicInterpreter`. Instead of ASM's `BasicValue`s, I introduced
`TaggedValue`s, which extend `BasicValue` and allow additional information to
be attached to values. Values of interest, such as inputs, collectors, or
constants, are tagged so that atomic input fields can be tracked through the
entire UDF (until the function returns or calls `collect()`).
The implementation is as conservative as possible: for cases or bytecode
instructions that haven't been considered, the analyzer falls back to the
ASM library (which removes `TaggedValue`s).
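To illustrate the tagging idea and the conservative fallback, here is a minimal, self-contained sketch. It does not use ASM and it is not the code in this PR: the class `TaggedValueSketch`, the `Tag` enum, and the methods `merge` and `unmodeledOperation` are hypothetical names chosen for illustration only. It assumes a tag survives a control-flow join only when both branches agree, and that any operation the sketch does not model drops the tag.

```java
import java.util.Objects;

// Hypothetical, simplified model of tagged values. The names below are
// illustrative assumptions; this is neither the PR's TaggedValue nor ASM's API.
final class TaggedValueSketch {

    /** What the sketch knows about a value during interpretation. */
    enum Tag { UNTAGGED, INPUT_FIELD, CONSTANT }

    static final class Value {
        final Tag tag;
        final int inputField; // input field index, or -1 when not an input field

        private Value(Tag tag, int inputField) {
            this.tag = tag;
            this.inputField = inputField;
        }

        static Value untagged() { return new Value(Tag.UNTAGGED, -1); }
        static Value constant() { return new Value(Tag.CONSTANT, -1); }
        static Value inputField(int index) { return new Value(Tag.INPUT_FIELD, index); }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Value)) return false;
            Value other = (Value) o;
            return tag == other.tag && inputField == other.inputField;
        }

        @Override
        public int hashCode() { return Objects.hash(tag, inputField); }
    }

    /** At a control-flow join, keep the tag only if both branches agree. */
    static Value merge(Value a, Value b) {
        return a.equals(b) ? a : Value.untagged();
    }

    /** Conservative fallback: an operation the sketch does not model drops the tag. */
    static Value unmodeledOperation(Value operand) {
        return Value.untagged();
    }

    public static void main(String[] args) {
        Value f0 = Value.inputField(0);
        // Both branches yield input field 0, so the tag survives the join.
        System.out.println(merge(f0, Value.inputField(0)).tag);
        // The branches disagree, so the result is untagged.
        System.out.println(merge(f0, Value.inputField(1)).tag);
        // Unmodeled operations lose the tag.
        System.out.println(unmodeledOperation(f0).tag);
    }
}
```

The point of the sketch is only that losing a tag is always safe: an untagged value can never produce a wrong hint, only a missing one.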
61 JUnit tests cover the basic functionality. A further 18 JUnit tests
exercise the analyzer with code examples from the "real world".
The analyzer has 3 modes: DISABLED, OPTIMIZE, HINTS
The interpretation takes some time: the analysis of a single UDF can take up
to 1 second. Therefore, to keep build times down, I didn't enable the analyzer
in TestEnvironment by default. If you uncomment the relevant lines, the
analyzer supports all 280 UDFs within the entire Flink code.
The analyzer gives hints about:
- Main feature: ForwardedFields semantic properties for all types of
  Functions except for MapPartition and Combine
- Warnings if static fields are modified by a Function
- Warnings if a FilterFunction modifies its input objects
- Warnings if a Function returns `null`
- Warnings if a tuple access uses a wrong index
- Information about the number of object creations within a UDF (for manual
  optimization)
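As a rough illustration of the main feature, the sketch below turns a tracked mapping from output positions to input fields into a string in Flink's ForwardedFields notation (e.g. "f0; f1->f2"). This is not the analyzer's code: the class `ForwardedFieldsSketch`, the method `describe`, and the `NOT_FORWARDED` marker are hypothetical, and the sketch assumes that tracking has already determined, for each output position, which input field (if any) arrives there unmodified.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of this PR or of Flink.
final class ForwardedFieldsSketch {

    /** Marks an output position that does not carry an unmodified input field. */
    static final int NOT_FORWARDED = -1;

    /**
     * outputToInput[i] holds the index of the input field that reaches output
     * position i unchanged, or NOT_FORWARDED if no such field was tracked.
     */
    static String describe(int[] outputToInput) {
        List<String> parts = new ArrayList<>();
        for (int out = 0; out < outputToInput.length; out++) {
            int in = outputToInput[out];
            if (in == NOT_FORWARDED) {
                continue; // nothing is known about this position, so say nothing
            }
            // Same position: "f0". Moved to another position: "f1->f0".
            parts.add(in == out ? "f" + in : "f" + in + "->f" + out);
        }
        return String.join("; ", parts);
    }

    public static void main(String[] args) {
        // A map that keeps field 0 in place and computes a new field 1.
        System.out.println(describe(new int[] {0, NOT_FORWARDED}));
        // A map that swaps the two fields of its input tuple.
        System.out.println(describe(new int[] {1, 0}));
    }
}
```

Positions about which nothing is known are simply omitted, in line with the conservative approach described above.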
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/twalthr/flink sca
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/729.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #729
----
commit c384fc9740013ec1ae89a2817695078542c47dfe
Author: twalthr <[email protected]>
Date: 2015-05-26T18:22:03Z
[FLINK-1319][core] Add static code analysis for UDFs
----