GitHub user twalthr opened a pull request:

    https://github.com/apache/flink/pull/729

    [FLINK-1319][core] Add static code analysis for UDFs

    This PR implements a Static Code Analyzer (SCA) that uses the ASM framework 
to interpret the Java bytecode of Flink UDFs. The analyzer is built on top of 
ASM's `BasicInterpreter`. Instead of ASM's `BasicValue`s, I introduced 
`TaggedValue`s, which extend `BasicValue` and allow interesting information to 
be attached to values. Interesting values such as inputs, collectors, or 
constants are tagged so that atomic input fields can be tracked through the 
entire UDF (until the function returns or calls `collect()`).
    
    The implementation is as conservative as possible: for cases or bytecode 
instructions that have not been considered, the analyzer falls back to the ASM 
library (which removes the TaggedValues).
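
    A minimal sketch of the tagging idea follows. It assumes a toy stack machine instead of real JVM bytecode: it does not use ASM's `BasicInterpreter` or `BasicValue`, and the names `TaggedValueSketch`, `Op`, `Insn` and the four-instruction set are hypothetical. Each stack value carries the index of the input field it came from. When a tagged value is written to an output field, that field is recorded as forwarded. Any instruction the toy does not model drops the tag, which mirrors the conservative fallback described above. The result is printed in the style of Flink's forwarded-fields strings, where `f0->f1` means input field 0 is copied unchanged to output field 1.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy illustration only (not the ASM-based analyzer): a tiny stack interpreter
 * whose values carry an optional tag naming the input field they came from.
 * The instruction set LOAD_IN / CONST / SET_OUT / UNKNOWN is hypothetical.
 */
class TaggedValueSketch {

    /** A stack value; inputField is null when its origin is unknown or not an input field. */
    record TaggedValue(Integer inputField) {
        static final TaggedValue UNTAGGED = new TaggedValue(null);
    }

    enum Op { LOAD_IN, CONST, SET_OUT, UNKNOWN }

    record Insn(Op op, int arg) {}

    /** Returns output field index -> input field index for fields copied unchanged. */
    static Map<Integer, Integer> forwardedFields(List<Insn> code) {
        Deque<TaggedValue> stack = new ArrayDeque<>();
        Map<Integer, Integer> forwarded = new TreeMap<>();
        for (Insn insn : code) {
            switch (insn.op()) {
                // Reading input field i produces a value tagged with i.
                case LOAD_IN -> stack.push(new TaggedValue(insn.arg()));
                // Constants carry no input-field tag.
                case CONST -> stack.push(TaggedValue.UNTAGGED);
                case SET_OUT -> {
                    TaggedValue v = stack.pop();
                    if (v.inputField() != null) {
                        forwarded.put(insn.arg(), v.inputField());
                    } else {
                        forwarded.remove(insn.arg());
                    }
                }
                // Conservative fallback: an unmodeled instruction consumes the top
                // value and produces one whose origin is unknown (tag dropped).
                case UNKNOWN -> {
                    stack.pop();
                    stack.push(TaggedValue.UNTAGGED);
                }
            }
        }
        return forwarded;
    }

    /** Formats the mapping like Flink forwarded-fields strings: "f0" or "f0->f1". */
    static String format(Map<Integer, Integer> forwarded) {
        List<String> parts = new ArrayList<>();
        forwarded.forEach((out, in) ->
            parts.add(out.equals(in) ? "f" + in : "f" + in + "->f" + out));
        return String.join("; ", parts);
    }

    public static void main(String[] args) {
        List<Insn> udf = List.of(
            new Insn(Op.LOAD_IN, 0), new Insn(Op.SET_OUT, 1),  // out.f1 = in.f0
            new Insn(Op.LOAD_IN, 1), new Insn(Op.SET_OUT, 0),  // out.f0 = in.f1
            new Insn(Op.LOAD_IN, 2), new Insn(Op.UNKNOWN, 0),  // unmodeled op on in.f2
            new Insn(Op.SET_OUT, 2));                          // out.f2 = result
        System.out.println(format(forwardedFields(udf)));
    }
}
```

Running `main` prints `f1->f0; f0->f1`. The two swapped fields are reported as forwarded. Output field 2 is not reported, because its value passed through an instruction the toy does not model.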
    
    61 JUnit tests cover the basic functionality. 18 further JUnit tests with 
code examples from the "real world" exercise the analyzer more thoroughly.
    
    The analyzer has 3 modes: DISABLED, OPTIMIZE, HINTS
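
    The PR only names the three modes. The following hypothetical sketch shows how they might gate the analyzer. It assumes that HINTS only reports findings and that OPTIMIZE additionally applies the inferred forwarded fields to the program plan. These semantics, and the names `AnalysisModeSketch`, `Decision` and `decide`, are assumptions, not Flink's API.

```java
/**
 * Hypothetical sketch of how the three analyzer modes could gate the work.
 * Mode names follow the PR description; Decision and decide() are illustrative only.
 */
class AnalysisModeSketch {

    enum Mode { DISABLED, OPTIMIZE, HINTS }

    /** What a (hypothetical) pipeline does for a given mode. */
    record Decision(boolean analyze, boolean printHints, boolean applyForwardedFields) {}

    static Decision decide(Mode mode) {
        return switch (mode) {
            // Skip interpretation entirely, so no per-UDF analysis cost is paid.
            case DISABLED -> new Decision(false, false, false);
            // Analyze and report hints, but leave the program plan untouched.
            case HINTS -> new Decision(true, true, false);
            // Assumption: OPTIMIZE also reports hints and additionally applies
            // the inferred ForwardedFields semantic properties to the plan.
            case OPTIMIZE -> new Decision(true, true, true);
        };
    }

    public static void main(String[] args) {
        for (Mode m : Mode.values()) {
            System.out.println(m + " -> " + decide(m));
        }
    }
}
```

With DISABLED as the default, the interpretation cost is never paid unless a user opts in.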
    
    The interpretation takes some time: analyzing a single UDF can take up to 
1 second. Therefore, I did not enable the analyzer in TestEnvironment by default, 
to keep build times down. If you uncomment the corresponding lines, the analyzer 
supports all 280 UDFs in the Flink code base.
    
    The analyzer gives hints about:
    - Main feature: ForwardedFields semantic properties for all types of 
Functions except for MapPartition and Combine
    - Warnings if static fields are modified by a Function
    - Warnings if a FilterFunction modifies its input objects
    - Warnings if a Function returns `null`
    - Warnings if a tuple access uses a wrong index
    - Information about the number of object creations within a UDF (for manual 
optimization)
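
    The hints listed above can be pictured as checks over facts the interpreter has already gathered. The toy sketch below assumes the interpretation has been summarized into a list of hypothetical `Event`s: a static-field write, a write to an input object, a `null` return, a tuple field access, or an object creation. It does not reflect how the analyzer in this PR is structured. It only illustrates what each hint reacts to. It omits the ForwardedFields inference, which needs the value tagging itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy sketch: derives hint messages from a hypothetical summary of observed events. */
class HintSketch {

    enum Kind { WRITE_STATIC_FIELD, WRITE_INPUT_FIELD, RETURN_NULL, TUPLE_ACCESS, NEW_OBJECT }

    /** One observed event; index is only meaningful for TUPLE_ACCESS. */
    record Event(Kind kind, int index) {}

    static List<String> hints(boolean isFilterFunction, int tupleArity, List<Event> events) {
        List<String> hints = new ArrayList<>();
        int objectCreations = 0;
        for (Event e : events) {
            switch (e.kind()) {
                case WRITE_STATIC_FIELD -> hints.add("WARN: function modifies a static field");
                case WRITE_INPUT_FIELD -> {
                    // Only FilterFunctions are flagged for modifying their input.
                    if (isFilterFunction) {
                        hints.add("WARN: FilterFunction modifies its input");
                    }
                }
                case RETURN_NULL -> hints.add("WARN: function returns null");
                case TUPLE_ACCESS -> {
                    // Flink tuple fields are f0 .. f(arity - 1).
                    if (e.index() < 0 || e.index() >= tupleArity) {
                        hints.add("WARN: tuple index " + e.index()
                            + " is out of range for arity " + tupleArity);
                    }
                }
                // Counted only, for the informational object-creation hint.
                case NEW_OBJECT -> objectCreations++;
            }
        }
        hints.add("INFO: " + objectCreations + " object creation(s)");
        return hints;
    }

    public static void main(String[] args) {
        // A filter on Tuple2 values that mutates its input and reads field f2.
        List<Event> observed = List.of(
            new Event(Kind.WRITE_INPUT_FIELD, -1),
            new Event(Kind.TUPLE_ACCESS, 2),
            new Event(Kind.NEW_OBJECT, -1));
        hints(true, 2, observed).forEach(System.out::println);
    }
}
```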

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/twalthr/flink sca

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/729.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #729
    
----
commit c384fc9740013ec1ae89a2817695078542c47dfe
Author: twalthr <[email protected]>
Date:   2015-05-26T18:22:03Z

    [FLINK-1319][core] Add static code analysis for UDFs

----


