Re: Null checking in Beam

Kyle Weaver Tue, 23 Mar 2021 17:25:19 -0700

I moved the checker framework from opt-out to opt-in [1]. You can opt in by
passing "-PenableCheckerFramework=true" to the Gradle invocation. The
checker still runs on all Jenkins CI builds.


[1] https://github.com/apache/beam/pull/14301

On Wed, Mar 17, 2021 at 8:50 AM Kenneth Knowles <k...@apache.org> wrote:

>
>
> On Wed, Mar 17, 2021 at 2:49 AM Jan Lukavský <je...@seznam.cz> wrote:
>
>> If there is no way to configure which annotations should be generated,
>> then I'd be +1 for removing the checker to separated CI and adding an
>> opt-in flag for the check when run locally.
>>
> Yes. If this answer is correct, then we are out of luck:
> https://stackoverflow.com/questions/57929137/disable-nonnull-annotation-on-implement-methods
>
> A good followup to moving it to a separate CI job would be to attach a
> full type checking run to the `:check` gradle task. This should be the best
> place to attach all of our extended checks like spotbugs, spotless,
> checkstyle, checkerframework. I rarely run `:check` myself (I think it is
> currently broken in some ways) but it may be good to start.
>
> BTW the slowdown we need to solve is local builds, not CI runs. When it
> was added the slowdown was almost completely prevented by caching. And now
> that we disable it for test classes (where for some reason it was extra
> slow) I wonder if it will speed up the main CI runs at all. So I would say
> the first thing to do is to just disable it by default but enable it in the
> Jenkins job builders.
>
> Kenn
>
> We need to solve the issue for dev@ as well, as the undesirable
>> annotations are already digging their way to the code base:
>>
>> git/apache/beam$ git grep UnknownKeyFor
>>
>> Another strange thing is that it seems, that we are pulling the
>> checkerframework as a runtime dependency (at least for some submodules).
>> When I run `mvn dependency:tree` on one of my projects that uses maven I see
>>
>> [INFO] +- com.google.guava:guava:jar:30.1-jre:provided
>> [INFO] |  +- com.google.guava:failureaccess:jar:1.0.1:provided
>> [INFO] |  +-
>> com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava:provided
>> [INFO] |  +- org.checkerframework:checker-qual:jar:3.5.0:provided
>>
>> which is fine, but when I add beam-sdks-java-extensions-euphoria it
>> changes to
>>
>> [INFO] +-
>> org.apache.beam:beam-sdks-java-extensions-euphoria:jar:2.28.0:compile
>> [INFO] |  \- org.checkerframework:checker-qual:jar:3.7.0:compile
>>
>> I'm not a gradle guru, so I cannot tell how this happens, there seems to
>> be nothing special about euphoria in the gradle.
>>
>>  Jan
>> On 3/16/21 7:12 PM, Kenneth Knowles wrote:
>>
>> I've asked on checkerframework users mailing list if they or any users
>> have recommendations for the IntelliJ integration issue.
>>
>> It is worth noting that the default annotations inserted into the
>> bytecode do add value: the @NonNull type annotations are default for
>> checkerframework but not default for spotbugs. So having the default
>> inserted enables downstream users to have betters spotbugs heuristic
>> analysis. Further, the defaults can be adjusted by users so freezing them
>> at the values we use them at is valuable in general across all tools.
>>
>> I think it makes sense to sacrifice these minor value-adds to keep things
>> simple-looking for IntelliJ users.
>>
>> Kenn
>>
>> On Tue, Mar 16, 2021 at 10:05 AM Kenneth Knowles <k...@apache.org> wrote:
>>
>>> Seems it is an FAQ with no solution:
>>> https://checkerframework.org/manual/#faq-classfile-annotations
>>>
>>> On Tue, Mar 16, 2021 at 10:01 AM Kenneth Knowles <k...@apache.org>
>>> wrote:
>>>
>>>> Adding -PskipCheckerframework when releasing will solve it for users,
>>>> but not for dev@.
>>>>
>>>> Making it off by default and a separate CI check that turns it on would
>>>> solve it overall but has the downsides mentioned before.
>>>>
>>>> It would be very nice to simply have a flag to not insert default
>>>> annotations.
>>>>
>>>> Kenn
>>>>
>>>> On Tue, Mar 16, 2021 at 9:37 AM Jan Lukavský <je...@seznam.cz> wrote:
>>>>
>>>>> I believe it is not a problem of Idea. At least I didn't notice
>>>>> behavior like that with Guava, although Guava uses the framework as well.
>>>>> Maybe there is a way to tune which annotations should be generated? Or
>>>>> Guava handles that somehow differently?
>>>>> On 3/16/21 5:22 PM, Reuven Lax wrote:
>>>>>
>>>>> I've also been annoyed at IntelliJ autogenenerating all these
>>>>> annotations. I believe Kenn said that this was not the intention - maybe
>>>>> there's an IntelliJ setting that would stop this from happening?
>>>>>
>>>>> On Tue, Mar 16, 2021 at 2:14 AM Jan Lukavský <je...@seznam.cz> wrote:
>>>>>
>>>>>> I don't know the details of the checkerframework, but there seems a
>>>>>> contradiction between what code is (currently) generated and what we
>>>>>> therefore release and what actually the checkerframework states [1]:
>>>>>>
>>>>>> @UnknownKeyFor:
>>>>>>
>>>>>> Used internally by the type system; should never be written by a
>>>>>> programmer.
>>>>>>
>>>>>> If this annotation is generated for overwritten methods, then I'd
>>>>>> say, that it means we place a great burden to our users - either not 
>>>>>> using
>>>>>> autogenerated methods, or erase all the generated annotations afterwards.
>>>>>> Either way, that is not "polite" from Beam.
>>>>>>
>>>>>> What we should judge is not only a formal purity of code, but what
>>>>>> stands on the other side is how usable APIs we provide. We should not 
>>>>>> head
>>>>>> for 100% pure code and sacrificing use comfort. Every check that leaks to
>>>>>> user code is at a price and we should not ignore that. No free lunch.
>>>>>>
>>>>>> From my point of view:
>>>>>>
>>>>>>  a) if a check does not modify the bytecode, it is fine and we can
>>>>>> use it - we are absolutely free to use any tooling we agree on, if our
>>>>>> users cannot be affected anyhow
>>>>>>
>>>>>>  b) if a tool needs to be leaked to user, it should be as small
>>>>>> leakage as possible
>>>>>>
>>>>>>  c) if a check significantly affects compile performance, it should
>>>>>> be possible to opt-out
>>>>>>
>>>>>> I think that our current setup violates all these three points.
>>>>>>
>>>>>> Moving the check to different CI is a possibility (a)), it would then
>>>>>> require opt-in flag to be able to run the check locally. It would also 
>>>>>> stop
>>>>>> the leakage (if we would release code without this check).
>>>>>>
>>>>>> If we want to keep some annotations for user's benefit (which might
>>>>>> be fine), it should be really limited to the bare minimum (e.g. only
>>>>>> @Nullable for method arguments and return values, possibly more, I don't
>>>>>> know if and how we can configure that). Definitely not @UnknownKeyFor, 
>>>>>> that
>>>>>> is simply internal to the checker. We should then have opt-out flag for
>>>>>> local development before committing changes.
>>>>>>
>>>>>>  Jan
>>>>>>
>>>>>> [1]
>>>>>> https://checkerframework.org/api/org/checkerframework/checker/nullness/qual/UnknownKeyFor.html
>>>>>> On 3/16/21 8:33 AM, Reuven Lax wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 15, 2021 at 11:42 PM Reuven Lax <re...@google.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Mar 15, 2021 at 9:12 PM Kenneth Knowles <k...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I will be blunt about my opinions about the general issue:
>>>>>>>>
>>>>>>>> - NullPointerExceptions (and similar) are a solved problem.
>>>>>>>>    * They have been since 2003 at the latest [1] (this is when the
>>>>>>>> types were hacked into Java - the foundation dates back to the 70s or
>>>>>>>> earlier)
>>>>>>>>
>>>>>>>
>>>>>>> Huh - Fahndrich tried to hire me once to work on a project called
>>>>>>> Singularity. Small world. Also note that Sanjay Ghemawat is listed in 
>>>>>>> the
>>>>>>> citations!
>>>>>>>
>>>>>>
>>>>>> Umm, I was confusing names. Fahndrich is actually a former coworker
>>>>>> at Google :)
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>    * Checkerframework is a _pluggable_ system where that nullness
>>>>>>>> type system is a "hello, world" level demo, since 2008 at the latest 
>>>>>>>> [2].
>>>>>>>>    * Our users should know this and judge us accordingly.
>>>>>>>>
>>>>>>>> - Checkerframework should be thought of and described as type
>>>>>>>> checking, because it is. It is not heuristic nor approximate.
>>>>>>>> - If your code was unclear about whether something could be null,
>>>>>>>> it was probably unclear to a person reading it also, and very likely to
>>>>>>>> have actual bugs.
>>>>>>>> - APIs that accept a lot of nullable parameters are, generally
>>>>>>>> speaking, bad APIs. They are hard to use correctly, less readable, and 
>>>>>>>> very
>>>>>>>> likely to cause actual bugs. You are forcing your users to deal with
>>>>>>>> accidental complexity you left behind.
>>>>>>>>   * Corollary to the above two points: Almost all the time, the
>>>>>>>> changes to clearify nullness make your code better, more readable, 
>>>>>>>> easier
>>>>>>>> for users or editors.
>>>>>>>> - It is true that there is a learning curve to programming in this
>>>>>>>> way.
>>>>>>>>
>>>>>>>
>>>>>>> I agree with the above in a closed system. However as mentioned,
>>>>>>> some of the APIs we use suffer from this.
>>>>>>>
>>>>>>> In a previous life, I saw up close an effort to add such analysis to
>>>>>>> a large codebase. Two separate tools called Prefix and Prefast were used
>>>>>>> (the difference between the two is actually quite interesting, but not
>>>>>>> relevant here). However in order to make this analysis useful, there 
>>>>>>> was a
>>>>>>> massive effort to properly annotate the entire codebase, including all
>>>>>>> libraries used. This isn't a perfect example - this was a C++ codebase
>>>>>>> which is much harder to analyze, and these tools identified far more 
>>>>>>> than
>>>>>>> simply nullness errors (resource leaks, array indices, proper string 
>>>>>>> null
>>>>>>> termination, exception behavior, etc.). However the closer we can get to
>>>>>>> properly annotating the transitive closure of our dependencies, the 
>>>>>>> better
>>>>>>> this framework will work.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> - There are certainly common patterns in Java that do not work very
>>>>>>>> well, and suppression is sometimes the best option.
>>>>>>>>    * Example: JUnit's @Setup and @Test conventions do not work very
>>>>>>>> well and it is not worth the effort to make them work. This is actually
>>>>>>>> because if it were "normal code" it would be bad code. There are 
>>>>>>>> complex
>>>>>>>> inter-method dependencies enforced only by convention. This matters:
>>>>>>>> sometimes a JUnit test class is called from another class, used as 
>>>>>>>> "normal
>>>>>>>> code". This does go wrong in practice. Plain old JUnit test cases
>>>>>>>> frequently go wrong as well.
>>>>>>>>
>>>>>>>> And here is my opinion when it comes to Beam:
>>>>>>>>
>>>>>>>> - "Community over code" is not an excuse for negligent practices
>>>>>>>> that cause easily avoidable risk to our users. I will be very 
>>>>>>>> disappointed
>>>>>>>> if we choose that.
>>>>>>>> - I think having tooling that helps newcomers write better code by
>>>>>>>> default is better for the community too. Just like having automatic
>>>>>>>> formatting is better. Less to haggle about in review, etc.
>>>>>>>> - A simple search reveals about 170 bugs that we know of [4].
>>>>>>>> - So far in almost every module I have fixed I discovered actual
>>>>>>>> new null errors. Many examples at [5].
>>>>>>>> - It is extremely easy to suppress the type checking. Almost all of
>>>>>>>> our classes have it suppressed already (I did this work, to allow 
>>>>>>>> existing
>>>>>>>> errors while protecting new code).
>>>>>>>> - Including the annotations in the shipped jars is an important
>>>>>>>> feature. Without this, users cannot write null-safe code themselves.
>>>>>>>>    * Reuven highlighted this: when methods are not annotated, we
>>>>>>>> have to use/implement workarounds. Actually Guava does use 
>>>>>>>> checkerframework
>>>>>>>> annotations [6] and the problem is that it requires its *input* to 
>>>>>>>> already
>>>>>>>> be non-null so actually you cannot even use it to convert nullable 
>>>>>>>> values
>>>>>>>> to non-nullable values.
>>>>>>>>    * Beam has its own [7] that is annotated, actually for yet
>>>>>>>> another reason: when Guava's checkNotNull fails, it throws NPE when it
>>>>>>>> should throw IllegalArgumentException. Guava's checkNotNull should not 
>>>>>>>> be
>>>>>>>> used for input validation!
>>>>>>>> - It is unfortunate that IntelliJ inserts a bunch of annotations in
>>>>>>>> user code. I wonder if there is something we can do about that. At the 
>>>>>>>> Java
>>>>>>>> level, if they are not on the classpath they should be ignored and not
>>>>>>>> affect users. Coincidentally, the JDK has had NullPointerExceptions in 
>>>>>>>> this
>>>>>>>> area :-)  [8].
>>>>>>>>
>>>>>>>> I understand the pain of longer compile times slowing people down.
>>>>>>>> That is actually a problem to be solved which does not require 
>>>>>>>> lowering our
>>>>>>>> standards of quality. How about we try moving it to a separate CI job 
>>>>>>>> and
>>>>>>>> see how it goes?
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>> In my experience stories like Reuven's are much more frustrating in
>>>>>>>> a separate CI job because you find out quite late that your code has 
>>>>>>>> flaws.
>>>>>>>> Like when spotless fails, but much more work to fix, and would have 
>>>>>>>> been
>>>>>>>> prevented long ago if it were integrated into the compile.
>>>>>>>>
>>>>>>>
>>>>>>> I agree with this. I prefer to be able to detect on my computer that
>>>>>>> there are failures, and not have to wait for submission. The complaint 
>>>>>>> was
>>>>>>> that some people are experiencing trouble on their local machine 
>>>>>>> however,
>>>>>>> so it seems reasonable to add an opt-out flag (though I would prefer opt
>>>>>>> out to opt in).
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Kenn
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf
>>>>>>>> [2]
>>>>>>>> https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf
>>>>>>>> [3]
>>>>>>>> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275
>>>>>>>> [4]
>>>>>>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22)
>>>>>>>> [5] https://github.com/apache/beam/pull/12284 and
>>>>>>>> https://github.com/apache/beam/pull/12162 and
>>>>>>>> [6]
>>>>>>>> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878
>>>>>>>> [7]
>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java
>>>>>>>> [8] https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Mar 15, 2021 at 2:12 PM Reuven Lax <re...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I have some deeper concerns with the null checks. The fact that
>>>>>>>>> many libraries we use (including guava) don't always annotate their 
>>>>>>>>> methods
>>>>>>>>> forces a lot of workarounds. As a very simple example, the return 
>>>>>>>>> value
>>>>>>>>> from Preconditions.checkNotNull clearly can never be null, yet the
>>>>>>>>> nullability checks don't know this. This and other similar cases 
>>>>>>>>> require
>>>>>>>>> constantly adding extra unnecessary null checks in the code just to 
>>>>>>>>> make
>>>>>>>>> the checker happy. There have been other cases where I haven't been 
>>>>>>>>> able to
>>>>>>>>> figure out a way to make the checker happy (often these seem to 
>>>>>>>>> involve
>>>>>>>>> using lambdas), and after 10-15 minutes of investigation have given 
>>>>>>>>> up and
>>>>>>>>> disabled the check.
>>>>>>>>>
>>>>>>>>> Now you might say that it's worth the extra pain and ugliness of
>>>>>>>>> writing "useless" code to ensure that we have null-safe code. However 
>>>>>>>>> I
>>>>>>>>> think this ignores a sociological aspect of software development. I 
>>>>>>>>> have a
>>>>>>>>> higher tolerance than many for this sort of pain, and I'm willing to 
>>>>>>>>> spend
>>>>>>>>> some time figuring out how to rewrite my code such that it makes the
>>>>>>>>> checker happy (even though often it forced me to write much more 
>>>>>>>>> awkward
>>>>>>>>> code). However even I have often found myself giving up and just 
>>>>>>>>> disabling
>>>>>>>>> the check. Many others will have less tolerance than me, and will 
>>>>>>>>> simply
>>>>>>>>> disable the checks. At that point we'll have a codebase littered with
>>>>>>>>> @SuppressWarnings("nullness"), which doesn't really get us where we 
>>>>>>>>> want to
>>>>>>>>> be. I've seen similar struggles in other codebases, and generally 
>>>>>>>>> having a
>>>>>>>>> static checker with too many false positives often ends up being 
>>>>>>>>> worse than
>>>>>>>>> having no checker.
>>>>>>>>>
>>>>>>>>> On Mon, Mar 15, 2021 at 10:37 AM Ismaël Mejía <ieme...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> +1
>>>>>>>>>>
>>>>>>>>>> Even if I like the strictness for Null checking, I also think that
>>>>>>>>>> this is adding too much extra time for builds (that I noticed
>>>>>>>>>> locally
>>>>>>>>>> when enabled) and also I agree with Jan that the annotations are
>>>>>>>>>> really an undesired side effect. For reference when you try to
>>>>>>>>>> auto
>>>>>>>>>> complete some method signatures on IntelliJ on downstream projects
>>>>>>>>>> with C-A-v it generates some extra Checkers annotations like
>>>>>>>>>> @NonNull
>>>>>>>>>> and others even if the user isn't using them which is not
>>>>>>>>>> desirable.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver <kcwea...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>> >>
>>>>>>>>>> >> Big +1 for moving this to separate CI job. I really don't like
>>>>>>>>>> what annotations are currently added to the code we ship. Tools like 
>>>>>>>>>> Idea
>>>>>>>>>> add these annotations to code they generate when overriding classes 
>>>>>>>>>> and
>>>>>>>>>> that's very annoying. Users should not be exposed to internal tools 
>>>>>>>>>> like
>>>>>>>>>> nullability checking.
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > I was only planning on moving this to a separate CI job. The
>>>>>>>>>> job would still be release blocking, so the same annotations would 
>>>>>>>>>> still be
>>>>>>>>>> required.
>>>>>>>>>> >
>>>>>>>>>> > I'm not sure which annotations you are concerned about. There
>>>>>>>>>> are two annotations involved with nullness checking, 
>>>>>>>>>> @SuppressWarnings and
>>>>>>>>>> @Nullable. @SuppressWarnings has retention policy SOURCE, so it 
>>>>>>>>>> shouldn't
>>>>>>>>>> be exposed to users at all. @Nullable is not just for internal 
>>>>>>>>>> tooling, it
>>>>>>>>>> also provides useful information about our APIs to users. The user 
>>>>>>>>>> should
>>>>>>>>>> not have to guess whether a method argument etc. can be null or not, 
>>>>>>>>>> and
>>>>>>>>>> for better or worse, these annotations are the standard way of 
>>>>>>>>>> expressing
>>>>>>>>>> that in Java.
>>>>>>>>>>
>>>>>>>>>

Re: Null checking in Beam

Reply via email to