Re: Null checking in Beam

Kenneth Knowles Wed, 31 Mar 2021 08:47:05 -0700

On Wed, Mar 31, 2021 at 8:40 AM Alexey Romanenko <[email protected]>
wrote:


> I believe we should update a contribution doc on this, since now it’s
> confusing  - you run “./gradlew :your:task:check” and it’s fine but then it
> fails on CI because of some missed Nullable checks.
>

I'll add to the contributor guide. I think it would be ideal for :check to
still include it, and only exclude it from the main :compileJava
:compileTestJava. I think this might require some more Gradle work.


> Also, I noticed that “*checkNotNull(yourVar)*” or "*checkState(yourVar !=
> null)*” won’t help, it will still complain. Only explicit “*if (yourVar
> != null)*” works. Is it correct or do I miss something?
>

This is a problem with Guava's annotations mentioned earlier in the thread.
They are annotated appropriate for dynamic checks that already agree with
the type. You need to use
org.apache.beam.sdk.util.Preconditions.checkNotNull to do a null check that
alters the type from nullable to not-null.

Kenn


>
> On 24 Mar 2021, at 01:24, Kyle Weaver <[email protected]> wrote:
>
> I moved the checker framework from opt-out to opt-in [1]. You can opt in
> by passing "-PenableCheckerFramework=true" to the Gradle invocation. The
> checker still runs on all Jenkins CI builds.
>
> [1] https://github.com/apache/beam/pull/14301
>
> On Wed, Mar 17, 2021 at 8:50 AM Kenneth Knowles <[email protected]> wrote:
>
>>
>>
>> On Wed, Mar 17, 2021 at 2:49 AM Jan Lukavský <[email protected]> wrote:
>>
>>> If there is no way to configure which annotations should be generated,
>>> then I'd be +1 for removing the checker to separated CI and adding an
>>> opt-in flag for the check when run locally.
>>>
>> Yes. If this answer is correct, then we are out of luck:
>> https://stackoverflow.com/questions/57929137/disable-nonnull-annotation-on-implement-methods
>>
>> A good followup to moving it to a separate CI job would be to attach a
>> full type checking run to the `:check` gradle task. This should be the best
>> place to attach all of our extended checks like spotbugs, spotless,
>> checkstyle, checkerframework. I rarely run `:check` myself (I think it is
>> currently broken in some ways) but it may be good to start.
>>
>> BTW the slowdown we need to solve is local builds, not CI runs. When it
>> was added the slowdown was almost completely prevented by caching. And now
>> that we disable it for test classes (where for some reason it was extra
>> slow) I wonder if it will speed up the main CI runs at all. So I would say
>> the first thing to do is to just disable it by default but enable it in the
>> Jenkins job builders.
>>
>> Kenn
>>
>> We need to solve the issue for dev@ as well, as the undesirable
>>> annotations are already digging their way to the code base:
>>>
>>> git/apache/beam$ git grep UnknownKeyFor
>>>
>>> Another strange thing is that it seems, that we are pulling the
>>> checkerframework as a runtime dependency (at least for some submodules).
>>> When I run `mvn dependency:tree` on one of my projects that uses maven I see
>>>
>>> [INFO] +- com.google.guava:guava:jar:30.1-jre:provided
>>> [INFO] |  +- com.google.guava:failureaccess:jar:1.0.1:provided
>>> [INFO] |  +-
>>> com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava:provided
>>> [INFO] |  +- org.checkerframework:checker-qual:jar:3.5.0:provided
>>>
>>> which is fine, but when I add beam-sdks-java-extensions-euphoria it
>>> changes to
>>>
>>> [INFO] +-
>>> org.apache.beam:beam-sdks-java-extensions-euphoria:jar:2.28.0:compile
>>> [INFO] |  \- org.checkerframework:checker-qual:jar:3.7.0:compile
>>>
>>> I'm not a gradle guru, so I cannot tell how this happens, there seems to
>>> be nothing special about euphoria in the gradle.
>>>
>>>  Jan
>>> On 3/16/21 7:12 PM, Kenneth Knowles wrote:
>>>
>>> I've asked on checkerframework users mailing list if they or any users
>>> have recommendations for the IntelliJ integration issue.
>>>
>>> It is worth noting that the default annotations inserted into the
>>> bytecode do add value: the @NonNull type annotations are default for
>>> checkerframework but not default for spotbugs. So having the default
>>> inserted enables downstream users to have betters spotbugs heuristic
>>> analysis. Further, the defaults can be adjusted by users so freezing them
>>> at the values we use them at is valuable in general across all tools.
>>>
>>> I think it makes sense to sacrifice these minor value-adds to keep
>>> things simple-looking for IntelliJ users.
>>>
>>> Kenn
>>>
>>> On Tue, Mar 16, 2021 at 10:05 AM Kenneth Knowles <[email protected]>
>>> wrote:
>>>
>>>> Seems it is an FAQ with no solution:
>>>> https://checkerframework.org/manual/#faq-classfile-annotations
>>>>
>>>> On Tue, Mar 16, 2021 at 10:01 AM Kenneth Knowles <[email protected]>
>>>> wrote:
>>>>
>>>>> Adding -PskipCheckerframework when releasing will solve it for users,
>>>>> but not for dev@.
>>>>>
>>>>> Making it off by default and a separate CI check that turns it on
>>>>> would solve it overall but has the downsides mentioned before.
>>>>>
>>>>> It would be very nice to simply have a flag to not insert default
>>>>> annotations.
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Tue, Mar 16, 2021 at 9:37 AM Jan Lukavský <[email protected]> wrote:
>>>>>
>>>>>> I believe it is not a problem of Idea. At least I didn't notice
>>>>>> behavior like that with Guava, although Guava uses the framework as well.
>>>>>> Maybe there is a way to tune which annotations should be generated? Or
>>>>>> Guava handles that somehow differently?
>>>>>> On 3/16/21 5:22 PM, Reuven Lax wrote:
>>>>>>
>>>>>> I've also been annoyed at IntelliJ autogenenerating all these
>>>>>> annotations. I believe Kenn said that this was not the intention - maybe
>>>>>> there's an IntelliJ setting that would stop this from happening?
>>>>>>
>>>>>> On Tue, Mar 16, 2021 at 2:14 AM Jan Lukavský <[email protected]> wrote:
>>>>>>
>>>>>>> I don't know the details of the checkerframework, but there seems a
>>>>>>> contradiction between what code is (currently) generated and what we
>>>>>>> therefore release and what actually the checkerframework states [1]:
>>>>>>>
>>>>>>> @UnknownKeyFor:
>>>>>>>
>>>>>>> Used internally by the type system; should never be written by a
>>>>>>> programmer.
>>>>>>>
>>>>>>> If this annotation is generated for overwritten methods, then I'd
>>>>>>> say, that it means we place a great burden to our users - either not 
>>>>>>> using
>>>>>>> autogenerated methods, or erase all the generated annotations 
>>>>>>> afterwards.
>>>>>>> Either way, that is not "polite" from Beam.
>>>>>>>
>>>>>>> What we should judge is not only a formal purity of code, but what
>>>>>>> stands on the other side is how usable APIs we provide. We should not 
>>>>>>> head
>>>>>>> for 100% pure code and sacrificing use comfort. Every check that leaks 
>>>>>>> to
>>>>>>> user code is at a price and we should not ignore that. No free lunch.
>>>>>>>
>>>>>>> From my point of view:
>>>>>>>
>>>>>>>  a) if a check does not modify the bytecode, it is fine and we can
>>>>>>> use it - we are absolutely free to use any tooling we agree on, if our
>>>>>>> users cannot be affected anyhow
>>>>>>>
>>>>>>>  b) if a tool needs to be leaked to user, it should be as small
>>>>>>> leakage as possible
>>>>>>>
>>>>>>>  c) if a check significantly affects compile performance, it should
>>>>>>> be possible to opt-out
>>>>>>>
>>>>>>> I think that our current setup violates all these three points.
>>>>>>>
>>>>>>> Moving the check to different CI is a possibility (a)), it would
>>>>>>> then require opt-in flag to be able to run the check locally. It would 
>>>>>>> also
>>>>>>> stop the leakage (if we would release code without this check).
>>>>>>>
>>>>>>> If we want to keep some annotations for user's benefit (which might
>>>>>>> be fine), it should be really limited to the bare minimum (e.g. only
>>>>>>> @Nullable for method arguments and return values, possibly more, I don't
>>>>>>> know if and how we can configure that). Definitely not @UnknownKeyFor, 
>>>>>>> that
>>>>>>> is simply internal to the checker. We should then have opt-out flag for
>>>>>>> local development before committing changes.
>>>>>>>
>>>>>>>  Jan
>>>>>>>
>>>>>>> [1]
>>>>>>> https://checkerframework.org/api/org/checkerframework/checker/nullness/qual/UnknownKeyFor.html
>>>>>>> On 3/16/21 8:33 AM, Reuven Lax wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Mar 15, 2021 at 11:42 PM Reuven Lax <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Mar 15, 2021 at 9:12 PM Kenneth Knowles <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I will be blunt about my opinions about the general issue:
>>>>>>>>>
>>>>>>>>> - NullPointerExceptions (and similar) are a solved problem.
>>>>>>>>>    * They have been since 2003 at the latest [1] (this is when the
>>>>>>>>> types were hacked into Java - the foundation dates back to the 70s or
>>>>>>>>> earlier)
>>>>>>>>>
>>>>>>>>
>>>>>>>> Huh - Fahndrich tried to hire me once to work on a project called
>>>>>>>> Singularity. Small world. Also note that Sanjay Ghemawat is listed in 
>>>>>>>> the
>>>>>>>> citations!
>>>>>>>>
>>>>>>>
>>>>>>> Umm, I was confusing names. Fahndrich is actually a former coworker
>>>>>>> at Google :)
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>    * Checkerframework is a _pluggable_ system where that nullness
>>>>>>>>> type system is a "hello, world" level demo, since 2008 at the latest 
>>>>>>>>> [2].
>>>>>>>>>    * Our users should know this and judge us accordingly.
>>>>>>>>>
>>>>>>>>> - Checkerframework should be thought of and described as type
>>>>>>>>> checking, because it is. It is not heuristic nor approximate.
>>>>>>>>> - If your code was unclear about whether something could be null,
>>>>>>>>> it was probably unclear to a person reading it also, and very likely 
>>>>>>>>> to
>>>>>>>>> have actual bugs.
>>>>>>>>> - APIs that accept a lot of nullable parameters are, generally
>>>>>>>>> speaking, bad APIs. They are hard to use correctly, less readable, 
>>>>>>>>> and very
>>>>>>>>> likely to cause actual bugs. You are forcing your users to deal with
>>>>>>>>> accidental complexity you left behind.
>>>>>>>>>   * Corollary to the above two points: Almost all the time, the
>>>>>>>>> changes to clearify nullness make your code better, more readable, 
>>>>>>>>> easier
>>>>>>>>> for users or editors.
>>>>>>>>> - It is true that there is a learning curve to programming in this
>>>>>>>>> way.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I agree with the above in a closed system. However as mentioned,
>>>>>>>> some of the APIs we use suffer from this.
>>>>>>>>
>>>>>>>> In a previous life, I saw up close an effort to add such analysis
>>>>>>>> to a large codebase. Two separate tools called Prefix and Prefast were 
>>>>>>>> used
>>>>>>>> (the difference between the two is actually quite interesting, but not
>>>>>>>> relevant here). However in order to make this analysis useful, there 
>>>>>>>> was a
>>>>>>>> massive effort to properly annotate the entire codebase, including all
>>>>>>>> libraries used. This isn't a perfect example - this was a C++ codebase
>>>>>>>> which is much harder to analyze, and these tools identified far more 
>>>>>>>> than
>>>>>>>> simply nullness errors (resource leaks, array indices, proper string 
>>>>>>>> null
>>>>>>>> termination, exception behavior, etc.). However the closer we can get 
>>>>>>>> to
>>>>>>>> properly annotating the transitive closure of our dependencies, the 
>>>>>>>> better
>>>>>>>> this framework will work.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> - There are certainly common patterns in Java that do not work
>>>>>>>>> very well, and suppression is sometimes the best option.
>>>>>>>>>    * Example: JUnit's @Setup and @Test conventions do not work
>>>>>>>>> very well and it is not worth the effort to make them work. This is
>>>>>>>>> actually because if it were "normal code" it would be bad code. There 
>>>>>>>>> are
>>>>>>>>> complex inter-method dependencies enforced only by convention. This
>>>>>>>>> matters: sometimes a JUnit test class is called from another class, 
>>>>>>>>> used as
>>>>>>>>> "normal code". This does go wrong in practice. Plain old JUnit test 
>>>>>>>>> cases
>>>>>>>>> frequently go wrong as well.
>>>>>>>>>
>>>>>>>>> And here is my opinion when it comes to Beam:
>>>>>>>>>
>>>>>>>>> - "Community over code" is not an excuse for negligent practices
>>>>>>>>> that cause easily avoidable risk to our users. I will be very 
>>>>>>>>> disappointed
>>>>>>>>> if we choose that.
>>>>>>>>> - I think having tooling that helps newcomers write better code by
>>>>>>>>> default is better for the community too. Just like having automatic
>>>>>>>>> formatting is better. Less to haggle about in review, etc.
>>>>>>>>> - A simple search reveals about 170 bugs that we know of [4].
>>>>>>>>> - So far in almost every module I have fixed I discovered actual
>>>>>>>>> new null errors. Many examples at [5].
>>>>>>>>> - It is extremely easy to suppress the type checking. Almost all
>>>>>>>>> of our classes have it suppressed already (I did this work, to allow
>>>>>>>>> existing errors while protecting new code).
>>>>>>>>> - Including the annotations in the shipped jars is an important
>>>>>>>>> feature. Without this, users cannot write null-safe code themselves.
>>>>>>>>>    * Reuven highlighted this: when methods are not annotated, we
>>>>>>>>> have to use/implement workarounds. Actually Guava does use 
>>>>>>>>> checkerframework
>>>>>>>>> annotations [6] and the problem is that it requires its *input* to 
>>>>>>>>> already
>>>>>>>>> be non-null so actually you cannot even use it to convert nullable 
>>>>>>>>> values
>>>>>>>>> to non-nullable values.
>>>>>>>>>    * Beam has its own [7] that is annotated, actually for yet
>>>>>>>>> another reason: when Guava's checkNotNull fails, it throws NPE when it
>>>>>>>>> should throw IllegalArgumentException. Guava's checkNotNull should 
>>>>>>>>> not be
>>>>>>>>> used for input validation!
>>>>>>>>> - It is unfortunate that IntelliJ inserts a bunch of annotations
>>>>>>>>> in user code. I wonder if there is something we can do about that. At 
>>>>>>>>> the
>>>>>>>>> Java level, if they are not on the classpath they should be ignored 
>>>>>>>>> and not
>>>>>>>>> affect users. Coincidentally, the JDK has had NullPointerExceptions 
>>>>>>>>> in this
>>>>>>>>> area :-)  [8].
>>>>>>>>>
>>>>>>>>> I understand the pain of longer compile times slowing people down.
>>>>>>>>> That is actually a problem to be solved which does not require 
>>>>>>>>> lowering our
>>>>>>>>> standards of quality. How about we try moving it to a separate CI job 
>>>>>>>>> and
>>>>>>>>> see how it goes?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>> In my experience stories like Reuven's are much more frustrating
>>>>>>>>> in a separate CI job because you find out quite late that your code 
>>>>>>>>> has
>>>>>>>>> flaws. Like when spotless fails, but much more work to fix, and would 
>>>>>>>>> have
>>>>>>>>> been prevented long ago if it were integrated into the compile.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I agree with this. I prefer to be able to detect on my computer
>>>>>>>> that there are failures, and not have to wait for submission. The 
>>>>>>>> complaint
>>>>>>>> was that some people are experiencing trouble on their local machine
>>>>>>>> however, so it seems reasonable to add an opt-out flag (though I would
>>>>>>>> prefer opt out to opt in).
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Kenn
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf
>>>>>>>>> [2]
>>>>>>>>> https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf
>>>>>>>>> [3]
>>>>>>>>> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275
>>>>>>>>> [4]
>>>>>>>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22)
>>>>>>>>> [5] https://github.com/apache/beam/pull/12284 and
>>>>>>>>> https://github.com/apache/beam/pull/12162 and
>>>>>>>>> [6]
>>>>>>>>> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878
>>>>>>>>> [7]
>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java
>>>>>>>>> [8] https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Mar 15, 2021 at 2:12 PM Reuven Lax <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I have some deeper concerns with the null checks. The fact that
>>>>>>>>>> many libraries we use (including guava) don't always annotate their 
>>>>>>>>>> methods
>>>>>>>>>> forces a lot of workarounds. As a very simple example, the return 
>>>>>>>>>> value
>>>>>>>>>> from Preconditions.checkNotNull clearly can never be null, yet the
>>>>>>>>>> nullability checks don't know this. This and other similar cases 
>>>>>>>>>> require
>>>>>>>>>> constantly adding extra unnecessary null checks in the code just to 
>>>>>>>>>> make
>>>>>>>>>> the checker happy. There have been other cases where I haven't been 
>>>>>>>>>> able to
>>>>>>>>>> figure out a way to make the checker happy (often these seem to 
>>>>>>>>>> involve
>>>>>>>>>> using lambdas), and after 10-15 minutes of investigation have given 
>>>>>>>>>> up and
>>>>>>>>>> disabled the check.
>>>>>>>>>>
>>>>>>>>>> Now you might say that it's worth the extra pain and ugliness of
>>>>>>>>>> writing "useless" code to ensure that we have null-safe code. 
>>>>>>>>>> However I
>>>>>>>>>> think this ignores a sociological aspect of software development. I 
>>>>>>>>>> have a
>>>>>>>>>> higher tolerance than many for this sort of pain, and I'm willing to 
>>>>>>>>>> spend
>>>>>>>>>> some time figuring out how to rewrite my code such that it makes the
>>>>>>>>>> checker happy (even though often it forced me to write much more 
>>>>>>>>>> awkward
>>>>>>>>>> code). However even I have often found myself giving up and just 
>>>>>>>>>> disabling
>>>>>>>>>> the check. Many others will have less tolerance than me, and will 
>>>>>>>>>> simply
>>>>>>>>>> disable the checks. At that point we'll have a codebase littered with
>>>>>>>>>> @SuppressWarnings("nullness"), which doesn't really get us where we 
>>>>>>>>>> want to
>>>>>>>>>> be. I've seen similar struggles in other codebases, and generally 
>>>>>>>>>> having a
>>>>>>>>>> static checker with too many false positives often ends up being 
>>>>>>>>>> worse than
>>>>>>>>>> having no checker.
>>>>>>>>>>
>>>>>>>>>> On Mon, Mar 15, 2021 at 10:37 AM Ismaël Mejía <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1
>>>>>>>>>>>
>>>>>>>>>>> Even if I like the strictness for Null checking, I also think
>>>>>>>>>>> that
>>>>>>>>>>> this is adding too much extra time for builds (that I noticed
>>>>>>>>>>> locally
>>>>>>>>>>> when enabled) and also I agree with Jan that the annotations are
>>>>>>>>>>> really an undesired side effect. For reference when you try to
>>>>>>>>>>> auto
>>>>>>>>>>> complete some method signatures on IntelliJ on downstream
>>>>>>>>>>> projects
>>>>>>>>>>> with C-A-v it generates some extra Checkers annotations like
>>>>>>>>>>> @NonNull
>>>>>>>>>>> and others even if the user isn't using them which is not
>>>>>>>>>>> desirable.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>> >>
>>>>>>>>>>> >> Big +1 for moving this to separate CI job. I really don't
>>>>>>>>>>> like what annotations are currently added to the code we ship. 
>>>>>>>>>>> Tools like
>>>>>>>>>>> Idea add these annotations to code they generate when overriding 
>>>>>>>>>>> classes
>>>>>>>>>>> and that's very annoying. Users should not be exposed to internal 
>>>>>>>>>>> tools
>>>>>>>>>>> like nullability checking.
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > I was only planning on moving this to a separate CI job. The
>>>>>>>>>>> job would still be release blocking, so the same annotations would 
>>>>>>>>>>> still be
>>>>>>>>>>> required.
>>>>>>>>>>> >
>>>>>>>>>>> > I'm not sure which annotations you are concerned about. There
>>>>>>>>>>> are two annotations involved with nullness checking, 
>>>>>>>>>>> @SuppressWarnings and
>>>>>>>>>>> @Nullable. @SuppressWarnings has retention policy SOURCE, so it 
>>>>>>>>>>> shouldn't
>>>>>>>>>>> be exposed to users at all. @Nullable is not just for internal 
>>>>>>>>>>> tooling, it
>>>>>>>>>>> also provides useful information about our APIs to users. The user 
>>>>>>>>>>> should
>>>>>>>>>>> not have to guess whether a method argument etc. can be null or 
>>>>>>>>>>> not, and
>>>>>>>>>>> for better or worse, these annotations are the standard way of 
>>>>>>>>>>> expressing
>>>>>>>>>>> that in Java.
>>>>>>>>>>>
>>>>>>>>>>
>

Re: Null checking in Beam

Reply via email to