Re: Null checking in Beam

Reuven Lax Tue, 16 Mar 2021 09:22:33 -0700

I've also been annoyed at IntelliJ autogenenerating all these annotations.
I believe Kenn said that this was not the intention - maybe there's an
IntelliJ setting that would stop this from happening?


On Tue, Mar 16, 2021 at 2:14 AM Jan Lukavský <je...@seznam.cz> wrote:

> I don't know the details of the checkerframework, but there seems a
> contradiction between what code is (currently) generated and what we
> therefore release and what actually the checkerframework states [1]:
>
> @UnknownKeyFor:
>
> Used internally by the type system; should never be written by a
> programmer.
>
> If this annotation is generated for overwritten methods, then I'd say,
> that it means we place a great burden to our users - either not using
> autogenerated methods, or erase all the generated annotations afterwards.
> Either way, that is not "polite" from Beam.
>
> What we should judge is not only a formal purity of code, but what stands
> on the other side is how usable APIs we provide. We should not head for
> 100% pure code and sacrificing use comfort. Every check that leaks to user
> code is at a price and we should not ignore that. No free lunch.
>
> From my point of view:
>
>  a) if a check does not modify the bytecode, it is fine and we can use it
> - we are absolutely free to use any tooling we agree on, if our users
> cannot be affected anyhow
>
>  b) if a tool needs to be leaked to user, it should be as small leakage as
> possible
>
>  c) if a check significantly affects compile performance, it should be
> possible to opt-out
>
> I think that our current setup violates all these three points.
>
> Moving the check to different CI is a possibility (a)), it would then
> require opt-in flag to be able to run the check locally. It would also stop
> the leakage (if we would release code without this check).
>
> If we want to keep some annotations for user's benefit (which might be
> fine), it should be really limited to the bare minimum (e.g. only @Nullable
> for method arguments and return values, possibly more, I don't know if and
> how we can configure that). Definitely not @UnknownKeyFor, that is simply
> internal to the checker. We should then have opt-out flag for local
> development before committing changes.
>
>  Jan
>
> [1]
> https://checkerframework.org/api/org/checkerframework/checker/nullness/qual/UnknownKeyFor.html
> On 3/16/21 8:33 AM, Reuven Lax wrote:
>
>
>
> On Mon, Mar 15, 2021 at 11:42 PM Reuven Lax <re...@google.com> wrote:
>
>>
>>
>> On Mon, Mar 15, 2021 at 9:12 PM Kenneth Knowles <k...@apache.org> wrote:
>>
>>> I will be blunt about my opinions about the general issue:
>>>
>>> - NullPointerExceptions (and similar) are a solved problem.
>>>    * They have been since 2003 at the latest [1] (this is when the types
>>> were hacked into Java - the foundation dates back to the 70s or earlier)
>>>
>>
>> Huh - Fahndrich tried to hire me once to work on a project called
>> Singularity. Small world. Also note that Sanjay Ghemawat is listed in the
>> citations!
>>
>
> Umm, I was confusing names. Fahndrich is actually a former coworker at
> Google :)
>
>
>>
>>
>>>    * Checkerframework is a _pluggable_ system where that nullness type
>>> system is a "hello, world" level demo, since 2008 at the latest [2].
>>>    * Our users should know this and judge us accordingly.
>>>
>>> - Checkerframework should be thought of and described as type checking,
>>> because it is. It is not heuristic nor approximate.
>>> - If your code was unclear about whether something could be null, it was
>>> probably unclear to a person reading it also, and very likely to have
>>> actual bugs.
>>> - APIs that accept a lot of nullable parameters are, generally speaking,
>>> bad APIs. They are hard to use correctly, less readable, and very likely to
>>> cause actual bugs. You are forcing your users to deal with accidental
>>> complexity you left behind.
>>>   * Corollary to the above two points: Almost all the time, the changes
>>> to clearify nullness make your code better, more readable, easier for users
>>> or editors.
>>> - It is true that there is a learning curve to programming in this way.
>>>
>>
>> I agree with the above in a closed system. However as mentioned, some of
>> the APIs we use suffer from this.
>>
>> In a previous life, I saw up close an effort to add such analysis to a
>> large codebase. Two separate tools called Prefix and Prefast were used (the
>> difference between the two is actually quite interesting, but not relevant
>> here). However in order to make this analysis useful, there was a massive
>> effort to properly annotate the entire codebase, including all libraries
>> used. This isn't a perfect example - this was a C++ codebase which is much
>> harder to analyze, and these tools identified far more than simply nullness
>> errors (resource leaks, array indices, proper string null termination,
>> exception behavior, etc.). However the closer we can get to properly
>> annotating the transitive closure of our dependencies, the better this
>> framework will work.
>>
>>
>>
>>> - There are certainly common patterns in Java that do not work very
>>> well, and suppression is sometimes the best option.
>>>    * Example: JUnit's @Setup and @Test conventions do not work very well
>>> and it is not worth the effort to make them work. This is actually because
>>> if it were "normal code" it would be bad code. There are complex
>>> inter-method dependencies enforced only by convention. This matters:
>>> sometimes a JUnit test class is called from another class, used as "normal
>>> code". This does go wrong in practice. Plain old JUnit test cases
>>> frequently go wrong as well.
>>>
>>> And here is my opinion when it comes to Beam:
>>>
>>> - "Community over code" is not an excuse for negligent practices that
>>> cause easily avoidable risk to our users. I will be very disappointed if we
>>> choose that.
>>> - I think having tooling that helps newcomers write better code by
>>> default is better for the community too. Just like having automatic
>>> formatting is better. Less to haggle about in review, etc.
>>> - A simple search reveals about 170 bugs that we know of [4].
>>> - So far in almost every module I have fixed I discovered actual new
>>> null errors. Many examples at [5].
>>> - It is extremely easy to suppress the type checking. Almost all of our
>>> classes have it suppressed already (I did this work, to allow existing
>>> errors while protecting new code).
>>> - Including the annotations in the shipped jars is an important feature.
>>> Without this, users cannot write null-safe code themselves.
>>>    * Reuven highlighted this: when methods are not annotated, we have to
>>> use/implement workarounds. Actually Guava does use checkerframework
>>> annotations [6] and the problem is that it requires its *input* to already
>>> be non-null so actually you cannot even use it to convert nullable values
>>> to non-nullable values.
>>>    * Beam has its own [7] that is annotated, actually for yet another
>>> reason: when Guava's checkNotNull fails, it throws NPE when it should throw
>>> IllegalArgumentException. Guava's checkNotNull should not be used for input
>>> validation!
>>> - It is unfortunate that IntelliJ inserts a bunch of annotations in user
>>> code. I wonder if there is something we can do about that. At the Java
>>> level, if they are not on the classpath they should be ignored and not
>>> affect users. Coincidentally, the JDK has had NullPointerExceptions in this
>>> area :-)  [8].
>>>
>>> I understand the pain of longer compile times slowing people down. That
>>> is actually a problem to be solved which does not require lowering our
>>> standards of quality. How about we try moving it to a separate CI job and
>>> see how it goes?
>>>
>>>
>>
>>> In my experience stories like Reuven's are much more frustrating in a
>>> separate CI job because you find out quite late that your code has flaws.
>>> Like when spotless fails, but much more work to fix, and would have been
>>> prevented long ago if it were integrated into the compile.
>>>
>>
>> I agree with this. I prefer to be able to detect on my computer that
>> there are failures, and not have to wait for submission. The complaint was
>> that some people are experiencing trouble on their local machine however,
>> so it seems reasonable to add an opt-out flag (though I would prefer opt
>> out to opt in).
>>
>>
>>>
>>> Kenn
>>>
>>> [1]
>>> https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf
>>> [2]
>>> https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf
>>> [3]
>>> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275
>>> [4]
>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22)
>>> [5] https://github.com/apache/beam/pull/12284 and
>>> https://github.com/apache/beam/pull/12162 and
>>> [6]
>>> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878
>>> [7]
>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java
>>> [8] https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174
>>>
>>>
>>> On Mon, Mar 15, 2021 at 2:12 PM Reuven Lax <re...@google.com> wrote:
>>>
>>>> I have some deeper concerns with the null checks. The fact that many
>>>> libraries we use (including guava) don't always annotate their methods
>>>> forces a lot of workarounds. As a very simple example, the return value
>>>> from Preconditions.checkNotNull clearly can never be null, yet the
>>>> nullability checks don't know this. This and other similar cases require
>>>> constantly adding extra unnecessary null checks in the code just to make
>>>> the checker happy. There have been other cases where I haven't been able to
>>>> figure out a way to make the checker happy (often these seem to involve
>>>> using lambdas), and after 10-15 minutes of investigation have given up and
>>>> disabled the check.
>>>>
>>>> Now you might say that it's worth the extra pain and ugliness of
>>>> writing "useless" code to ensure that we have null-safe code. However I
>>>> think this ignores a sociological aspect of software development. I have a
>>>> higher tolerance than many for this sort of pain, and I'm willing to spend
>>>> some time figuring out how to rewrite my code such that it makes the
>>>> checker happy (even though often it forced me to write much more awkward
>>>> code). However even I have often found myself giving up and just disabling
>>>> the check. Many others will have less tolerance than me, and will simply
>>>> disable the checks. At that point we'll have a codebase littered with
>>>> @SuppressWarnings("nullness"), which doesn't really get us where we want to
>>>> be. I've seen similar struggles in other codebases, and generally having a
>>>> static checker with too many false positives often ends up being worse than
>>>> having no checker.
>>>>
>>>> On Mon, Mar 15, 2021 at 10:37 AM Ismaël Mejía <ieme...@gmail.com>
>>>> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> Even if I like the strictness for Null checking, I also think that
>>>>> this is adding too much extra time for builds (that I noticed locally
>>>>> when enabled) and also I agree with Jan that the annotations are
>>>>> really an undesired side effect. For reference when you try to auto
>>>>> complete some method signatures on IntelliJ on downstream projects
>>>>> with C-A-v it generates some extra Checkers annotations like @NonNull
>>>>> and others even if the user isn't using them which is not desirable.
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver <kcwea...@google.com>
>>>>> wrote:
>>>>> >>
>>>>> >> Big +1 for moving this to separate CI job. I really don't like what
>>>>> annotations are currently added to the code we ship. Tools like Idea add
>>>>> these annotations to code they generate when overriding classes and that's
>>>>> very annoying. Users should not be exposed to internal tools like
>>>>> nullability checking.
>>>>> >
>>>>> >
>>>>> > I was only planning on moving this to a separate CI job. The job
>>>>> would still be release blocking, so the same annotations would still be
>>>>> required.
>>>>> >
>>>>> > I'm not sure which annotations you are concerned about. There are
>>>>> two annotations involved with nullness checking, @SuppressWarnings and
>>>>> @Nullable. @SuppressWarnings has retention policy SOURCE, so it shouldn't
>>>>> be exposed to users at all. @Nullable is not just for internal tooling, it
>>>>> also provides useful information about our APIs to users. The user should
>>>>> not have to guess whether a method argument etc. can be null or not, and
>>>>> for better or worse, these annotations are the standard way of expressing
>>>>> that in Java.
>>>>>
>>>>

Re: Null checking in Beam

Reply via email to