Re: Patterns and nulls

Brian Goetz Tue, 21 Aug 2018 11:57:37 -0700

Returning to this topic…

As mentioned in the original thread, some of what was in here went too far. I think we’re comfortable saying:


 * A /type pattern/, on its own, should coincide with |instanceof|,
   meaning it never matches null
 * A /var pattern/ is just a type pattern with the type supplied by
   inference

It still leaves us with two choices for how to write a pattern that matches any |Box|, including |Box(null)|.


1. Just write |Box(Object? o)| if you want all boxes, or write
   |Box(Object o)| if you mean a box containing a non-null.
2. Adjust the rules for nested patterns to treat a total
   (type-restating) type pattern specially, so |Box(Frog f)| would only
   match boxes containing non-null frogs, but |Box(Object o)| would
   match all boxes.

The former is more principled, as it lets you say what you mean in a straightforward way. The latter is more irregular, but might be more in line with user intuition. I still worry that people will repeatedly cut themselves on the sharp edge of |Box(Object o)| not matching all boxes.

Under either of these rule sets, we can use |instanceof| as our match operator.

So I think it comes down to a simple decision about whether we want to distort nested total (type-restating) type patterns to be null-friendly or null-hostile.
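For concreteness, here is a minimal sketch of how the second choice reads in code. It assumes a JDK with record patterns (Java 21 or later), which postdates this mail; the `NestedNullDemo` class, the `contents` component name, and the `describe` helper are hypothetical names used only for illustration. On such a JDK, a nested type pattern that is total on the component type matches a null component, while a narrower one does not:

```java
class NestedNullDemo {
    // Hypothetical one-component record standing in for the Box of this thread.
    record Box(Object contents) {}

    static String describe(Object target) {
        // String is partial on the Object component: it needs a dynamic type
        // test, so a null component does not match here.
        if (target instanceof Box(String s)) {
            return "box of string";
        }
        // Object is total on the Object component: this matches every Box,
        // including Box(null).
        if (target instanceof Box(Object o)) {
            return "any box";
        }
        // A null target (or a non-Box) matches neither record pattern.
        return "not a box";
    }
}
```

At the top level, `instanceof` keeps its usual behavior: a null target matches nothing, so only the nested position is treated specially.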


On 3/14/2018 12:58 PM, Brian Goetz wrote:

In the message "More on patterns, generics, null, and primitives", Gavin outlines how these constructs will be treated in pattern matching. This mail is a refinement of that, specifically, to refine how nulls are treated.
Rambling Background Of Why This Is A Problem At All
---------------------------------------------------
Nulls will always be a source of corner cases and surprises, so the best we can likely do is move the surprises around to coincide with existing surprise modes. One of the existing surprise modes is that switches on reference types (boxes, strings, and enums) currently always NPE when passed a null. You could characterize switch's current treatment of null as "La la la can't hear you la la la." (I think this decision was mostly made by frog-boiling; in Java 1.0, there were no switches on reference types, so it was not an issue; when switches on boxes were added, it was done by appeal to auto-unboxing, which throws on null, and null enums are rare enough that no one felt it was important enough to do something different for them. Then when we added string switch in 7, we were already mostly sliding down the slippery slope of past precedent.)
The "la la la" approach has gotten us pretty far, but I think it finally runs out of gas when we have nested patterns. It might be OK to NPE when x = null here:
    switch (x) {
        case String: ...
        case Integer: ...
        default: ...
    }

but it is certainly not OK to NPE when b = new Box(null):

    switch (b) {
        case Box(String s): ...
        case Box(Integer i): ...
        case Box(Object o): ...
    }
since `Box(null)` is a perfectly reasonable box. (Which of these patterns matches `Box(null)` is a different story, see below.) So problem #1 is that we need a way to match nulls in nested patterns; having nested patterns throw whenever any intermediate binding produces null would be crazy. So, we have to deal with nulls in this way. It seems natural, therefore, to be able to confront it directly:
    case Box(null): ...
which is just an ordinary nested pattern, where our target matches `Box(var x)` and further x matches null. Which means `x matches null` needs to be a thing, even if switch is hostile to nulls.
But if you pull on this string a bit more, we'd also like to do the same at the top level, because we'd like to be able to refactor
    switch (b) {
        case Box(null): ...
        case Box(Candy): ...
        case Box(Object): ...
    }

into

    switch (b) {
        case Box(var x):
            switch (x) {
                case null: ...
                case Candy: ...
                case Object: ...
            }
    }
with no subtle semantics changes. I think this is what users will expect, and cutting them on sharp edges here wouldn't be doing them favors.
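The unrolled form can be sketched in compilable Java, under the assumption of a JDK that supports record patterns and `case null` (Java 21 or later, which postdates this mail). The `Candy`, `Frog`, and `Box` records and the `unrolled` method are hypothetical stand-ins. Note that on such a JDK the inner `case null` has to be written explicitly; without it the inner switch would throw on a null `x`:

```java
class UnrollDemo {
    // Hypothetical types standing in for the ones used in this thread.
    record Candy() {}
    record Frog() {}
    record Box(Object contents) {}

    static String unrolled(Box b) {
        return switch (b) {
            // Box(var x) matches every non-null Box and binds its component.
            case Box(var x) -> switch (x) {
                case null -> "null";      // explicit null arm for the component
                case Candy c -> "candy";
                case Frog f -> "frog";
                case Object o -> "other"; // catch-all for remaining non-null values
            };
        };
    }
}
```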
Null and Type Patterns
----------------------
The previous iteration outlined in Gavin's mail was motivated by a sensible goal, but I think we took it a little too literally. The goal is that if I have a `Box(null)`, it should match the following:
    case Box(var x):
because it would be weird if `var x` in a nested context really meant "everything but null." This led us to the position that
    case Box(Object o):
should also match `Box(null)`, because `var` is just type inference, and the compiler infers `Object` here from the signature of the `Box` deconstructor. So `var` and the type that gets inferred should be treated the same. (Note that Scala departs from this, and the results are pretty confusing.)
You might convince yourself that `Box(Object)` not matching `Box(null)` is not a problem, just add a case to handle null, with an OR pattern (aka non-harmful fallthrough):
    case Box(null): // fall through
    case Box(Object): ...
But, this only works in the simple case. What if my Box deconstructor had four binding variables:
    case Box(P, Q, R, S):

Now, to capture the same semantics, you need four more cases:

    case Box(null, Q, R, S): // fall through
    case Box(P, null, R, S): // fall through
    case Box(P, Q, null, S): // fall through
    case Box(P, Q, R, null): // fall through
    case Box(P, Q, R, S):
But wait, it gets worse, since if P and friends have binding variables, and the null pattern does not, the binding variables will not be DA and therefore not be usable. And if we graft binding variables onto constant patterns, we have a potential typing problem, since the type of merged binding variables in OR patterns should match. So this is a tire fire, let's back away slowly.
So, we want at least some type patterns to match null, at least in nested contexts. Got it.
This led us to: a type pattern `T t` should match null. But clearly, in the switch
    switch (aString) {
        case String s: ...
    }
it NPEs (since that's what it does today.) So we moved the null hostility to `switch`, which involved an analysis of whether `case null` was present. As Kevin pointed out, that was pretty confusing for the users to keep track of. So that's not so good.
Also not so good: if type patterns match null, then the dominance order rule says you can't put a `case null` arm after a type pattern arm, because the `case null` will be dead. (Just like you can't catch `IOException` after catching `Throwable`.) Which deprived case null of most of its remaining usefulness, which is: lump null in with the default. If users want to use `case null`, they most likely want this:
    switch (o) {
        case A: ...
        case B: ...
        case null: // fall through
        default:
            // deal with unexpected values
    }
If we can't do that -- which the latest iteration said we can't -- it's pretty useless. So, we got something wrong with type patterns too. Tricky buggers, these nulls!
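As a point of comparison, the "lump null in with the default" shape can be written directly on a JDK with pattern matching for switch (Java 21 or later, an assumption that postdates this mail). The sketch below uses a hypothetical `classify` method, with `String` and `Integer` standing in for `A` and `B`:

```java
class NullDefaultDemo {
    static String classify(Object o) {
        return switch (o) {
            case String s -> "string";
            case Integer i -> "integer";
            // null is handled together with everything else, after the type
            // patterns, because those patterns do not match null.
            case null, default -> "unexpected";
        };
    }
}
```

Because neither type pattern matches null, the combined `case null, default` arm is not dead code and can sit last.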
Some Problems With the Current Plan
-----------------------------------
The current plan, even though it came via a sensible path, has lots of problems. Including:
 - It's hard to reason about which switches throw on null and which don't. (This will never be easy, but we can make it less hard.)
 - We have asymmetries between nested and non-nested patterns; if we unroll a nested pattern to a nested switch, the semantics shift subtly out from under us.
 - There's no way to say "default including null", which is what people would actually want to do if they had explicit control over nulls. Having `String s` match null means our ordering rules force the null case too early, depriving us of the ability to lump it in with another case.
Further, while the intent of `Box(var x)` matches `Box(null)` was right, and that led us to `Box(Object)` matches `Box(null)`, we didn't pull this string to the end. So let's break some assumptions and start over.
Let's assume we have the following declarations:

    record Box(Object);
    Object o;
    String s;
    Box b;
Implicitly, `Box` has a deconstruction pattern whose signature is `Box(out Object o)`.
What will users expect on the following?

    Box b = new Box(null);
    switch (b) {
        case Box(Candy x): ...
        case Box(Frog f): ...
        case Box(Object o): ...
    }

There are four non-ridiculous possibilities:
 - NPE
 - Match none
 - Match Box(Candy)
 - Match Box(Object)
I argued above why NPE is undesirable; I think matching none of them would also be pretty surprising, since `Box(null)` is a perfectly reasonable element of the value set described by the pattern `Box(Object)`. If all type patterns match null, we'd match `Box(Candy)` -- but that's pretty weird and arbitrary, and probably not what the user expects. It also means -- and this is a serious smell -- that we couldn't freely reorder the independent cases `Box(Candy)` and `Box(Frog)` without subtly altering behavior. Yuck!
So the only reasonable outcome is that it matches `Box(Object)`. We'll need a credible theory why we bypass the candy and the frog buckets, but I think this is what the user will expect -- `Box(Object)` is our catch-all bucket.
A Credible Theory
-----------------

Recall that matching a nested pattern `x matches Box(P)` means:

    x matches Box(var alpha) && alpha matches P
The theory by which we can reasonably claim that `Box(Object)` matches `Box(null)` is that the nested pattern `Object` is _total_ on the type of its target (alpha), and therefore can be statically deemed to match without additional dynamic checks. In
        case Box(Candy x): ...
        case Box(Frog f): ...
        case Box(Object o): ...
the first two cases require additional dynamic type tests (instanceof Candy / Frog), but the latter, if the target is a `Box` at all, requires no further dynamic testing. So we can _define_ `T t` to mean:
    match(T t, e : U) === U <: T ? true : e instanceof T
In other words, a total type pattern matches null, but a partial type pattern does not. That's great for the type system weenies, but does it help the users? I claim it does. It means that in:
    Box b = new Box(null);
    switch (b) {
        case Box(Candy x): ...
        case Box(Frog f): ...
        case Box(Object o): ...
    }
We match `Box(Object)`, which is the catch-all `Box` handler. We can freely reorder the first two cases, because they're unordered by dominance, but we can't reorder either of them with `Box(Object)`, because that would create a dead case arm. `Box(var x)` and `Box(T x)` mean the same thing when `T` is the type that inference produces.
So `Box(Candy)` selects all boxes known to contain candy; `Box(Frog)` all boxes known to contain frogs; `Box(null)` selects a box containing null, and `Box(_)` or `Box(var x)` or `Box(Object o)` selects all boxes.
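The definition of `match(T t, e : U)` can be modeled at run time with reflection. This is only an illustrative sketch, not how a compiler would implement it: the static type `U` is passed explicitly as a `Class` object, the dynamic test on the partial branch is read as a test against the pattern type `T`, and the `TotalityDemo` and `matches` names are hypothetical.

```java
class TotalityDemo {
    // Models match(T t, e : U): patternType plays T, staticType plays U.
    static boolean matches(Class<?> patternType, Class<?> staticType, Object e) {
        // U <: T means the pattern is total on the target type, so it matches
        // statically with no dynamic check, and therefore also matches null.
        if (patternType.isAssignableFrom(staticType)) {
            return true;
        }
        // Otherwise the pattern is partial and needs a dynamic type test,
        // which null always fails.
        return patternType.isInstance(e);
    }
}
```

With `Object` as the static type of the box contents, `matches(Object.class, Object.class, null)` is true while `matches(String.class, Object.class, null)` is false, mirroring the `Box(Object)` versus `Box(Candy)` behavior described above.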
Further, we can unroll the above to:

    Box b = new Box(null);
    switch (b) {
        case Box(var x):
            switch (x) {
                case Candy c: ...
                case Frog f: ...
                case Object o: ...
            }
    }
and it means _the same thing_; the nulls flow into the `Object` catch basin, and I can still freely reorder the Candy/Frog cases. Whew. This feels like we're getting somewhere.
We can also now flow the `case null` down to where it falls through into the "everything else" bucket, because type patterns no longer match nulls. If specified at all, this is probably where the user most wants to put it.
Note also that the notion of a "total pattern" (one whose applicability, possibly modulo null, can be determined statically) comes up elsewhere too. We talked about a let-bind statement:
   let Point(var x, var y) = p
In order for the compiler to know that an `else` is not required on a let-bind, the pattern has to be total on the static type of the target. So this notion of totality is a useful one.
Where totality starts to feel uncomfortable is the fact that while null _matches_ `Object o`, it is not `instanceof Object`. More on this later.
This addresses all the problems we stated above, so what's the problem?

Default becomes legacy
----------------------
The catch is that the irregularity of `default` becomes even more problematic. The cure is we give `default` a gold watch, thank it for its services, and grant it "Keyword Emeritus" status.
What's wrong with default? First, it's syntactically irregular. It's not a pattern, so doesn't easily admit nesting or binding variables. And second, it's semantically irregular; it means "everything else (but not null!)" Which makes it a poor catch-all. We'd like for our catch-all case -- the one that dominates all other possible cases -- to catch everything. We thought we wanted `default` to be equivalent to a total pattern, but default is insufficiently total.
So, let's define a _constant switch_ as one whose target is the existing constant types (primitives, their boxes, strings, and enums) and whose labels are all constants (the latter condition might not be needed). In a constant switch, retcon default to mean "all the constants I've not explicitly enumerated, except null." (If you want to flow nulls into the default bin too, just add an explicit `case null` to fall into default, _or_ replace `default` with a total pattern.) We act as if constant switches have an implicit "case null: NPE" _at the bottom_. If you don't handle null explicitly (a total pattern counts as handling it explicitly), you fall into that bucket.
Then, we _ban_ default in non-constant switches. So if you want patterns, swap your old deficient `default` for new shiny total patterns, which are a better default, and are truly exhaustive (rather than modulo-null exhaustive). If we can do a little more to express the intention of exhaustiveness for statement switches (which are not required to be exhaustive), this gives us a path to "switches never throw NPE if you follow XYZ rules."
There's more work to do here to get to this statically-provable null-safe switch future, but I think this is a very positive direction. (Of course, we can't prevent NPEs from people matching against `Object o` and then dereferencing o.)
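The "everything else (but not null!)" behavior of `default` in a constant switch is observable in today's Java. The following small sketch uses a plain string switch; the `ConstantSwitchDemo` class and its `legacy` and `throwsOnNull` methods are hypothetical names for illustration:

```java
class ConstantSwitchDemo {
    // A classic constant switch: string target, constant labels, default.
    static String legacy(String s) {
        switch (s) {
            case "a":
                return "A";
            default:
                // Reached for every non-null string other than "a".
                return "other";
        }
    }

    // Returns true if passing null makes the switch throw rather than
    // fall into default.
    static boolean throwsOnNull() {
        try {
            legacy(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```

Here `default` never sees the null; the switch throws before any label is consulted, which is the behavior the implicit "case null: NPE" bucket is meant to describe.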
Instanceof becomes instanceof
-----------------------------
The other catch is that we can't use `instanceof` to be the spelling of our `matches` operator, because it conflicts with existing `instanceof` treatment of nulls. I think that's OK; `instanceof` is a low-level primitive; matching is a high-level construct defined partially in terms of instanceof.
