Some further color on this, to characterize why all the angst over matching Box(null) seems mostly like a collective "bleah, different is scary" freakout...

Case 1.  The Box domain rejects nulls in the ctor.  Then it doesn't matter what we do; all the schemes discussed for `Box(Object o)` will do the same thing.

Case 2.  The Box domain loves nulls!  Boxes can contain nulls, and users should always expect to find a null in a box; not doing so is using boxes wrong.

In that case, `case Box(Object o)` should surely match `Box(null)`, since it's an unremarkable element of the Box domain.  Here, though, people get nervous: "if we bind o to null, a careless user might NPE!"  But that's likely to happen anyway -- and should.

Suppose we didn't have deconstruction patterns, and instead the user writes:

    case Box b: ...

There's no question this matches Box(null).  And the careless programmer who might NPE with `Box(var o)` is going to write almost exactly the same thing here:

    case Box b:
        Object boxContents = b.contents(); // returns null, no problem
        boxContents.foo();                 // same NPE

In this case, we do the users no favors -- actually, we do anti-favors -- by "hiding" Box(null) from the domain, on the off chance that they will screw it up.  If Box is a null-loving domain, then clients need to write null-aware code, and hiding the nulls doesn't help.

Further, this example shows another element from our refactoring catalog: users should be able to freely refactor:

    case Foo target:
        Object component = target.component();

with

    case Foo(Object component) target: ...

without changing the semantics.  But if `Foo(Object)` doesn't match `Foo(null)`, that's yet another sharp edge.
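As a concrete sketch of that equivalence, using the record patterns that eventually shipped in Java 21 (the one-component `Foo` record and the `before`/`after` method names are illustrative, not proposed API) -- both forms match `Foo(null)` identically:

```java
// Hypothetical one-component Foo record; its component may legitimately be null.
record Foo(Object component) {}

public class RefactorDemo {
    // The "before" form: match the whole Foo, then pull out the component.
    static String before(Object target) {
        if (target instanceof Foo f) {
            Object component = f.component();  // may be null; that's Foo's business
            return "matched: " + component;
        }
        return "no match";
    }

    // The "after" form: a nested total pattern `Object component` also matches
    // a null component, so the refactoring is semantics-preserving.
    static String after(Object target) {
        if (target instanceof Foo(Object component)) {
            return "matched: " + component;
        }
        return "no match";
    }

    public static void main(String[] args) {
        System.out.println(before(new Foo(null)));  // matched: null
        System.out.println(after(new Foo(null)));   // matched: null
    }
}
```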

Essentially, I think the "never match nulls" crowd just really hates nulls and wants them to go away.  But they are not going away, and we do no one any favors by hiding our heads in the sand.



On 8/10/2020 1:57 PM, Brian Goetz wrote:
There seems to be an awful lot of confusion about the motivation for the nullity proposal, so let me step back and address this from first principles.

Let's factor away the null-tolerance of the constructs (switch and instanceof) from what patterns mean, and then we can return to how, if necessary, to resolve any mismatches.  We'll do this by defining what it means for a target to match a pattern, and only then define the semantics of the pattern-aware constructs in terms of that.

Let me also observe that some people, in their belief that `null` was a mistake, tend to have a latent hostility to null, and therefore tend to want new features to be at least as null-hostile as the most null-hostile of old features.  (A good example is streams; it was suggested (by some of the same people) that it should be an error for streams to have null elements.  And we considered this briefly -- and concluded this would have been a terrible idea!  The lesson of that investigation was that the desire to "fix" the null mistake by patching individual holes is futile, and tends to lead to worse results.  Instead, being null-agnostic was the right move for streams.)

I think we're also being distracted by the fact that, in part because we've chosen `instanceof` as our syntax, we want to use `instanceof` as our mental model for what matching means.  This is a good guiding principle but we must be careful of following it blindly.

As a modeling simplification, let's assume that all patterns have exactly one binding variable, and the type of that binding variable is part of the pattern definition.  We could model our match predicate and (conditional) binding function as:

    match :: (Pattern t) u -> Maybe t

A pattern represents the fusion of an applicability predicate, zero or more conditional extractions, and a binding mechanism. For the simple case of a type pattern `Foo f`, the applicability predicate is "are you a Foo", and there are two possible interpretations -- "would `instanceof` say you are a `Foo`" (which means non-null), or "could you be assigned to a variable of type Foo" (or, equivalently, "are you in the value set of Foo".)

A pattern P is _total_ on U if `match P u` returns `Some t` for every `u : U`.  Total patterns are useful because they allow the compiler to reason about control flow and provide better error checking (detecting dead code, silly pattern matches, totality of expression switches, etc.)
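To make this model concrete, here's a small Java sketch of `match` (the names `Maybe`, `totalObject`, and `stringPattern` are illustrative only).  Note that `java.util.Optional` won't do as the `Maybe` here, precisely because it cannot distinguish "matched, and the binding is null" from "no match":

```java
import java.util.function.Function;

// A Maybe that, unlike java.util.Optional, can hold null as a *present* value --
// essential here, since "matched with a null binding" differs from "no match".
record Maybe<T>(boolean present, T value) {
    static <T> Maybe<T> some(T t) { return new Maybe<>(true, t); }
    static <T> Maybe<T> none()    { return new Maybe<>(false, null); }
}

public class MatchModel {
    // match :: (Pattern t) u -> Maybe t, with a Pattern modeled as a function
    static <T> Maybe<T> match(Function<Object, Maybe<T>> pattern, Object u) {
        return pattern.apply(u);
    }

    // The "value set" interpretation of `Object o`: total, matches everything, null included.
    static Maybe<Object> totalObject(Object u) { return Maybe.some(u); }

    // The "instanceof" interpretation of `String s`: non-total, says no to null.
    static Maybe<String> stringPattern(Object u) {
        return (u instanceof String s) ? Maybe.some(s) : Maybe.<String>none();
    }
}
```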

Let's go back to our trusty Box example.  We can think of the `Box` constructor as a mapping:

    enBox :: t -> Box t

and the Box deconstructor as

    unBox :: Box t -> t

Now, what algebraic relationship do we want between enBox and unBox?  The whole point is that a Box is a structure containing some properties, and that patterns let us destructure Boxes to recover those properties.  enBox and unBox should form a projection-embedding pair, which means that enBox is allowed to be picky about what `t` values it accepts (think of the Rational constructor as throwing on denom==0), but, once boxed, we should be able to recover whatever is in the box.  (The Box code gets to mediate access in both directions, but the _language_ shouldn't make guesses about what this code is going to do.)

From the perspective of Box, is `null` a valid value of T?  The answer is: "That's the Box author's business.  The constructor accepts a T, and `null` is a valid member of T's value set.  So if the imperative body of the constructor doesn't do anything special to reject it, then it's part of the domain."  And if it's part of the domain, then `unBox` should hand back what we handed to `enBox`.  T in, T out.
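A minimal sketch of the projection-embedding contract with plain Java records (the `Box` and `Rational` here are illustrative stand-ins): the embedding side may be picky, but the projection side hands back exactly what was accepted.

```java
// A null-loving Box: the canonical constructor imposes no restriction on its T,
// so null is part of the domain.
record Box<T>(T contents) {}

// A picky embedding: the constructor shrinks the domain (rejects denom == 0)...
record Rational(int num, int denom) {
    Rational {
        if (denom == 0) throw new ArithmeticException("denom == 0");
    }
    // ...but num() and denom() still recover exactly what was accepted.
}

public class RoundTrip {
    public static void main(String[] args) {
        Box<String> b = new Box<>(null);          // enBox accepted null, so null is in the domain
        System.out.println(b.contents() == null); // true: unBox is "T in, T out", no NPE
    }
}
```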

It has been a driving goal throughout the pattern matching exploration to exploit these dualities, because (among other things) this minimizes sharp edges and makes composition do what you expect it to.  If I do:

    Box<T> b = new Box<>(t);

and this succeeds, then our `match` function applied to `Box(T)` and `b` should yield what we started with -- `t`.  Singling out `null` for special treatment here as an illegal binding result is unwarranted; it creates a sharp edge where you can put things into boxes but you can only get them out on Tuesdays.  The language has no business telling Box it can't contain nulls, or punishing null-happy boxes by making them harder to deconstruct. Null-hostility is for the Box author to choose or not.  I should be able to compose construction and deconstruction without surprises.

Remember, we're not yet talking about language syntax here -- we're talking about the semantics of matching (and what we let class authors model).  At this level, there is simply no other reasonable set of semantics here -- the `Box(T)` deconstructor, when applied to a valid Box<T>, should be able to recover whatever was passed to the `new Box(T)` constructor.  Nulls should be rejected by pattern matching at the point where they would be dereferenced, not preemptively.

There's also only one reasonable definition of the semantics of nested matching.  If `P : Pattern t`, then the nested pattern P(Q) matches u iff

    u matches P(T alpha) && alpha matches Q

It follows that if `Box(Object o)` is going to be total on all boxes, then `Object o` must be total on all objects.

(There's also only one reasonable definition of the `var` pattern; it is type inference where we infer the type pattern for whatever type is the target of the match.  So if `P : Pattern T`, then `P(var x)` infers `T x` for the nested pattern.)

Doing anything else is an impediment to composition (and composition is the only tool we have, as language designers, that separates us from the apes.)  I can compose constructors:

    Box<Flox<Pox<T>>> b = new Box<>(new Flox<>(new Pox<>(t)));

and I should be able to take this apart exactly the same way:

    if (b matches Box(Flox(Pox(var t))))

The reason `Flox(Pox p)` doesn't match null floxes is not because patterns shouldn't match null, but because a _deconstruction pattern_ that takes apart a Flox is intrinsically going to look inside the Flox -- which means dereferencing it.  But an ordinary type pattern is not necessarily going to.
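Here is a hand desugaring of that nested match in plain pattern-`instanceof` terms (the record definitions and method name are illustrative); the point is where the dereferences happen -- each deconstruction step dereferences its target, so a null Flox fails to match, while the innermost total pattern happily binds null:

```java
record Pox<T>(T value) {}
record Flox<T>(Pox<T> pox) {}
record Box<T>(Flox<T> flox) {}

public class NestedDesugar {
    // Rough desugaring of `b matches Box(Flox(Pox(var t)))`:
    static String nestedMatch(Object o) {
        if (o instanceof Box<?> b) {        // outer deconstruction: must look inside b
            Object f = b.flox();            // dereferences b to extract its component
            if (f instanceof Flox<?> fl) {  // a null flox fails HERE: deconstructing means dereferencing
                Object p = fl.pox();
                if (p instanceof Pox<?> px) {
                    Object t = px.value();  // innermost `var t` is total: t may be null, still a match
                    return "matched, t = " + t;
                }
            }
        }
        return "no match";
    }

    public static void main(String[] args) {
        System.out.println(nestedMatch(new Box<>(new Flox<>(new Pox<>(null))))); // matched, t = null
        System.out.println(nestedMatch(new Box<>(null)));                        // no match (null Flox)
    }
}
```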

Looking at it from another angle, there is a natural interpretation of applying a total pattern as a generalization of assignment.  It's not an accident that `T t` (or `var x`) looks both like a pattern and like a local variable declaration.  We know that this:

    T t = e
or
    var t = e

is a local variable declaration with initializer, but we can also reasonably (and profitably) interpret it as a pattern match -- take the (total on T) pattern `T t`, and match `e : T` to it.  And the compiler already knows that this is going to succeed if `e : T`.  To gratuitously reject null here makes no sense.  (Totality is important here; if the pattern were not total, then `t` would not be definitely assigned (DA) after the assignment, and therefore the declaration either has to throw a runtime error, or the compiler has to reject it.)
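A tiny sketch of the two interpretations side by side -- declaration-as-total-pattern always succeeds (null included), while plain `instanceof` embodies the non-total interpretation:

```java
public class TotalAssign {
    public static void main(String[] args) {
        Object e = null;

        // Declaration-with-initializer, read as matching the total pattern `Object t`:
        // it succeeds for any value in Object's value set, null included.
        Object t = e;
        System.out.println("declaration succeeded, t = " + t);  // t = null

        // Plain instanceof is the non-total interpretation: it says no to null.
        System.out.println(e instanceof Object);                // false
    }
}
```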

## Back to switch and instanceof

The above discussion argues why there is only one reasonable null behavior for patterns _in the abstract_.   But, I hear you cry, the semantics for switch and instanceof today are entirely reasonable and intuitive, so how could they be so wrong?

And the answer is: we have only been able to use `switch` and `instanceof` so far for pretty trivial things!  When we add patterns to the language, we're raising the expressive ability of these constructs to some power.  And extrapolating from our existing intuitions about them is like extrapolating the behavior of polynomials from their zeroth-order Taylor expansion.

(Now, at this point, the split-over-lump crowd says "Then you should define new constructs, if they're so much more powerful." But I still claim it is far better to refine our intuitions about what switch means, even with some discomfort, than to try to keep track of the subtle differences between switch and snitch.)

So, why do we have the current null behavior for `instanceof` and `switch`?  Well, right now, `instanceof` only lets you ask a very, very simple question -- "is the dynamic type of the target an X?"  And the designers judged (reasonably) that, since 99.999% of the time, what you're about to do is cast the target and then dereference it, saying "no" is less error-prone than saying OK and then having the subsequent dereference fail.

But now, `instanceof` can answer far more sophisticated questions, and that 99.999% becomes a complete unknown.  With what confidence can you say that the body of:

    if (b instanceof Box(var t)) { ... }

is going to dereference t?  If you say more than 50%, you're lying. It would be totally reasonable to just take that t and assign it somewhere else, rebox it into another box, pass it to some T-consuming method, etc.  And who are we to say that Box-consuming protocols are somehow "bad" if they like to truck in null contents? That's not our business!  So the conditions under which "always says no" was reasonable for Java 1.0 are no longer applicable.

The same is true for switch, because of the very limited set of reference types which switch permits (and which were only added later -- boxed primitives and enums in Java 5, strings in Java 7).  In all of these cases, we are asking very simple questions ("are you 3"), and these are domains where nulls have historically been denigrated -- so it seemed reasonable for switch to be hostile to them.  But once we introduce patterns, the set of questions you can ask gets enormously larger, and the set of types you can switch over does too.  The old conditions don't apply.  In:

    switch (o) {
        case Box(var t): ...
        case Bag(var t): ...
    }

we care about the contents, not the wrapping; the switch is there to do the unwrapping for us.  Who are we to say "sorry, no one should ever be allowed to put a null in a Bag?"  That's not our business!

At this point, I suspect Remi says "I'm not saying you can't put a null in a Box, but there should be a different way to unpack it." But unless you can say with 99.99% certainty that nulls are always errors, it is better to be agnostic to nulls in the plumbing and let users filter them at the ultimate point of consumption, than to make the plumbing null-hostile and make users jump through hoops to get the nulls to flow.  The same was true for streams; we made the (absolutely correct) choice to let the nulls flow through the stream, and, if you are using a maybe-null-containing source, and doing null-incompatible things on the elements, it's on you to filter them.  It is easier to filter nulls than to add back a special encoding for nulls.  (And, the result of that experiment was pretty conclusive: of the hundreds of stack overflow questions I have seen on streams, not one centered around unexpected nulls.)

If we have guards, and you want to express "no Boxes with nulls", that's easy:

    case Box(var t) when t != null: ...
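In the guard syntax that eventually shipped (`when` clauses and record patterns, Java 21), a complete sketch might look like this (a hypothetical non-generic `Box` record and a `describe` method, both for illustration only):

```java
// Hypothetical non-generic Box; its contents may be null.
record Box(Object contents) {}

public class GuardDemo {
    // "No Boxes with nulls" is expressed at the use site with a guard,
    // rather than baking null-hostility into the pattern itself.
    static String describe(Object o) {
        return switch (o) {
            case Box(var t) when t != null -> "box of " + t;
            case Box(var t)                -> "box of null";
            default                        -> "not a box";
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(new Box("x")));   // box of x
        System.out.println(describe(new Box(null)));  // box of null
        System.out.println(describe(42));             // not a box
    }
}
```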

And again, as with `instanceof`, we have no reason to believe that there's a 99.99% chance that the next thing the user is going to do is dereference it.  So the justification that null-hostility is the "obvious" semantics here doesn't translate to the new, more powerful language feature.

And it gets worse: the people who really want the nulls now have to do additional error-prone work -- either use some ad-hoc epicyclical syntax at each use site (and, if the deconstruction pattern has five bindings, you have to say it five times), or duplicate blocks of code to avoid the switch anomaly.

The conclusion of this section is that while the existing null behavior for instanceof and switch is justified relative to their _current_ limitations, once we remove those limitations, those behaviors are much more arbitrary (and kind of mean: "nulls are so bad, that if you are a null-using person, we will make it harder for you, 'for your own good'.")

#### Split the baby?

Now, there is room to make a reasonable argument that we'd rather keep the existing switch behavior, but accept the null-friendly matching behavior.  My take is that this is a bad trade, but let's look at it more carefully.

Gain: I don't have to learn a new set of rules about what switch/instanceof do with null.

Loss: code duplication.  If I want my fallback to handle nulls, I have to duplicate code; instead of

    switch (o) {
        case String s: A
        case Long l: B
        case Object o: C
    }

I have to do

    if (o == null) { C }
    else switch (o) {
        case String s: A
        case Long l: B
        case Object o: C
    }

resulting in duplicating C.  (We have this problem today, but because of the limitations of switch today, it is rarely a problem. When our case labels are more powerful, we'll be using switch for more stuff, and it will surely come up more often.)

Loss: refactoring anomaly.  Refactoring a nested switch with:

     case P(Q):
     case P(R):
     case P(S):

to

    case P(var x):
        switch (x) {
            case Q: ...
            case R: ...
            case S: ...
        }

doesn't work in the obvious way.  Yes, there's a way to refactor it, and the IDE will do it correctly.  But it becomes a sharp edge that users will trip over.  The reason the above refactoring is desirable is that users will reasonably assume it works, and rather than cut them with a sharp edge, we can just make it work the way they reasonably think it should.

So, we could make this trade, and it would be more "minimal" -- but I think it would result in a less useful switch in the long run.  I think we would regret it.

#### Conclusion

If we were designing pattern matching and switch together from scratch, we would never even consider the current nullity behavior; the "wait until someone actually dereferences before we throw" is the obvious and only reasonable choice.  We're being biased based on our existing assumptions about instanceof and switch.  This is a reasonable starting point, but we have to admit that these biases in turn come from the fact that the current interpretations of those constructs are dramatically limited compared to supporting patterns.

It is easy to trot out anecdotes where any of the possible schemes would cause a particular user to be confused.  But this is just a way to justify our biases.  The reality is that, as switch and instanceof get more powerful, we don't get to make as many assumptions about the likelihood of whether `null` is an error or not.  And the more likely it is not an error, the less justification we have for giving it special semantics.

Let the nulls flow.


