On 3/2/2022 1:43 PM, Dan Heidinga wrote:

Making the pattern match compatible with assignment conversions makes
sense to me and follows a similar rationale to that used with
MethodHandle::asType following the JLS 5.3 invocation conversions.
Though with MHs we had the ability to add additional conversions under
MethodHandles::explicitCastArguments. With pattern matching, we don't
have the same ability to make the "extra" behaviour opt-in / opt-out.
We just get one chance to pick the right behaviour.

Indeed.  And the thing that I am trying to avoid here is creating _yet another_ new context in which a different bag of ad-hoc conversions is possible.  While it might be justifiable from a local perspective to say "it's OK if `int x` does unboxing, but having it do range checking seems new and different, so let's not do that", from a global perspective, that means we add a new context ("pattern match context") to the existing list of assignment, loose invocation, strict invocation, cast, and numeric contexts.  That is the kind of incremental complexity I'd like to avoid, if there is a unifying move we can pull.
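To see how little the existing contexts already agree with each other, consider constant narrowing, which assignment context permits but invocation context does not (a small illustration; the method name is made up):

    static void takeByte(byte b) { }

    static void demo() {
        byte b = 100;      // fine: assignment context narrows a constant that provably fits
        // takeByte(100);  // does not compile: invocation context has no such narrowing
    }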

Conversions like unboxing or casting are burdened by the fact that they have to be total, which means the "does it fit" / "if so, do it" / "if not, do something else (truncate, throw, etc.)" steps all have to be crammed into a single operation.  What pattern matching does is extract the "does it fit, and if so do it" part into a more primitive operation, from which other operations can be composed.

At some level, what I'm proposing is all spec-shuffling; we'll either say "a widening primitive conversion is allowed in assignment context", or we'll say that a primitive type pattern `P p` matches any primitive type Q that can be widened to P.  We'll end up with a similar number of rules, but we might be able to "shake the box" and make them settle into a lower-energy state, and be able to define (whether we explicitly do so or not) assignment context as supporting "all the cases where the LHS, viewed as a type pattern, is exhaustive on the RHS, potentially with remainder, and throws if remainder is encountered."  (That's what unboxing does: it throws when remainder is encountered.)
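Concretely, under today's rules, assignment-context unboxing handles its remainder (null) by throwing, while an instanceof pattern just declines to match it.  A minimal sketch (`lookup` is an imaginary helper):

    Integer box = lookup();   // may return null
    int i = box;              // unboxing is total: if box is null, the remainder is handled by throwing NPE

    Number n = box;
    if (n instanceof Integer iv) {
        // matched: there really was an int-sized value here
    } else {
        // the remainder (null) just falls out here; nothing throws
    }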

As to the range check, it has always bugged me that you see code that looks like:

    if (i >= -128 && i <= 127) { byte b = (byte) i; ... }

because of the accidental specificity, and the attendant risk of error (using <= instead of <, or using 127 instead of 128). Being able to say:

    if (i instanceof byte b) { ... }

is better not because it is more compact, but because you're actually asking the right question -- "does this int value fit in a byte."  I'm sad we don't really have a way to ask this question today; it seems an omission.
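The closest thing I know of today is the cast-and-compare idiom, which answers the question, but only obliquely (a sketch; the helper name is made up):

    static boolean fitsInByte(int i) {
        return (byte) i == i;   // true exactly when i is in the byte range [-128, 127]
    }

It works, but nothing about it reads as "does this int value fit in a byte".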

Intuitively, the behaviour you propose is kind of what we want - all
the possible byte cases end up in the byte case and we don't need to
adapt the long case to handle those that would have fit in a byte.
I'm slightly concerned that this changes Java's historical approach
and may lead to surprises when refactoring existing code that treats
unbox(Long) one way and unbox(Short) another.  Will users be confused
when an unbox(Long) whose value is in the short range ends up in a
case that was only intended for unbox(Short)?  I'm having a hard time
finding an example that would trip on this, but my lack of imagination
isn't definitive =)

I'm worried about this too.  We examined it briefly, and ran away, when we were thinking about constant patterns, specifically:

    Object o = ...
    switch (o) {
        case 0: ...
        default: ...
    }

What would this mean?  What I wouldn't want it to mean is "match Long 0, Integer 0, Short 0, Byte 0, Character 0"; that feels like it is over the line for "magic".  (Note that this is about defining what the _constant pattern_ means, not the primitive type pattern.)  I think it's probably reasonable to say this is a type error; 0 is applicable to primitive numerics and their boxes, but not to Number or Object.  I think that is consistent with what I'm suggesting about primitive type patterns, but I'd have to think about it more.

Something like the following shouldn't be surprising given the
existing rules around unbox + widening primitive conversion (though it
may be surprising when first encountered, as I expect most users
haven't really internalized the JLS 5.2 rules):

As Alex said to me yesterday: "JLS Ch 5 contains many more words than any prospective reader would expect to find on the subject, but once the reader gets over the overwhelm of how much there is to say, they will find none of the words surprising."  There's a deeper truth to this statement: Java is not actually as simple a language as its mythology suggests, but we win by hiding the complexity in places users generally don't have to look, and if and when they do confront the complexity, they find it unsurprising, and go back to ignoring it.

So in point of fact, *almost no one* has read JLS 5.2, but it still does "what users would likely find reasonable".
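For instance, here is a sampling of what 5.2 quietly allows, all of which reads as perfectly ordinary code (a small sketch):

    int anInt = 42;
    long wide = anInt;        // widening primitive conversion
    Integer boxed = anInt;    // boxing conversion
    byte small = 100;         // narrowing of a constant that provably fits
    // byte bad = 200;        // rejected: 200 does not fit in a byte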

Number n = ...;
switch (n) {
    case long l -> ...
    case int i  -> ...   // dead code
    case byte b -> ...   // dead code
    default     -> ...
}

Correct.  We have rules for pattern dominance, which are used to give compile errors on dead cases; we'd have to work through the details to confirm that `long l` dominates `int i`, but I'd hope this is the case.
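For reference types, dominance checking already behaves this way; a quick sketch under the current pattern-switch rules:

    Object o = ...
    switch (o) {
        case String s        -> System.out.println("a string");
        case CharSequence cs -> System.out.println("some other CharSequence");
        default              -> System.out.println("something else");
    }
    // Swapping the first two cases is a compile-time error today:
    // `case String s` would be dominated by `case CharSequence cs`.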

But this may be more surprising, as I suggested above:

Number n = new Long(5);
switch (n) {
    case byte b -> ...   // matches here
    case int i  -> ...
    case long l -> ...
    default     -> ...
}

Overall, I like the extra dynamic range check but would be fine with
leaving it out if it complicates the spec given it feels like a pretty
deep-in-the-weeds corner case.

It is probably not a forced move to support the richer interpretation of primitive patterns now.  But I think the consequence of leaving it out may be surprising: rather than "simplifying the language" (as one might hope that "leaving something out" would do), I think there's a risk that it makes things more complicated, because (a) it effectively creates yet another conversion context that is distinct from the too-many we have now, and (b) it creates a sharp edge where refactoring from local variable initialization to let-bind doesn't work, because assignment would then be looser than let-bind.

One reason this is especially undesirable is that one of the forms of let-bind is a let-bind *expression*:

    let P = p, Q = q
    in <expression>

which is useful for pulling out subexpressions and binding them to a variable, but for which the scope of that variable is limited.  If refactoring from:

    int x = stuff;
    m(f(x));

to

    m(let x = stuff in f(x))
    // x no longer in scope here

was not possible because of a silly mismatch between the conversions in let context and the conversions in assignment context, then we're putting users in the position of having to choose between richer conversions and richer scoping.

(Usual warning (Remi): I'm mentioning let-expressions because it gives a sense of where some of these constraints come from, but this is not a suitable time to design the let-expression feature.)

