Below are some more refined thoughts about patterns, generics, null and 
primitives. Thank you for your feedback so far. Comments welcome!

Gavin
----

Patterns: Types, primitives and null
====================================

In this note, we address some of the issues raised in earlier posts concerning 
pattern matching. 

To make matters concrete, let us define a simple grammar of patterns:

p  ::= l                    //literal (primitive type constant expression)
       null                 //null pattern
       R x                  //Reference type test pattern
       V x                  //Value type test pattern
       D(p_1, ..., p_n)     //Deconstructor pattern (D is a *data class*)
       var x                //Type inference

For simplicity, we have only considered primitive type constant expression 
literals, but we can easily extend what follows to cover string constant 
expressions and Enum names. For generality we have included simple 
deconstruction patterns, which will most likely appear first with data classes 
- a feature we have proposed elsewhere.


Type checking basics
====================

To type check the expression e matches R x where R is a reference type, we 
check that the type of the expression e is cast convertible to R. This is 
captured by the following typing rule:

e : T     T cast-convertible R
------------------------------
e matches R x : bool

For value types, we propose a simpler rule. When type checking an expression e 
matches V x where V is a value type, we shall simply check that the type of e 
is either the value type V or the boxed type of V. Again we can write this more 
formally as follows.

e : V   or   e : box(V)
-----------------------
e matches V x : bool

(We have assumed a function, box, that returns the boxed reference type of a 
given value type, e.g. box(int)=Integer.)

Type checking literal patterns is similar. Thus given an expression e matches l 
where l is a primitive type constant expression of type V, we check that e has 
exactly the type V. (We could also permit e to have the boxed type of V, but we 
disallow this for simplicity.)

e : V    l : V
------------------
e matches l : bool

[A small corner case with this rule is where the primitive type constant 
expression l is a numeric literal without a suffix, e.g. 1. In this case we 
would treat l as a poly expression and use the type of the expression e as a 
target type.]

Given an expression e matches null we check that e has some reference type R.

e : R
---------------------
e matches null : bool


Generics
========

Currently instanceof is restricted so that the type must be reifiable. In other 
words the following is not allowed.

ArrayList<Integer> al = ...
if (al instanceof List<Integer>) {          // Error
    ...
}

We propose to lift this restriction for pattern matching using type test 
patterns. In other words, a reference type test pattern match is well typed if 
the cast conversion is not unchecked. If the cast conversion is unchecked then 
type checking fails.

In what follows we will assume that all pattern matching code has type checked 
correctly.


Nulls
=====

To recall, the issue with nulls and pattern matching came from Java’s current 
slightly inconsistent treatment of null.

Currently x instanceof Foo returns false where x is the null value. Clearly, we 
would want x matches Foo f to do the same.

However, when we come to support deconstruction patterns, we would expect 
Foo(null) to match a pattern Foo(var x). Moreover, we intend the pattern var x 
to mean the same as Bar x where the type has been inferred to be Bar. So we 
appear to have conflicting requirements!

A further complication is that the switch statement currently throws an 
exception when the switch expression is the null value.

To recap:

Foo f = null;                //Foo has a String field
f matches Foo x              //would like to return false as for instanceof
switch (f) { case Foo: ... } //throws
f = new Foo(null);
f matches Foo(String x)      //would like to return true and bind x to null
f matches Foo(var x)         //behave identically to above

We need to reconcile the current semantics and find a generalisation that 
matches (sic) the above behaviour for patterns.

First, we shall generalise the switch statement in the following way. We 
propose that it no longer throws an exception when the switch expression is the 
null value. However, we propose that all switches have an implicit case null: 
throw new NullPointerException(); clause, unless there is an explicit case null 
pattern label in the switch body. In this way, all existing switch statements 
will continue to mean the same thing, but users of the enhanced switch can now 
program a case to handle the null value case.

We had previously proposed dealing with null by considering the type of the 
pattern test to determine if it was a type-restatement pattern or not. Whilst 
this works, it could cause some subtle action-at-a-distance effects that we 
found discomforting. Another possibility had been to classify patterns as 
top-level and non-top-level and assign them different semantics. This was felt 
to be too subtle.

We think we now have a solution that avoids this partition of patterns, leading 
to a simpler and clearer system. The key to this solution is to separate the 
semantics of a pattern from the semantics of the construct that uses the 
pattern.

tl;dr: The semantics of pattern matching is defined to match null values 
against type test patterns. The semantics of constructs that use pattern  
matching is layered over this definition in order to capture the existing 
behaviours with respect to null values at the top level.

Given a pattern p, and a value e of type T, we can define the semantics of this 
pattern with a function, match, that returns a boolean indicating whether the 
pattern matches the value or not. (Recall that we assume that type checking has 
already succeeded.)

match(l, e:V)      =def e = l
match(l, e:R)      =def e <> null /\ unbox(e) = l

match(null, e:V)   =def false               // n/a by type-checking
match(null, e:R)   =def e = null

match(R t, e:V)    =def false               // n/a by type-checking
match(R t, e:R')   =def e = null \/ e instanceof R

match(V t, e:V')   =def true                // by type-checking, V = V'
match(V t, e:R)    =def e <> null           // by type-checking, R = box(V)

match(D(pa_1, ..., pa_n), e:V) =def false
match(D(pa_1, ..., pa_n), e:R) =def 
                        e instanceof D /\ match(pa_1, e.x_1:T_1) /\ 
                                   ... /\ match(pa_n, e.x_n:T_n)
                                where  data class D(T_1 x_1, ..., T_n x_n);

This definition should be intuitive, but let us consider the first two cases 
that deal with matching a value e against a literal pattern l. If e has a value 
type then the pattern match is just a simple check for equality between the 
value and the literal. If e has a reference type then we first check if it has 
the null value. If so we return false, otherwise we unbox the value and 
recursively check this against the pattern.

The important case is where we have a type test pattern for a reference type R 
and a value of a reference type. In this case we check if the value has the 
null value. If so we return true, otherwise we check whether the value is an 
instance of the type R. In other words, the inherent semantics of pattern 
matching returns true when matching a null value against a reference type test 
R x.

Note there is no case for the var pattern as the compiler will have replaced it 
with an explicit type test pattern as a result of type inference.

The final step is to use this to define the semantics of the two constructs 
that use pattern matching, i.e. matches and switch. This can be summarised 
using the following table (where e has static type T).

                | e = null      |  e <> null
-------------------------------------------------
                |               | 
e matches P     | false         |  match(P, e:T)
                |               |
                |               |
switch (e) {    |               |
  case null: s  | s             |             
}               |               |
                |               |
                |               |
switch (e) {    |               |
  // P not null | NPE           | s    if match(P,e:T)
  case P : s    |               |      
}               |               | 
                |               |

Below is a number of examples demonstrating these semantics in action.

Object o = null;
o matches String s                          // false
o matches Box(var x)                        // false
new Box(o) matches Box b                    // true
new Box(o) matches Box(null)                // true
new Box(o) matches Box(Object o)            // true
new Box(o) matches Box(var o)               // true (o inferred to have
                                            //       type Object)

switch (o) { case String s: }               // NPE

switch (o) { 
    case null: System.out.println("hello");
}                                           // Prints hello

o=(Object)"Hello";

switch (o) { 
    case String s: System.out.println(s);
}                                           // Prints Hello

switch (o) { 
    case null: break; 
    case String s: System.out.println(s);
}                                           // Prints Hello

o=null;
switch (new Box(o)) { 
    case Box b: System.out.println("Box");
}                                           // Prints Box

switch (new Box(o)) { 
    case Box(null): 
        System.out.println("A");
        break; 
    case Box(var x): 
        System.out.println("B");
}                                           // Prints A

switch (new Box(o)) { 
    case Box(var x): 
        System.out.println(x==null);
}                                           // Prints true

Reply via email to