Below are some more refined thoughts about patterns, generics, null and
primitives. Thank you for your feedback so far. Comments welcome!
Gavin
----
Patterns: Types, primitives and null
====================================
In this note, we address some of the issues raised in earlier posts concerning
pattern matching.
To make matters concrete, let us define a simple grammar of patterns:
p ::= l //literal (primitive type constant expression)
null //null pattern
R x //Reference type test pattern
V x //Value type test pattern
D(p_1, ..., p_n) //Deconstructor pattern (D is a *data class*)
var x //Type inference
For simplicity, we have only considered primitive type constant expression
literals, but we can easily extend what follows to cover string constant
expressions and Enum names. For generality we have included simple
deconstruction patterns, which will most likely appear first with data classes
- a feature we have proposed elsewhere.
Type checking basics
====================
To type check the expression e matches R x where R is a reference type, we
check that the type of the expression e is cast convertible to R. This is
captured by the following typing rule:
e : T T cast-convertible R
------------------------------
e matches R x : bool
For value types, we propose a simpler rule. When type checking an expression e
matches V x where V is a value type, we shall simply check that the type of e
is either the value type V or the boxed type of V. Again we can write this more
formally as follows.
e : V or e : box(V)
-----------------------
e matches V x : bool
(We have assumed a function, box, that returns the boxed reference type of a
given value type, e.g. box(int)=Integer.)
Type checking literal patterns is similar. Thus given an expression e matches l
where l is a primitive type constant expression of type V, we check that e has
exactly the type V. (We could also permit e to have the boxed type of V, but we
disallow this for simplicity.)
e : V l : V
------------------
e matches l : bool
[A small corner case with this rule is where the primitive type constant
expression l is a numeric literal without a suffix, e.g. 1. In this case we
would treat l as a poly expression and use the type of the expression e as a
target type.]
Given an expression e matches null we check that e has some reference type R.
e : R
---------------------
e matches null : bool
Generics
========
Currently instanceof is restricted so that the type must be reifiable. In other
words the following is not allowed.
ArrayList<Integer> al = ...
if (al instanceof List<Integer>) { // Error
...
}
We propose to lift this restriction for pattern matching using type test
patterns. In other words, a reference type test pattern match is well typed if
the cast conversion is not unchecked. If the cast conversion is unchecked then
type checking fails.
In what follows we will assume that all pattern matching code has type checked
correctly.
Nulls
=====
To recall, the issue with nulls and pattern matching came from Java’s current
slightly inconsistent treatment of null.
Currently x instanceof Foo returns false where x is the null value. Clearly, we
would want x matches Foo f to do the same.
However, when we come to support deconstruction patterns, we would expect
Foo(null) to match a pattern Foo(var x). Moreover, we intend the pattern var x
to mean the same as Bar x where the type has been inferred to be Bar. So we
appear to have conflicting requirements!
A further complication is that the switch statement currently throws an
exception when the switch expression is the null value.
To recap:
Foo f = null; //Foo has a String field
f matches Foo x //would like to return false as for instanceof
switch (f) { case Foo: ... } //throws
f = new Foo(null);
f matches Foo(String x) //would like to return true and bind x to null
f matches Foo(var x) //behave identically to above
We need to reconcile the current semantics and find a generalisation that
matches (sic) the above behaviour for patterns.
First, we shall generalise the switch statement in the following way. We
propose that it no longer throws an exception when the switch expression is the
null value. However, we propose that all switches have an implicit case null:
throw new NullPointerException(); clause, unless there is an explicit case null
pattern label in the switch body. In this way, all existing switch statements
will continue to mean the same thing, but users of the enhanced switch can now
program a case to handle the null value case.
We had previously proposed dealing with null by considering the type of the
pattern test to determine if it was a type-restatement pattern or not. Whilst
this works, it could cause some subtle action-at-a-distance effects that we
found discomforting. Another possibility had been to classify patterns as
top-level and non-top-level and assign them different semantics. This was felt
to be too subtle.
We think we now have a solution that avoids this partition of patterns, leading
to a simpler and clearer system. The key to this solution is to separate the
semantics of a pattern from the semantics of the construct that uses the
pattern.
tl;dr: The semantics of pattern matching is defined to match null values
against type test patterns. The semantics of constructs that use pattern
matching is layered over this definition in order to capture the existing
behaviours with respect to null values at the top level.
Given a pattern p, and a value e of type T, we can define the semantics of this
pattern with a function, match, that returns a boolean indicating whether the
pattern matches the value or not. (Recall that we assume that type checking has
already succeeded.)
match(l, e:V) =def e = l
match(l, e:R) =def e <> null /\ unbox(e) = l
match(null, e:V) =def false // n/a by type-checking
match(null, e:R) =def e = null
match(R t, e:V) =def false // n/a by type-checking
match(R t, e:R') =def e = null \/ e instanceof R
match(V t, e:V') =def true // by type-checking, V = V'
match(V t, e:R) =def e <> null // by type-checking, R = box(V)
match(D(pa_1, ..., pa_n), e:V) =def false
match(D(pa_1, ..., pa_n), e:R) =def
e instanceof D /\ match(pa_1, e.x_1:T_1) /\
... /\ match(pa_n, e.x_n:T_n)
where data class D(T_1 x_1, ..., T_n x_n);
This definition should be intuitive, but let us consider the first two cases
that deal with matching a value e against a literal pattern l. If e has a value
type then the pattern match is just a simple check for equality between the
value and the literal. If e has a reference type then we first check if it has
the null value. If so we return false, otherwise we unbox the value and
recursively check this against the pattern.
The important case is where we have a type test pattern for a reference type R
and a value of a reference type. In this case we check if the value has the
null value. If so we return true, otherwise we check whether the value is an
instance of the type R. In other words, the inherent semantics of pattern
matching returns true when matching a null value against a reference type test
R x.
Note there is no case for the var pattern as the compiler will have replaced it
with an explicit type test pattern as a result of type inference.
The final step is to use this to define the semantics of the two constructs
that use pattern matching, i.e. matches and switch. This can be summarised
using the following table (where e has static type T).
| e = null | e <> null
-------------------------------------------------
| |
e matches P | false | match(P, e:T)
| |
| |
switch (e) { | |
case null: s | s |
} | |
| |
| |
switch (e) { | |
// P not null | NPE | s if match(P,e:T)
case P : s | |
} | |
| |
Below is a number of examples demonstrating these semantics in action.
Object o = null;
o matches String s // false
o matches Box(var x) // false
new Box(o) matches Box b // true
new Box(o) matches Box(null) // true
new Box(o) matches Box(Object o) // true
new Box(o) matches Box(var o) // true (o inferred to have
// type Object)
switch (o) { case String s: } // NPE
switch (o) {
case null: System.out.println("hello");
} // Prints hello
o=(Object)"Hello";
switch (o) {
case String s: System.out.println(s);
} // Prints Hello
switch (o) {
case null: break;
case String s: System.out.println(s);
} // Prints Hello
o=null;
switch (new Box(o)) {
case Box b: System.out.println("Box");
} // Prints Box
switch (new Box(o)) {
case Box(null):
System.out.println("A");
break;
case Box(var x):
System.out.println("B");
} // Prints A
switch (new Box(o)) {
case Box(var x):
System.out.println(x==null);
} // Prints true