Sigh, floating point. Yes, this is the most difficult corner of this work.
Bear in mind that you're really asking questions about cast conversion.
Additionally, we have to define which of these casts are "lossy" vs
which are "information preserving" (exact), which for most numbers is
straightforward but for weird floating point values might be harder.
So let's first see what happens when we cast your examples:
jshell> (int) 2.0f
$1 ==> 2
jshell> (int) 2.5f
$2 ==> 2
jshell> (float) Double.MAX_VALUE
Infinity
jshell> (float) Math.PI
$4 ==> 3.1415927
jshell> (float) Double.NaN
$5 ==> NaN
jshell> (double) Double.NaN
$6 ==> NaN
Clearly #2 (casting 2.5 to int) is lossy, and therefore would not be
exact, so we can cross that one off the list. Similar for
Double.MAX_VALUE to float. It's also pretty clear that Math.PI -- which
is merely an alias for 3.14159265358979323846 -- is also not
representable in the range of float.
So that leaves:
2.0 instanceof int
Double.NaN instanceof float
Double.NaN instanceof double
The first one is an exact cast; this can be specified in a number of
ways, including "sufficient number of low order bits are zero", or
"representable in the range of the target type", or others. The binary
representation of 2.0f is 01000000000000000000000000000000, which
exactly encodes an integer. Again, this is not a rule about
`instanceof`; this is derived from casting (though we do have to define
what exactness means.)
For NaN, negative zero, and infinity, whether various conversions are
exact or not may require a somewhat ad-hoc decision, but I think the
intuitive answers for NaN is that Float.NaN <--> Double.NaN is exact,
and similar for Float.Inf <--> Double.Inf. There will be an argument
about -0.0 and 0.0.
Intuitively, I think I would expect that if t is an expression of
primitive type T, and U is also a primitive type, then t instanceof
Uis true iff t == (T) (U) t. (Perhaps with a variant of == that is
reflexive even for NaN, or perhaps not.) That would make (1) true,
(2,3,4) false, and (5,6) who knows.
This is a good intuition, and is true for some types, and almost true
for others, but it falls into some holes. In particular, with some
conversions, `(U) t` is lossy but `(T) (U) t` is also lossy in an
exactly compensating way.
This makes a lot of sense. I'm wondering, though, how it works with
float and double. I don't see the answer by looking at JLS Ch5. Are
the following expressions legal, and if so which ones are true?
1. 2.0 instanceof int
2. 2.5 instanceof int
3. Double.MAX_VALUE instanceof float
4. Math.PI instanceof float
5. Double.NaN instanceof float
6. Double.NaN instanceof double
Intuitively, I think I would expect that if t is an expression of
primitive type T, and U is also a primitive type, then t instanceof
Uis true iff t == (T) (U) t. (Perhaps with a variant of == that is
reflexive even for NaN, or perhaps not.) That would make (1) true,
(2,3,4) false, and (5,6) who knows.
On Thu, 8 Sept 2022 at 09:53, Brian Goetz <[email protected]> wrote:
Earlier in the year we talked about primitive type patterns. Let
me summarize
the past discussion, what I think the right direction is, and why
this is (yet
another) "finishing up the job" task for basic patterns that, if
left undone,
will be a sharp edge.
Prior to record patterns, we didn't support primitive type
patterns at all. With
records, we now support primitive type patterns as nested
patterns, but they are
very limited; they are only applicable to exactly their own type.
The motivation for "finishing" primitive type patterns is the same
as discussed
earlier this week with array patterns -- if pattern matching is
the dual of
aggregation, we want to avoid gratuitous asymmetries that let you
put things
together but not take them apart.
Currently, we can assign a `String` to an `Object`, and recover
the `String`
with a pattern match:
Object o = "Bob";
if (o instanceof String s) { println("Hi Bob"); }
Analogously, we can assign an `int` to a `long`:
long n = 0;
but we cannot yet recover the int with a pattern match:
if (n instanceof int i) { ... } // error, pattern `int i` not
applicable to `long`
To fill out some more of the asymmetries around records if we
don't finish the job: given
record R(int i) { }
we can construct it with
new R(anInt) // no adaptation
new R(aShort) // widening
new R(anInteger) // unboxing
but yet cannot deconstruct it the same way:
case R(int i) // OK
case R(short s) // nope
case R(Integer i) // nope
It would be a gratuitous asymmetry that we can use pattern
matching to recover from
reference widening, but not from primitive widening. While many of the
arguments against doing primitive type patterns now were of the
form "let's keep
things simple", I believe that the simpler solution is actually to
_finish the
job_, because this minimizes asymmetries and potholes that users
would otherwise
have to maintain a mental catalog of.
Our earlier explorations started (incorrectly, as it turned out), with
assignment context. This direction gave us a good push in the
right direction,
but turned out to not be the right answer. A more careful reading
of JLS Ch5
convinced me that the answer lies not in assignment conversion,
but _cast
conversion_.
#### Stepping back: instanceof
The right place to start is actually not patterns, but
`instanceof`. If we
start here, and listen carefully to the specification, it leads us
to the
correct answer.
Today, `instanceof` works only for reference types. Accordingly,
most people
view `instanceof` as "the subtyping operator" -- because that's
the only
question we can currently ask it. We almost never see
`instanceof` on its own;
it is nearly always followed by a cast to the same type.
Similarly, we rarely
see a cast on its own; it is nearly always preceded by an
`instanceof` for the
same type.
There's a reason these two operations travel together: casting is,
in general,
unsafe; we can try to cast an `Object` reference to a `String`,
but if the
reference refers to another type, the cast will fail. So to make
casting safe,
we precede it with an `instanceof` test. The semantics of
`instanceof` and
casting align such that `instanceof` is the precondition test for
safe casting.
> instanceof is the precondition for safe casting
Asking `instanceof T` means "if I cast this to T, would I like the
answer."
Obviously CCE is an unlikable answer; `instanceof` further adopts
the opinion
that casting `null` would also be an unlikable answer, because
while the cast
would succeed, you can't do anything useful with the result.
Currently, `instanceof` is only defined on reference types, and on
this domain
coincides with subtyping. On the other hand, casting is defined
between
primitive types (widening, narrowing), and between primitive and
reference types
(boxing, unboxing). Some casts involving primitives yield
"better" results than
others; casting `0` to `byte` results in no loss of information,
since `0` is
representable as a byte, but casting `500` to `byte` succeeds but
loses
information because the higher order bits are discarded.
If we characterize some casts as "lossy" and others as "exact" --
where lossy
means discarding useful information -- we can extend the "safe casting
precondition" meaning of `instanceof` to primitive operands and
types in the
obvious way -- "would casting this expression to this type succeed
without error
and without information loss." If the type of the expression is
not castable to
the type we are asking about, we know the cast cannot succeed and
reject the
`instanceof` test at compile time.
Defining which casts are lossy and which are exact is fairly
straightforward; we
can appeal to the concept already in the JLS of "representable in
the range of a
type." For some pairs of types, casting is always exact (e.g.,
casting `int` to
`long` is always exact); we call these "unconditionally exact".
For other pairs
of types, some values can be cast exactly and others cannot.
Defining which casts are exact gives us a simple and precise
semantics for `x
instanceof T`: whether `x` can be cast exactly to `T`. Similarly,
if the static
type of `x` is not castable to `T`, then the corresponding
`instanceof` question
is rejected statically. The answers are not suprising:
- Boxing is always exact;
- Unboxing is exact for all non-null values;
- Reference widening is always exact;
- Reference narrowing is exact if the type of the target
expression is a
subtype of the target type;
- Primitive widening and narrowing are exact if the target
expression can be
represented in the range of the target type.
#### Primitive type patterns
It is a short hop from `instanceof` to patterns (including
primitive type
patterns, and reference type patterns applied to primitive types),
which can be
defined entirely in terms of cast conversion and exactness:
- A type pattern `T t` is applicable to a target of type `S` if
`S` is
cast-convertible to `T`;
- A type pattern `T t` matches a target `x` if `x` can be cast
exactly to `T`;
- A type pattern `T t` is unconditional at type `S` if casting
from `T` to `S`
is unconditionally exact;
- A type pattern `T t` dominates a type pattern `S s` (or a
record pattern
`S(...)`) if `T t` would be unconditional on `S`.
While the rules for casting are complex, primitive patterns add no new
complexity; there are no new conversions or conversion contexts.
If we see:
switch (a) {
case T t: ...
}
we know the case matches if `a` can be cast exactly to `T`, and
the pattern is
unconditional if _all_ values of `a`'s type can be cast exactly to
`T`. Note
that none of this is specific to primitives; we derive the
semantics of _all_
type patterns from the enhanced definition of casting.
Now, our record deconstruction examples work symmetrically to
construction:
case R(int i) // OK
case R(short s) // test if `i` is in the range of `short`
case R(Integer i) // box `i` to `Integer`