Re: Primitives in instanceof and patterns

Brian Goetz Thu, 08 Sep 2022 13:04:03 -0700

Sigh, floating point.  Yes, this is the most difficult corner of this work.

Bear in mind that you're really asking questions about cast conversion. Additionally, we have to define which of these casts are "lossy" vswhich are "information preserving" (exact), which for most numbers isstraightforward but for weird floating point values might be harder. So let's first see what happens when we cast your examples:


jshell> (int) 2.0f
$1 ==> 2

jshell> (int) 2.5f
$2 ==> 2

jshell> (float) Double.MAX_VALUE
Infinity

jshell> (float) Math.PI
$4 ==> 3.1415927

jshell> (float) Double.NaN
$5 ==> NaN

jshell> (double) Double.NaN
$6 ==> NaN

Clearly #2 (casting 2.5 to int) is lossy, and therefore would not beexact, so we can cross that one off the list. Similar forDouble.MAX_VALUE to float. It's also pretty clear that Math.PI -- whichis merely an alias for 3.14159265358979323846 -- is also notrepresentable in the range of float.


So that leaves:

    2.0 instanceof int
    Double.NaN instanceof float
    Double.NaN instanceof double

The first one is an exact cast; this can be specified in a number ofways, including "sufficient number of low order bits are zero", or"representable in the range of the target type", or others. The binaryrepresentation of 2.0f is 01000000000000000000000000000000, whichexactly encodes an integer. Again, this is not a rule about`instanceof`; this is derived from casting (though we do have to definewhat exactness means.)

For NaN, negative zero, and infinity, whether various conversions areexact or not may require a somewhat ad-hoc decision, but I think theintuitive answers for NaN is that Float.NaN <--> Double.NaN is exact,and similar for Float.Inf <--> Double.Inf. There will be an argumentabout -0.0 and 0.0.

Intuitively, I think I would expect that if t is an expression ofprimitive type T, and U is also a primitive type, then t instanceofUis true iff t == (T) (U) t. (Perhaps with a variant of == that isreflexive even for NaN, or perhaps not.) That would make (1) true,(2,3,4) false, and (5,6) who knows.

This is a good intuition, and is true for some types, and almost truefor others, but it falls into some holes. In particular, with someconversions, `(U) t` is lossy but `(T) (U) t` is also lossy in anexactly compensating way.

This makes a lot of sense. I'm wondering, though, how it works withfloat and double. I don't see the answer by looking at JLS Ch5. Arethe following expressions legal, and if so which ones are true?


 1. 2.0 instanceof int
 2. 2.5 instanceof int
 3. Double.MAX_VALUE instanceof float
 4. Math.PI instanceof float
 5. Double.NaN instanceof float
 6. Double.NaN instanceof double

Intuitively, I think I would expect that if t is an expression ofprimitive type T, and U is also a primitive type, then t instanceofUis true iff t == (T) (U) t. (Perhaps with a variant of == that isreflexive even for NaN, or perhaps not.) That would make (1) true,(2,3,4) false, and (5,6) who knows.


On Thu, 8 Sept 2022 at 09:53, Brian Goetz <[email protected]> wrote:

    Earlier in the year we talked about primitive type patterns.  Let
    me summarize
    the past discussion, what I think the right direction is, and why
    this is (yet
    another) "finishing up the job" task for basic patterns that, if
    left undone,
    will be a sharp edge.

    Prior to record patterns, we didn't support primitive type
    patterns at all. With
    records, we now support primitive type patterns as nested
    patterns, but they are
    very limited; they are only applicable to exactly their own type.

    The motivation for "finishing" primitive type patterns is the same
    as discussed
    earlier this week with array patterns -- if pattern matching is
    the dual of
    aggregation, we want to avoid gratuitous asymmetries that let you
    put things
    together but not take them apart.

    Currently, we can assign a `String` to an `Object`, and recover
    the `String`
    with a pattern match:

        Object o = "Bob";
        if (o instanceof String s) { println("Hi Bob"); }

    Analogously, we can assign an `int` to a `long`:

        long n = 0;

    but we cannot yet recover the int with a pattern match:

        if (n instanceof int i) { ... } // error, pattern `int i` not
    applicable to `long`

    To fill out some more of the asymmetries around records if we
    don't finish the job: given

        record R(int i) { }

    we can construct it with

        new R(anInt)     // no adaptation
        new R(aShort)    // widening
        new R(anInteger) // unboxing

    but yet cannot deconstruct it the same way:

        case R(int i)     // OK
        case R(short s)   // nope
        case R(Integer i) // nope

    It would be a gratuitous asymmetry that we can use pattern
    matching to recover from
    reference widening, but not from primitive widening. While many of the
    arguments against doing primitive type patterns now were of the
    form "let's keep
    things simple", I believe that the simpler solution is actually to
    _finish the
    job_, because this minimizes asymmetries and potholes that users
    would otherwise
    have to maintain a mental catalog of.

    Our earlier explorations started (incorrectly, as it turned out), with
    assignment context.  This direction gave us a good push in the
    right direction,
    but turned out to not be the right answer.  A more careful reading
    of JLS Ch5
    convinced me that the answer lies not in assignment conversion,
    but _cast
    conversion_.

    #### Stepping back: instanceof

    The right place to start is actually not patterns, but
    `instanceof`.  If we
    start here, and listen carefully to the specification, it leads us
    to the
    correct answer.

    Today, `instanceof` works only for reference types. Accordingly,
    most people
    view `instanceof` as "the subtyping operator" -- because that's
    the only
    question we can currently ask it.  We almost never see
    `instanceof` on its own;
    it is nearly always followed by a cast to the same type. 
    Similarly, we rarely
    see a cast on its own; it is nearly always preceded by an
    `instanceof` for the
    same type.

    There's a reason these two operations travel together: casting is,
    in general,
    unsafe; we can try to cast an `Object` reference to a `String`,
    but if the
    reference refers to another type, the cast will fail. So to make
    casting safe,
    we precede it with an `instanceof` test.  The semantics of
    `instanceof` and
    casting align such that `instanceof` is the precondition test for
    safe casting.

    > instanceof is the precondition for safe casting

    Asking `instanceof T` means "if I cast this to T, would I like the
    answer."
    Obviously CCE is an unlikable answer; `instanceof` further adopts
    the opinion
    that casting `null` would also be an unlikable answer, because
    while the cast
    would succeed, you can't do anything useful with the result.

    Currently, `instanceof` is only defined on reference types, and on
    this domain
    coincides with subtyping.  On the other hand, casting is defined
    between
    primitive types (widening, narrowing), and between primitive and
    reference types
    (boxing, unboxing).  Some casts involving primitives yield
    "better" results than
    others; casting `0` to `byte` results in no loss of information,
    since `0` is
    representable as a byte, but casting `500` to `byte` succeeds but
    loses
    information because the higher order bits are discarded.

    If we characterize some casts as "lossy" and others as "exact" --
    where lossy
    means discarding useful information -- we can extend the "safe casting
    precondition" meaning of `instanceof` to primitive operands and
    types in the
    obvious way -- "would casting this expression to this type succeed
    without error
    and without information loss."  If the type of the expression is
    not castable to
    the type we are asking about, we know the cast cannot succeed and
    reject the
    `instanceof` test at compile time.

    Defining which casts are lossy and which are exact is fairly
    straightforward; we
    can appeal to the concept already in the JLS of "representable in
    the range of a
    type."  For some pairs of types, casting is always exact (e.g.,
    casting `int` to
    `long` is always exact); we call these "unconditionally exact". 
    For other pairs
    of types, some values can be cast exactly and others cannot.

    Defining which casts are exact gives us a simple and precise
    semantics for `x
    instanceof T`: whether `x` can be cast exactly to `T`. Similarly,
    if the static
    type of `x` is not castable to `T`, then the corresponding
    `instanceof` question
    is rejected statically.  The answers are not suprising:

     - Boxing is always exact;
     - Unboxing is exact for all non-null values;
     - Reference widening is always exact;
     - Reference narrowing is exact if the type of the target
    expression is a
       subtype of the target type;
     - Primitive widening and narrowing are exact if the target
    expression can be
       represented in the range of the target type.

    #### Primitive type patterns

    It is a short hop from `instanceof` to patterns (including
    primitive type
    patterns, and reference type patterns applied to primitive types),
    which can be
    defined entirely in terms of cast conversion and exactness:

     - A type pattern `T t` is applicable to a target of type `S` if
    `S` is
       cast-convertible to `T`;
     - A type pattern `T t` matches a target `x` if `x` can be cast
    exactly to `T`;
     - A type pattern `T t` is unconditional at type `S` if casting
    from `T` to `S`
       is unconditionally exact;
     - A type pattern `T t` dominates a type pattern `S s` (or a
    record pattern
       `S(...)`) if `T t` would be unconditional on `S`.

    While the rules for casting are complex, primitive patterns add no new
    complexity; there are no new conversions or conversion contexts. 
    If we see:

        switch (a) {
            case T t: ...
        }

    we know the case matches if `a` can be cast exactly to `T`, and
    the pattern is
    unconditional if _all_ values of `a`'s type can be cast exactly to
    `T`.  Note
    that none of this is specific to primitives; we derive the
    semantics of _all_
    type patterns from the enhanced definition of casting.

    Now, our record deconstruction examples work symmetrically to
    construction:

        case R(int i)     // OK
        case R(short s)   // test if `i` is in the range of `short`
        case R(Integer i) // box `i` to `Integer`

Re: Primitives in instanceof and patterns

Reply via email to