Pattern matching -- background and design goals

Brian Goetz Thu, 20 Apr 2017 11:48:27 -0700

I would like to start the discussion surrounding pattern matching with some motivation and goals. The design space here is enormous, and connected to so many other features, so bear with me as I try to linearize the story. I've posted a general introductory document on possibilities for pattern matching in Java here:


http://cr.openjdk.java.net/~briangoetz/amber/pattern-match.html

The first question is why, as in: Why is this feature needed in Java at all?

I'll start with the obvious reasons (explicated in greater detail in the document): the common pattern of testing an input against multiple candidates not only suffers from boilerplate, but, more importantly, the tools we have for this give bugs too many places to hide, obfuscate the essential business logic, and introduce accidental ordering constraints that make programs slower than necessary.
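
As a minimal sketch of the status quo described above, here is the familiar instanceof-and-cast chain. The Shape, Circle, Rect, and Areas names are hypothetical and exist only for illustration; they are not taken from the linked document.

```java
// Hypothetical shape hierarchy, used only for illustration.
interface Shape {}

final class Circle implements Shape {
    final double radius;
    Circle(double radius) { this.radius = radius; }
}

final class Rect implements Shape {
    final double width;
    final double height;
    Rect(double width, double height) { this.width = width; this.height = height; }
}

final class Areas {
    // Each arm repeats the type name three times (test, cast, declaration),
    // and the arms are tried strictly in the order written.
    static double area(Shape s) {
        if (s instanceof Circle) {
            Circle c = (Circle) s;
            return Math.PI * c.radius * c.radius;
        } else if (s instanceof Rect) {
            Rect r = (Rect) s;
            return r.width * r.height;
        } else {
            throw new IllegalArgumentException("unknown shape: " + s);
        }
    }
}
```

Nothing stops a later edit from testing against one type and casting to another, and the if-else chain imposes an evaluation order even though the cases are mutually exclusive.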

There are also some less obvious reasons why this makes sense. What's being proposed here is really two interrelated features: better support for compositional test-and-extract (destructuring), and better support for handling do-exactly-one-of-these constructs. While either could stand on its own, the two together are much stronger. Of the two, destructuring is the richer concept.

To illustrate why destructuring is so important, allow me to make a comparison to Lambda. Adding the ability to pass behavior as data meant that we could build richer, more powerful libraries. We could move responsibility for control flow from client code (external iteration) to libraries (internal iteration), enabling libraries to expose richer, higher-level operations. Additionally, by exposing higher-level operations, libraries might not have to expose fiddly, state-dependent low-level operations like Iterator.hasNext() / next(), and therefore don't need to defend their iteration logic against being called in the wrong order or in the wrong state.
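
For readers who want the Lambda comparison spelled out, the following sketch contrasts the two styles on a simple sum. The IterationStyles class and its method names are illustrative assumptions; only Iterator.hasNext() / next() come from the text above.

```java
import java.util.Iterator;
import java.util.List;

final class IterationStyles {
    // External iteration: the client drives the hasNext()/next() protocol
    // and is responsible for calling the two methods in the right order.
    static int sumExternal(List<Integer> xs) {
        int sum = 0;
        Iterator<Integer> it = xs.iterator();
        while (it.hasNext()) {
            sum += it.next();
        }
        return sum;
    }

    // Internal iteration: the library owns the control flow; the client
    // only supplies behavior (here, the mapping function).
    static int sumInternal(List<Integer> xs) {
        return xs.stream().mapToInt(Integer::intValue).sum();
    }
}
```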

Destructuring has a similar abstraction-raising characteristic. Consider the methods Class.isArray() and Class.getComponentType(); the latter only returns a value if the former returns true. Method pairs like this offer the worst of both worlds; the client has to make two calls, and has a chance to do it wrong -- and the library has to defend against the client getting it wrong. These two methods really should be one operation, which simplifies both the client invocation and the library implementation:


    if (c matches Class.arrayClass(Class<?> componentType)) { ... }
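
For contrast, this is roughly what the two-call idiom looks like today. The ArrayClassCheck class and its describe method are hypothetical helpers; Class.isArray() and Class.getComponentType() are the real methods, and getComponentType() returns null when the class is not an array class.

```java
final class ArrayClassCheck {
    // Test first, then extract; skipping the test means handling a null
    // result from getComponentType() instead.
    static String describe(Class<?> c) {
        if (c.isArray()) {
            Class<?> componentType = c.getComponentType();
            return "array of " + componentType.getName();
        }
        return "not an array";
    }
}
```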

Further, decomposition (when done right) is compositional, enabling us to express a compound test+extract operation as a single operation, rather than as a sequence of operations, each of which can individually fail. This brings client code more in line with how the problem is typically stated.

Finally, this is a feature with depth; we can start with simple patterns and simple pattern-aware language constructs, and add more sophisticated kinds of patterns and more constructs that can use patterns as we go. This in turn means a greater return on conceptual complexity.

If one looked only at the first few examples in the document, one might be tempted to ask "Why not 'just' do flow typing?" Flow typing would eliminate the (irritating) need to cast something to X right after having done an instanceof test against X. It seems like a cheap win (though it would invariably put pressure on our handling of intersection types), but it just doesn't go very far -- essentially it eliminates a single increment of boilerplate and a single place where error can creep in. This is not nothing, and it's a defensible choice, but it seems like it would be a missed opportunity.

Similarly, one might ask "Why not 'just' do type switch?" Again, this is a feature that seems merely "additive" rather than multiplicative; it again eliminates some boilerplate in repeated type tests, but it doesn't go much farther than that.

Which brings me around to a deeper observation, which we think is at the design center of this feature: destructuring is the natural dual of aggregation, and an obvious (and yet missing) component of object-oriented modeling.

People often think of pattern matching as a "functional programming" feature. But any language that supports aggregation (that is, all of them) also has to address how aggregates are going to be decomposed. Scala moved the ball forward here by showing that having a destructuring operator ("unapply") on objects allows us to match and destructure objects in terms of what their constructor invocation looks like (regardless of their representation). This is not "OO borrows FP idea", this is "OO applies sensible programming concept in a natural OO way."

That said, I think Scala didn't go far enough. (No criticism intended; they moved the ball forward dramatically, there's just farther to go.) For what are likely mostly accidental reasons, Scala only allows a single constructor pattern per class, even though constructors themselves can be overloaded.

Again, without criticism: I'm guessing most of the motivation for Scala's pattern matching was drawn from the distinguished and well-behaved case of algebraic data types, not general objects. So the limitation of a single unapply was not really a problem; algebraic data types don't go out of their way to hide or evolve their representation. But we can apply destructuring to a wider range of targets, in a more OO way.


So, to put a stake in the ground, we believe that:

- destructuring is the dual of construction, and should be treated as a first-class construct;
- destructuring an object should be syntactically similar to how that object is constructed.


From these principles, it follows that:

- if you can declare a ctor, you should be able to declare an instance dtor;
- instance dtors should support the same overloading and inheritance behaviors as ctors;
- if you can declare a static factory, you should be able to declare a static dtor.


(The syntax of how dtors are declared is a topic for another day.)

If an object is created with a constructor:

    x = new Foo(a, b)

ideally, it should be destructurable with an instance dtor:

    if (x matches Foo(var a, var b))

Note the syntactic similarity; this is not accidental. Acquiring dtors isn't automatic (though data classes can automatically acquire both ctors and dtors), but it should be easy and straightforward to declare such a dtor.

If dtors are overloaded, type information at the use site may be needed to disambiguate. So if we provide dtors Foo(int y) and Foo(String y), then


    if (x matches Foo(var y))

is ambiguous, but by replacing the var-pattern with an explicit type-test pattern, the compiler can properly select:


    if (x matches Foo(int y))

Similarly, if an object is created with a static factory:

    x = Foo.of(a, b)

it should be destructurable with static dtors:

    if (x matches Foo.of(var a, var b))

So, for example, since Optional has two factories:

    x = Optional.of(y)
    x = Optional.empty()

one can discriminate via:

    case Optional.of(var y):
    case Optional.empty():
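
For comparison, a sketch of how a client discriminates between the two Optional factories today, using only the existing isPresent() and get() methods. The OptionalToday class and its describe method are hypothetical names for illustration.

```java
import java.util.Optional;

final class OptionalToday {
    // Test-then-extract again: isPresent() guards get(), and calling get()
    // on an empty Optional throws NoSuchElementException.
    static String describe(Optional<String> x) {
        if (x.isPresent()) {
            String y = x.get();
            return "of(" + y + ")";
        } else {
            return "empty()";
        }
    }
}
```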

Why does this matter? By way of example, we often prefer factory methods to constructors. A factory has more flexibility than a ctor; it can return different subtypes in different conditions, and it can change its implementation over time to return different subtypes. That a factory commits only to an abstract type:


    static Foo redFoo() {
        return new MyPrivateFooImplementation();
    }

is great for the implementor, but less so for the client -- once the caller gets a Foo from the factory, it has no way of asking "is this a red Foo?", unless we clutter the API with ad-hoc methods like isRedFoo(). This means that APIs with multiple factories have some bad choices: live with the opacity, which can limit what the client can do (sometimes this is good, but sometimes it's not), add lots of new API points for destructuring, or expose the intermediate types.
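
A minimal sketch of the "clutter the API" option, to make the cost concrete. The enclosing AdHocQueryExample class and the blueFoo() factory are assumptions added for illustration; only Foo, redFoo(), isRedFoo(), and MyPrivateFooImplementation appear in the text above.

```java
final class AdHocQueryExample {
    // The abstract type the factories commit to.
    interface Foo {
        // Ad-hoc query that exists only so clients can ask
        // "did this come from redFoo()?".
        boolean isRedFoo();
    }

    // Hidden implementation; clients never name this type.
    private static final class MyPrivateFooImplementation implements Foo {
        @Override
        public boolean isRedFoo() { return true; }
    }

    static Foo redFoo() {
        return new MyPrivateFooImplementation();
    }

    // Hypothetical second factory, so that the query has something to distinguish.
    static Foo blueFoo() {
        return () -> false;
    }
}
```

Every additional factory tends to require another such query method on Foo, and none of them helps the client recover the arguments the factory was called with.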

Having a complementary dtor allows the implementation to hide its details while still providing a degree of reversibility. Without venturing near the toxic syntax bikeshed:


   static ... matcher ... Foo ... redFoo(...) { ... }

the library author can render the factory reversible without exposing the implementation details.


    x = redFoo();
    ...

    if (x matches redFoo()) { ... }

The implementation of redFoo() remains hidden, but because the dtor is part of the same library as the factory, it can still help a client reconstruct the object's state (or the parts that it wants to), without compromising the encapsulation. (In just the last few weeks of thinking about this, once you start to see this pattern, you can't look at an API and not see it.)
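
Until a declaration syntax exists, the idea can be approximated by hand in today's Java. The sketch below is only an emulation under stated assumptions, not the proposed feature: the MatcherSketch class, the matchRedFoo method, and the shade parameter are all hypothetical, and an Optional return value stands in for "match succeeded, here is the extracted state".

```java
import java.util.Optional;

final class MatcherSketch {
    // The abstract type the factory commits to.
    interface Foo {}

    // Hidden implementation; never exposed to clients.
    private static final class MyPrivateFooImplementation implements Foo {
        final int shade;
        MyPrivateFooImplementation(int shade) { this.shade = shade; }
    }

    // Factory: hypothetical variant of redFoo() that takes one argument.
    static Foo redFoo(int shade) {
        return new MyPrivateFooImplementation(shade);
    }

    // Hand-written stand-in for a static dtor paired with redFoo(int).
    // An empty Optional means "x was not made by redFoo"; otherwise the
    // Optional carries the argument the factory was called with.
    static Optional<Integer> matchRedFoo(Foo x) {
        if (x instanceof MyPrivateFooImplementation) {
            return Optional.of(((MyPrivateFooImplementation) x).shade);
        }
        return Optional.empty();
    }
}
```

A client would write matchRedFoo(x).isPresent() where the proposal writes x matches redFoo(...). The implementation class stays private because the matcher lives alongside the factory.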

Again, this is mostly something to file in the background and think about the consequences of -- until we start to circulate a JEP for pattern matching, this is still technically outside the Amber charter.
