> From: "Brian Goetz" <brian.go...@oracle.com>
> To: "amber-spec-experts" <amber-spec-experts@openjdk.java.net>
> Sent: Tuesday, March 29, 2022 11:01:18 PM
> Subject: Declared patterns -- translation and reflection

> Time to take a peek ahead at _declared patterns_. Declared patterns come in
> three varieties -- deconstruction patterns, static patterns, and instance
> patterns (corresponding to constructors, static methods, and instance
> methods).
>
> I'm going to start with deconstruction patterns, but the basic game is the
> same for all three.
>
> Ignoring the trivial details, a deconstruction pattern looks like a
> "constructor in reverse":
>
> ```{.java}
> class Point {
>     int x, y;
>     Point(int x, int y) {
>         this.x = x;
>         this.y = y;
>     } [....]
> }
> ```
>
> Deconstruction patterns share the weird behaviors that constructors have, in
> that they are instance members but are not inherited, and that rather than
> having names, they are accessed via the class name.
>
> Deconstruction patterns differ from static/instance patterns in that they are
> by definition total; they cannot fail to match. (This is a somewhat arbitrary
> simplification in the object model, but a reasonable one.) They also cannot
> have any input parameters, other than the receiver.
>
> Patterns differ from their ctor/method counterparts in that they have what
> appear to be _two_ argument lists: a parameter list (like ctors and methods),
> and a _binding_ list. The parameter list is often empty (with the receiver as
> the match target). The binding list can be thought of as a "conditional
> multiple return". That they may return multiple values (and, for partial
> patterns, can return no values at all when they don't match) presents a
> challenge for translation to classfiles, and for the reflection model.
>
> #### Translation to methods
>
> Patterns contain imperative code, so surely we want to translate them to
> methods in some way. The pattern input parameters map cleanly to method
> parameters. The pattern bindings need to be tunneled, somehow, through the
> method return (or some other mechanism).
> For our deconstructor, we might translate as:
>
>     PatternCarrier <dtor>()
>
> (where the method applies the pattern, and PatternCarrier wraps and provides
> access to the bindings) or
>
>     PatternObject <dtor>()
>
> (where PatternObject provides indirection to behavior to invoke the pattern,
> which in turn returns the carrier.)
>
> With either of these approaches, though, the pattern name is a problem,
> because patterns can be overloaded on their _bindings_, but both of these
> return types are insensitive to bindings.
>
> It is useful to characterize the "shape" of a pattern with a MethodType, where
> the parameters of the MethodType are the binding types. (The return type is
> less constrained, but it is sometimes useful to use the return type of the
> MethodType for the required type of the pattern.) Call this the "descriptor"
> of the pattern.
>
> If we do this, we can use some name mangling to encode the descriptor in the
> method name:
>
>     PatternCarrier name$mangle()
>
> The mangling has to be stable across compilations with respect to any source-
> and binary-compatible changes to the pattern declaration. One mangling that
> works quite well is to use the "symbolic-freedom encoding" of the erasure of
> the pattern descriptor. Because the erasure of the descriptor is exactly as
> stable as any other method signature derived from source declarations, it will
> have the desired binary compatibility properties, overriding will work as
> expected, etc.

I think we need at least to use a special name like <deconstructor>, the same
way we have <init>. I agree that we also need to encode the method type
descriptor (the carrier type) into the name, so the name of the method in the
classfile should be <deconstructor+mangle> or <name+mangle> (or perhaps
<pattern+name+mangle> for the pattern methods).

> #### Return value
>
> In an earlier design, we used a pattern object (which was a bundle of method
> handles) as the return value of the pattern.
> This enabled clients to invoke these via condy and bind method handles into
> the constant pool for deconstruction and static patterns.
>
> Either way, we make use of some sort of carrier object to carry the bindings
> from the pattern to the client; either we return the carrier from the pattern
> method, or there is a method on the pattern object that we invoke to get a
> carrier. We have a few preferences about the carrier; we'd like to be able to
> late-bind to the actual implementation (i.e., we don't want to freeze the name
> of a carrier class in the method descriptor), and at least for records, we'd
> like to let the record instance itself be the carrier (since it is immutable
> and we can just invoke the accessors to get the bindings.)

So the return type is either Object (to hide the type of the carrier) or a
lambda that returns an Object (PatternObject or PatternCarrier acting like a
glorified lambda).

> #### Carriers
>
> As part of the work on template strings, Jim has put back some code that was
> originally written for the purpose of translating patterns, called "carriers".
> There are methods / bootstraps that take a MethodType and return method
> handles to (a) encode values of those types into an opaque carrier object and
> (b) pull individual values out of a carrier. This means that the choice of
> carrier object can be deferred to runtime, as long as both the bundling and
> unbundling method handles agree on the carrier form.
>
> The choice of carrier is largely a footprint/specificity tradeoff. One could
> imagine a carrier class per shape, or a single carrier class that wraps an
> Object[], or caching some number of common shapes (three ints and two refs).
> This sort of tuning should be separate from the protocol encoded in the
> bytecode of the pattern method and its clients.
>
> The pattern matching runtime will provide some condy bootstraps which wrap the
> Carriers behavior.
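
To make the carrier idea concrete, here is a minimal sketch of the simplest option listed above: a single carrier form that wraps an Object[]. It is not the carrier code Jim put back; the class name `ArrayCarrierSketch` and its methods are hypothetical, and only standard `java.lang.invoke` calls are used. Given a pattern descriptor, it derives one method handle that packs the bindings into an opaque Object and one handle per binding that pulls a value back out.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

/** Hypothetical sketch: an Object[]-backed carrier derived from a pattern descriptor. */
public final class ArrayCarrierSketch {
    private ArrayCarrierSketch() {}

    /** Returns a handle of type (bindingTypes...)Object that packs the bindings. */
    public static MethodHandle constructor(MethodType descriptor) {
        int arity = descriptor.parameterCount();
        // identity on Object[] turned into a collector: (Object x arity)Object[]
        MethodHandle collect = MethodHandles.identity(Object[].class)
                .asCollector(Object[].class, arity);
        // asType boxes primitive bindings and hides the array behind Object
        return collect.asType(descriptor.changeReturnType(Object.class));
    }

    /** Returns a handle of type (Object)bindingType that extracts binding i. */
    public static MethodHandle component(MethodType descriptor, int i) {
        MethodHandle getter = MethodHandles.arrayElementGetter(Object[].class);
        // fix the index argument, leaving (Object[])Object
        MethodHandle at = MethodHandles.insertArguments(getter, 1, i);
        // asType casts the carrier back to Object[] and unboxes if needed
        return at.asType(MethodType.methodType(descriptor.parameterType(i), Object.class));
    }

    /** Packs the values, then reads binding i back; checked failures are wrapped. */
    public static Object roundTrip(MethodType descriptor, int i, Object... values) {
        try {
            Object carrier = constructor(descriptor).invokeWithArguments(values);
            return component(descriptor, i).invoke(carrier);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

For a descriptor with binding types (int, String), `roundTrip(descriptor, 1, 3, "a")` packs both values and reads the second one back. Because both handles are derived from the same descriptor, the Object[] could later be swapped for a per-shape class without changing the types the client sees.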
>
> Since at least some patterns are conditional, we have to have a way to encode
> failure into the protocol. For a partial pattern, we can use a B2 carrier and
> use null to encode failure to match; for a total pattern, we can use a B3
> carrier.
>
> #### Proposed encoding
>
> Earlier explorations did a lot of work to preserve the optimization that a
> match target can be its own carrier. But further analysis reveals that the
> cost of doing so for other than records is pretty substantial and works
> against the model of a pattern declaration being an imperative body of code
> that runs at match time. So for record patterns, we can "inline" them by using
> `instanceof` as the applicability test and accessors for extraction, and for
> all other patterns, go through the carrier runtime.
>
> This allows us to encode pattern methods as
>
>     Object name$mangle(ARGS)
>
> and have the pattern method do the match and return a carrier (or null), using
> the carrier object that the carrier runtime associates with the pattern
> descriptor. And clients can take apart the result again using the extraction
> logic that the carrier runtime associates with the pattern descriptor.
>
> This also means that instance patterns "just work" because virtual dispatch
> selects the right implementation for us automatically, and all implementations
> that can be overrides will also implicitly agree on the encoding.
>
> Because patterns are methods, we can take advantage of all the affordances of
> methods. We can use access bits to control accessibility in the obvious way;
> we can use the attributes that carry annotations, method parameter metadata,
> and generics signatures to carry information about the pattern declaration and
> its parameters. What's missing is a place to put metadata for the *bindings*,
> and to record the fact that this is a pattern implementation and not an
> ordinary method.
> So, we add the following attribute on pattern methods:
>
>     Pattern {
>         u2 attr_name;
>         u4 attr_length;
>         u2 patternFlags; // bitmask
>         u2 patternName; // index of UTF8 constant
>         u2 patternDescr; // index of MethodType (or alternately UTF8) constant
>         u2 attributes_count;
>         attribute_info attributes[attributes_count];
>     }
>
> This says that "this method is a pattern", reifies the name of the pattern
> (patternName), reifies the pattern descriptor (patternDescr) which encodes the
> types of the bindings as a method descriptor or MethodType, and has attributes
> which can carry annotations, parameter metadata, and signature metadata for
> the bindings. The existing attributes (e.g. Signature, ParameterNames, RVAA)
> can be reused as is, with the interpretation that this is the signature (or
> names, or annos) of the *bindings*, not the input parameters. Flags can carry
> things like "deconstructor pattern" or "partial pattern" as needed.

From the classfile POV, a constructor is a method with a funny name in between
brackets; I think deconstructor and pattern methods should work the same way.
Unlike a constructor, we need a way to attach the carrier type (and perhaps the
pattern name) on the side, so an attribute on the pattern method seems the right
choice.

> ## Reflection
>
> We already have a sensible base class in the reflection library for reflecting
> patterns: Executable. All of the methods on Executable make sense for
> patterns, including Object as the return type. If the pattern is reflectively
> invoked, it will return null (for no match) or an Object[]; this Object[] can
> be thought of as the boxing of the carrier. Since the method return type is
> Object, this is an entirely reasonable interpretation.
>
> We need some additional methods to describe the bindings, so we would have a
> subtype of Executable for Pattern, with methods like getBindings(),
> getAnnotatedBindings(), getGenericBindings(), isDeconstructor(), isPartial(),
> etc.

I agree, if getBindings() returns a Class<?>[].

As I said, apart from the semantics implied by the proposed syntax, the rest of
the design is great.

Rémi
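
As an illustration of the Pattern attribute layout quoted earlier in this mail, here is a rough sketch of how a tool might read the fields that follow attr_name and attr_length. The class `PatternAttributeSketch`, its `Info` holder, and the `FLAG_PARTIAL` bit value are all hypothetical; the proposal does not assign concrete flag values, and nested attributes are only skipped here, not interpreted.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical reader for the body of the proposed Pattern attribute. */
public final class PatternAttributeSketch {
    private PatternAttributeSketch() {}

    /** Assumed flag bit for illustration only; the proposal leaves values open. */
    public static final int FLAG_PARTIAL = 0x0001;

    /** The fixed-size fields of the attribute body, as constant-pool indexes. */
    public static final class Info {
        public final int patternFlags;
        public final int patternNameIndex;
        public final int patternDescrIndex;
        public final int nestedAttributeCount;

        Info(int flags, int name, int descr, int count) {
            this.patternFlags = flags;
            this.patternNameIndex = name;
            this.patternDescrIndex = descr;
            this.nestedAttributeCount = count;
        }
    }

    /** Parses the bytes that come after attr_name (u2) and attr_length (u4). */
    public static Info parse(byte[] body) {
        // DataInputStream reads big-endian, matching the classfile format
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(body))) {
            int flags = in.readUnsignedShort();   // patternFlags
            int name = in.readUnsignedShort();    // patternName
            int descr = in.readUnsignedShort();   // patternDescr
            int count = in.readUnsignedShort();   // attributes_count
            for (int k = 0; k < count; k++) {
                in.readUnsignedShort();           // nested attribute name index
                int length = in.readInt();        // nested attribute length (u4)
                if (length < 0) {
                    throw new IOException("nested attribute too large for this sketch");
                }
                in.readFully(new byte[length]);   // skip the nested attribute body
            }
            return new Info(flags, name, descr, count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A reflection implementation along the lines of getBindings() would presumably resolve patternDescrIndex against the constant pool to obtain the binding types; that step is left out here because it depends on how the descriptor constant ends up being encoded.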