Re: Declared patterns -- translation and reflection

Remi Forax Tue, 29 Mar 2022 15:37:16 -0700

> From: "Brian Goetz" <brian.go...@oracle.com>
> To: "amber-spec-experts" <amber-spec-experts@openjdk.java.net>
> Sent: Tuesday, March 29, 2022 11:01:18 PM
> Subject: Declared patterns -- translation and reflection


> Time to take a peek ahead at _declared patterns_. Declared patterns come in
> three varieties -- deconstruction patterns, static patterns, and instance
> patterns (corresponding to constructors, static methods, and instance
> methods.)
> I'm going to start with deconstruction patterns, but the basic game is the
> same for all three.

> Ignoring the trivial details, a deconstruction pattern looks like a
> "constructor in reverse":

> ```{.java}
> class Point {
> int x, y;

> Point(int x, int y) {
> this.x = x;
> this.y = y;
> }

[....] 

> }
> ```

> Deconstruction patterns share the weird behaviors that constructors have in
> that they are instance members, but are not inherited, and that rather than
> having names, they are accessed via the class name.

> Deconstruction patterns differ from static/instance patterns in that they are
> by definition total; they cannot fail to match. (This is a somewhat arbitrary
> simplification in the object model, but a reasonable one.) They also cannot
> have any input parameters, other than the receiver.

> Patterns differ from their ctor/method counterparts in that they have what
> appear to be _two_ argument lists; a parameter list (like ctors and methods),
> and a _binding_ list. The parameter list is often empty (with the receiver as
> the match target). The binding list can be thought of as a "conditional
> multiple return". That they may return multiple values (and, for partial
> patterns, can return no values at all when they don't match) presents a
> challenge for translation to classfiles, and for the reflection model.

> #### Translation to methods

> Patterns contain imperative code, so surely we want to translate them to
> methods in some way. The pattern input parameters map cleanly to method
> parameters.

> The pattern bindings need to be tunneled, somehow, through the method return
> (or some other mechanism). For our deconstructor, we might translate as:

> PatternCarrier <dtor>()

> (where the method applies the pattern, and PatternCarrier wraps and provides
> access to the bindings) or

> PatternObject <dtor>()

> (where PatternObject provides indirection to behavior to invoke the pattern,
> which in turn returns the carrier.)

> With either of these approaches, though, the pattern name is a problem,
> because patterns can be overloaded on their _bindings_, but both of these
> return types are insensitive to bindings.

> It is useful to characterize the "shape" of a pattern with a MethodType, where
> the parameters of the MethodType are the binding types. (The return type is
> less constrained, but it is sometimes useful to use the return type of the
> MethodType for the required type of the pattern.) Call this the "descriptor"
> of the pattern.

> If we do this, we can use some name mangling to encode the descriptor in the
> method name:

> PatternCarrier name$mangle()

> The mangling has to be stable across compilations with respect to any source-
> and binary-compatible changes to the pattern declaration. One mangling that
> works quite well is to use the "symbolic-freedom encoding" of the erasure of
> the pattern descriptor. Because the erasure of the descriptor is exactly as
> stable as any other method signature derived from source declarations, it will
> have the desired binary compatibility properties, overriding will work as
> expected, etc.
I think we need at least to use a special name like <deconstructor>, the same
way we have <init>.
I agree that we also need to encode the method type descriptor (the carrier
type) into the name, so the name of the method in the classfile should be
<deconstructor+mangle> or <name+mangle> (or perhaps <pattern+name+mangle> for
the pattern methods).
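To make the naming idea concrete, here is a minimal sketch of deriving a method name from a pattern descriptor. The class name `PatternNameMangler`, the `dtor` base name, and the escape table are all assumptions made up for illustration; in particular this is not the real symbolic-freedom encoding, only a stand-in that shows the descriptor's binding types ending up in the name in a stable, reversible way.

```java
import java.lang.invoke.MethodType;

final class PatternNameMangler {
    /**
     * Builds "baseName$<escaped binding descriptors>".
     * The escapes are hypothetical, NOT the symbolic-freedom encoding.
     */
    static String mangle(String baseName, MethodType descriptor) {
        // MethodType is already erased; e.g. (ILjava/lang/String;)V
        String desc = descriptor.toMethodDescriptorString();
        // Keep only the parameter part, which holds the binding types.
        String bindings = desc.substring(1, desc.indexOf(')'));

        StringBuilder sb = new StringBuilder(baseName).append('$');
        for (char c : bindings.toCharArray()) {
            switch (c) {
                case '/': sb.append("_s"); break; // package separator
                case ';': sb.append("_e"); break; // end of class name
                case '[': sb.append("_a"); break; // array dimension
                case '_': sb.append("__"); break; // keep the escape reversible
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        MethodType pointShape = MethodType.methodType(void.class, int.class, int.class);
        System.out.println(mangle("dtor", pointShape));
    }
}
```

Two deconstructors overloaded on their bindings get different names under such a scheme, and the name only changes when the erased binding types change.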

> #### Return value

> In an earlier design, we used a pattern object (which was a bundle of method
> handles) as the return value of the pattern. This enabled clients to invoke
> these via condy and bind method handles into the constant pool for
> deconstruction and static patterns.

> Either way, we make use of some sort of carrier object to carry the bindings
> from the pattern to the client; either we return the carrier from the pattern
> method, or there is a method on the pattern object that we invoke to get a
> carrier. We have a few preferences about the carrier; we'd like to be able to
> late-bind to the actual implementation (i.e., we don't want to freeze the name
> of a carrier class in the method descriptor), and at least for records, we'd
> like to let the record instance itself be the carrier (since it is immutable
> and we can just invoke the accessors to get the bindings.)
So the return type is either Object (to hide the type of the carrier) or a
lambda that returns an Object (PatternObject or PatternCarrier acting like a
glorified lambda).

> #### Carriers

> As part of the work on template strings, Jim has put back some code that was
> originally written for the purpose of translating patterns, called "carriers".
> There are methods / bootstraps that take a MethodType and return method
> handles to (a) encode values of those types into an opaque carrier object and
> (b) pull individual values out of a carrier. This means that the choice of
> carrier object can be deferred to runtime, as long as both the bundling and
> unbundling method handles agree on the carrier form.

> The choice of carrier is largely a footprint/specificity tradeoff. One could
> imagine a carrier class per shape, or a single carrier class that wraps an
> Object[], or caching some number of common shapes (three ints and two refs).
> This sort of tuning should be separate from the protocol encoded in the
> bytecode of the pattern method and its clients.

> The pattern matching runtime will provide some condy bootstraps which wrap the
> Carriers behavior.

> Since at least some patterns are conditional, we have to have a way to encode
> failure into the protocol. For a partial pattern, we can use a B2 carrier and
> use null to encode failure to match; for a total pattern, we can use a B3
> carrier.
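As an illustration of the carrier protocol described above, the following sketch derives a "bundle" handle and per-binding "unbundle" handles from a MethodType, using an Object[] as the carrier form. `CarrierSketch` and its method names are hypothetical and are not the actual Carriers code; only the `java.lang.invoke` calls are real API.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class CarrierSketch {
    /** Handle of type (bindings...)Object that packs the bindings into a carrier. */
    static MethodHandle constructor(MethodType shape) {
        return MethodHandles.identity(Object[].class)
                .asCollector(Object[].class, shape.parameterCount())
                .asType(shape.changeReturnType(Object.class));
    }

    /** Handle of type (Object)T that pulls binding number i out of a carrier. */
    static MethodHandle component(MethodType shape, int i) {
        MethodHandle getter = MethodHandles.arrayElementGetter(Object[].class);
        return MethodHandles.insertArguments(getter, 1, i)
                .asType(MethodType.methodType(shape.parameterType(i), Object.class));
    }

    /** Packs two bindings and unpacks them again; the client never names the carrier class. */
    static String roundTrip(int x, String label) {
        try {
            MethodType shape = MethodType.methodType(void.class, int.class, String.class);
            Object carrier = constructor(shape).invoke(x, label);
            int x2 = (int) component(shape, 0).invoke(carrier);
            String label2 = (String) component(shape, 1).invoke(carrier);
            return x2 + ":" + label2;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(3, "origin"));
    }
}
```

Because both sides only see `Object`, the Object[] could be swapped for a class per shape or a cached common shape without touching the callers, which is the footprint/specificity tuning the text wants to keep out of the bytecode.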

> #### Proposed encoding

> Earlier explorations did a lot of work to preserve the optimization that a
> match target can be its own carrier. But further analysis reveals that the
> cost of doing so for other than records is pretty substantial and works
> against the model of a pattern declaration being an imperative body of code
> that runs at match time. So for record patterns, we can "inline" them by using
> `instanceof` as the applicability test and accessors for extraction, and for
> all other patterns, go through the carrier runtime.

> This allows us to encode pattern methods as

> Object name$mangle(ARGS)

> and have the pattern method do the match and return a carrier (or null), using
> the carrier object that the carrier runtime associates with the pattern
> descriptor. And clients can take apart the result again using the extraction
> logic that the carrier runtime associates with the pattern descriptor.

> This also means that instance patterns "just work" because virtual dispatch
> selects the right implementation for us automatically, and all implementations
> that can be overrides will also implicitly agree on the encoding.
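A hand-written approximation of what the two paths might look like after translation: the record case is inlined with `instanceof` plus accessors, and the non-record case calls a pattern method that returns a carrier or null. The types `Shape` and `Rect`, the mangled name `asSquare$I`, and the Object[] carrier are assumptions for illustration; a real compiler would go through the carrier runtime instead of indexing an array directly.

```java
final class PatternProtocolSketch {
    record Point(int x, int y) {}

    /** Hypothetical translation of a partial instance pattern with one int binding. */
    interface Shape {
        Object asSquare$I(); // returns a carrier, or null when there is no match
    }

    static final class Rect implements Shape {
        final int w, h;
        Rect(int w, int h) { this.w = w; this.h = h; }

        @Override
        public Object asSquare$I() {
            // The pattern body runs at match time and decides whether it matches.
            return w == h ? new Object[] { w } : null;
        }
    }

    static String describe(Object o) {
        // Record pattern: inlined as instanceof test plus accessor calls.
        if (o instanceof Point) {
            Point p = (Point) o;
            return "point " + p.x() + "," + p.y();
        }
        // Any other pattern: invoke the pattern method, then take the carrier apart.
        if (o instanceof Shape) {
            Object carrier = ((Shape) o).asSquare$I(); // virtual dispatch picks the implementation
            if (carrier != null) {
                int side = (Integer) ((Object[]) carrier)[0];
                return "square " + side;
            }
        }
        return "no match";
    }

    public static void main(String[] args) {
        System.out.println(describe(new Point(1, 2)));
        System.out.println(describe(new Rect(2, 2)));
        System.out.println(describe(new Rect(2, 3)));
    }
}
```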

> Because patterns are methods, we can take advantage of all the affordances of
> methods. We can use access bits to control accessibility in the obvious way;
> we can use the attributes that carry annotations, method parameter metadata,
> and generics signatures to carry information about the pattern declaration and
> its parameters. What's missing is a place to put metadata for the *bindings*,
> and to record the fact that this is a pattern implementation and not an
> ordinary method. So, we add the following attribute on pattern methods:

> Pattern {
> u2 attr_name;
> u4 attr_length;
> u2 patternFlags; // bitmask
> u2 patternName; // index of UTF8 constant
> u2 patternDescr; // index of MethodType (or alternately UTF8) constant
> u2 attributes_count;
> attribute_info attributes[attributes_count];
> }

> This says that "this method is a pattern", reifies the name of the pattern
> (patternName), reifies the pattern descriptor (patternDescr) which encodes the
> types of the bindings as a method descriptor or MethodType, and has attributes
> which can carry annotations, parameter metadata, and signature metadata for
> the bindings. The existing attributes (e.g. Signature, ParameterNames, RVAA)
> can be reused as is, with the interpretation that this is the signature (or
> names, or annos) of the *bindings*, not the input parameters. Flags can carry
> things like "deconstructor pattern" or "partial pattern" as needed.
From the classfile POV, a constructor is a method with a funny name in between
brackets; I think deconstructor and pattern methods should work the same way.
Unlike a constructor, we need a way to attach the carrier type (and perhaps the
pattern name) on the side, so an attribute on the pattern method seems the
right choice.
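For a feel of how small such an attribute is on the wire, here is a sketch that serializes the proposed layout with no nested attributes. The flag values and the class name `PatternAttributeSketch` are invented for illustration, and the constant pool indexes are simply passed in; the only thing taken from the existing classfile format is that the length field counts the bytes after the first six.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class PatternAttributeSketch {
    // Hypothetical flag bits; the proposal does not assign values.
    static final int FLAG_DECONSTRUCTOR = 0x0001;
    static final int FLAG_PARTIAL = 0x0002;

    /** Writes a Pattern attribute with an empty nested attribute table. */
    static byte[] write(int attrNameIndex, int flags, int patternNameIndex, int patternDescrIndex) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes); // big-endian, like classfiles
            out.writeShort(attrNameIndex);     // u2 attr_name
            out.writeInt(8);                   // u4 attr_length: the four u2 fields below
            out.writeShort(flags);             // u2 patternFlags
            out.writeShort(patternNameIndex);  // u2 patternName
            out.writeShort(patternDescrIndex); // u2 patternDescr
            out.writeShort(0);                 // u2 attributes_count
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] attr = write(12, FLAG_DECONSTRUCTOR, 13, 14);
        System.out.println(attr.length + " bytes");
    }
}
```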

> ## Reflection

> We already have a sensible base class in the reflection library for reflecting
> patterns: Executable. All of the methods on Executable make sense for
> patterns, including Object as the return type. If the pattern is reflectively
> invoked, it will return null (for no match) or an Object[]; this Object[] can
> be thought of as the boxing of the carrier. Since the method return type is
> Object, this is an entirely reasonable interpretation.

> We need some additional methods to describe the bindings, so we would have a
> subtype of Executable for Pattern, with methods like getBindings(),
> getAnnotatedBindings(), getGenericBindings(), isDeconstructor(), isPartial(),
> etc.
I agree if getBindings() returns a Class<?>[].
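The "null or Object[]" contract for reflective invocation can be emulated today with an ordinary method, which may help when discussing the API shape. In this sketch `positive$I` is a made-up stand-in for a translated partial static pattern, and plain `java.lang.reflect.Method` plays the role of the not-yet-existing Pattern subtype of Executable.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

final class ReflectiveMatchSketch {
    /** Stand-in for a translated partial static pattern with one int binding. */
    static Object positive$I(int candidate) {
        return candidate > 0 ? new Object[] { candidate } : null;
    }

    /** Invokes the pattern method reflectively and interprets the result. */
    static String match(int candidate) {
        try {
            Method m = ReflectiveMatchSketch.class.getDeclaredMethod("positive$I", int.class);
            Object result = m.invoke(null, candidate); // static method, so no receiver
            if (result == null) {
                return "no match";
            }
            Object[] bindings = (Object[]) result; // boxed view of the carrier
            return "matched, bindings=" + Arrays.toString(bindings);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(match(5));
        System.out.println(match(-1));
    }
}
```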

As I said, apart from the semantics implied by the proposed syntax, the rest of
the design is great.

Rémi
