The next big pattern matching JEP will be about deconstruction patterns.  (Static and instance patterns will likely come separately.)  Now that we've got the bikeshed painting underway, there are a few other loose ends here, and one of them is overload selection.

We explored taking the existing overload selection algorithm and turning it inside out, but after going down that road a bit, I think this is both unnecessary complexity for not enough value, and also potentially fraught with nasty corner cases.  I think there is a much simpler answer here which is entirely good enough.

First, let's remind ourselves, why do we have constructor overloading in the first place?  There are three main reasons:

 - Concision.  If a fully-general constructor takes many parameters, but not all are essential to the use case, then the construction site becomes a site of accidental complexity.  Being able to handle common groupings of parameters simplifies use sites.

 - Flexibility.  Related to the above, not only might the user not need to specify a given constructor parameter, but they may want the flexibility of saying "let the implementation pick the best value".  Constructors with fewer parameters reserve more flexibility for the implementation.

 - Alternative representations.  Some objects may take multiple representations as input, such as accepting a Date, a LocalDate, or a LocalDateTime.
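
For example (a hypothetical `Event` class, purely to illustrate), a class whose internal representation is a `LocalDateTime` can still accept the alternative representations directly:

    import java.time.LocalDate;
    import java.time.LocalDateTime;
    import java.time.ZoneId;
    import java.util.Date;

    public final class Event {
        private final LocalDateTime timestamp;

        // Canonical representation
        public Event(LocalDateTime timestamp) { this.timestamp = timestamp; }

        // Alternative representations, normalized to the canonical one
        public Event(LocalDate date) { this(date.atStartOfDay()); }
        public Event(Date legacy) {
            this(LocalDateTime.ofInstant(legacy.toInstant(), ZoneId.systemDefault()));
        }
    }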

The first two cases are generally handled with "telescoping constructor nests", where we have:

    Foo(A a)
    Foo(A a, B b)
    Foo(A a, B b, C c, D d)

Sometimes the telescopes don't fold perfectly, and become "trees":

    Foo(A a)
    Foo(A a, B b)
    Foo(A a, C c, D d)
    Foo(A a, B b, C c, D d)

Which constructors to include is a subjective judgment on the part of class authors, trading off code size against concision and flexibility.

We had initially assumed that each constructor overload would have a corresponding deconstructor, but further experimentation suggests this is not an ideal assumption.

Clue One that it is not a good assumption comes from the asymmetry between constructors and deconstructors; if we have constructors and deconstructors of shape C(List), then it is OK to invoke C's constructor with List or its subtypes, but we can invoke C's deconstructor with List or its subtypes or its supertypes.
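
Record patterns, which we already have, exhibit this asymmetry at the use site; here is a small sketch with a hypothetical `Box` record:

    record Box(Number n) { }

    static void demo() {
        // Construction: the argument must be a Number or one of its subtypes.
        Box b = new Box(42);                 // OK: Integer is a subtype of Number
        // new Box(new Object());            // error: Object is not a Number

        // Deconstruction: the nested type pattern may name a subtype *or* a
        // supertype of the component type, since applicability is castability-based.
        Object o = b;
        switch (o) {
            case Box(Integer i) -> System.out.println("subtype pattern: " + i);
            case Box(Object x)  -> System.out.println("supertype pattern: " + x);
            default             -> System.out.println("not a Box");
        }
    }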

Clue Two is that applicability for constructors is based on method invocation context, but applicability for deconstructors is based on cast context, which has different rules.  It seems unlikely that we will ever get symmetry given this.

The "Flexibility" requirement does not really apply to deconstructors; having a deconstructor that accepts additional bindings does not constrain anything, not in the same way as a constructor taking needlessly specific arguments.  Imagine if ArrayList had only constructors that take int (for array capacity); this is terrible for the constructor, because it forces a resource management decision onto users who will not likely make a very good decision, and one that is hard to change later, but pretty much harmless for deconstructors.

The "Concision" requirement does not really apply as much to deconstructors as constructors; matching with `Foo(var a, _, _)` is not nearly as painful as invoking with lots of parameters, each of which require an explicit choice by the user.

So the main reason for overloading deconstructors is to match representations with the constructor overloads -- but for a given "representation set", there probably do not need to be as many deconstructors as constructors. What we really need is to match the "maximal" constructor in a telescoping nest with a corresponding deconstructor, or, for a tree-shaped set, one for each "maximal" representation.

So for a class with constructors

    Foo()
    Foo(A a)
    Foo(A a, B b)
    Foo(X x)
    Foo(X x, Y y)

we would want dtors for (A,B) and (X,Y), but don't really need the others.


So, let's start fresh on overload selection. Deconstructors have a set of applicability rules based on arity first (eventually, varargs, but not yet) and then on applicability of type patterns, which is in turn rooted in castability.  Because we don't have the compatibility problem introduced by autoboxing, we can ignore the distinction between phase 1 and 2 of overload selection (we will have this problem with varargs later, though.)

Given this, the main question we have to resolve is to what degree -- if any -- we may deem one overload "more applicable" than others.  I think there is one rule here that is forced: an exact type match (modulo erasure) is more applicable than an inexact type match.  So given:

    D(Object o)
    D(String s)

then

    case D(String s)

should choose the latter.  This allows the client to (mostly) steer to a specific overload just by using the right types (rather than `var` or a subtype.)  It is not clear to me whether we need anything more here; in the event of ambiguity, a client can pick the right overload with the right type patterns.  (Nested patterns may need to be manually unrolled to subsequent clauses in some cases.)

So basically (on a per-binding basis): an exact match is more applicable than an inexact match, and ... that's it. Users can steer towards a particular overload by selecting exact matches on enough bindings.  Libraries can provide their own "joins" if they want to disambiguate problematic overloads like:

    D(Object o, String s)
    D(String s, Object o)
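
This is the same kind of ambiguity we already get with ordinary method overloads, and presumably the same kind of "join" resolves it -- an overload that is more specific than both:

    // Method-overload analogy in current Java (not deconstructors): with only
    // the first two overloads, m("a", "b") is an ambiguity error; adding the
    // third -- the "join", more specific than both -- makes the call compile.
    static void m(Object o, String s) { }
    static void m(String s, Object o) { }
    static void m(String s, String t) { }   // the "join"

    static void caller() {
        m("a", "b");   // selects m(String, String)
    }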


