Thanks for these great comments.  These cut to the heart of some uncomfortable 
tradeoffs.

> I have two remarks about this proposal. The first is basically: why allow 
> overriding accessors? If a record is required to have a one-to-one 
> correspondence between its (private final) fields and its public accessors, 
> and is required to “give up [its] data freely to all requestors” what 
> possible override could be correct? It makes sense to allow overriding the 
> constructor, for validation and normalization, but once the fields are 
> cemented in place, what could an accessor do but return its corresponding 
> field?

Yes, overriding accessors could be abused to avoid giving up the class’s data; 
they could be overridden to throw, for example, which would undermine the “give 
up their data easily” dictum.  

Note that if overriding accessors were not allowed, it would be as if the 
fields were public and final.  (We actually considered that as an option, 
briefly.)  I actually think that public final fields get a bad rap, but the 
Uniform Access principle encourages accessors, and public final fields freak a 
lot of people out.  So between the two, non-overridable accessors seem better.  

But, there’s still a reason to allow overriding accessors — mutable types which 
don’t provide unmodifiable views — arrays being the obvious case.  Yes, arrays 
and records are an uncomfortable pairing, but if you can override the 
accessors, at least you can clone them on the way out.  (If you can’t override 
the accessors, people might still use records with arrays, and then expose the 
mutable state, possibly without realizing it.  That seems worse.)  So 
overriding accessors seems like it should be in the “safe, legal, and rare” 
category.  
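
To make that concrete, here is a minimal sketch (the component name and type 
are invented for illustration) of an accessor override that copies a mutable 
array on the way out:

     record Checksums(byte[] hashes) {
          // overridden accessor: hand out a defensive copy rather than
          // a reference to the internal (mutable) array
          public byte[] hashes() {
               return hashes.clone();
          }
     }

(A careful author would likely clone the array on the way in as well, in the 
constructor, for the same reason.)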

Note too that the deconstruction pattern is likely to delegate to the 
accessors, so that you only have to override things in one place to prevent 
mutable state from leaking.  

There’s another consideration, too.  We considered outlawing overriding of the 
equals/hashCode methods.  This goes a long way towards enforcing the desired 
invariants, but again seems pretty restrictive.  And having an irregular set 
of rules about what can be overridden and what can’t (e.g., no to equals, yes 
to toString) seems likely to (a) make the feature harder to learn/understand 
and (b) lead to lots more “why can’t I, I just want to ….” complaints.  Better 
to have an all-or-nothing treatment of overriding, even though people can 
undermine the intent by careless overriding.  (One thing working in our favor 
here is that, if you’re overriding a bunch of methods, the concision benefit 
drops a lot, so that helps limit the problem.)  


Before I jump into the second, let me talk about intended overriding modes for 
the constructor.  These are primarily: validation and normalization.  

The validation cases are obvious:

     record Range(int lo, int hi) { 
          public Range {
               if (lo > hi) throw new LowGreaterThanHighException();
          }
     }

Normalization can happen on single arguments or multiple:

     record Person(String name) { 
          public Person {
               name = name.toUpperCase();
          }
     }

(Note that I’m reassigning the constructor parameter; it is the parameter’s 
final value that gets written to the field.)
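
As a quick illustrative check of what that sketch does under the proposed 
compact-constructor semantics:

     assert new Person("brian").name().equals("BRIAN");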

     record Rational(int num, int denom) { 
          public Rational {
               // reduce to lowest terms before the fields are written
               int gcd = gcd(num, denom);
               num /= gcd;
               denom /= gcd;
          }
     }


> 
> My second remark is much more long-winded, and inspired by the first. The 
> TL;DR version is: what about normalization and derived fields? 

This is two questions :)  Let’s start with the first.

> In the longer version below, I’ll be using Fraction as an example of a simple 
> class that could be a record instead, where normalization is reducing a 
> fraction to simplest form. However, please generalize from this: it could 
> apply to any record where a derived field can be computed from the provided 
> fields by computing a perhaps-expensive pure function on them.

Rational numbers are a great example; Guy raised these earlier as well.  Where 
rationals challenge the model here is: the user provided a state vector of (4, 
2), but the final state of the object is (2, 1).  This is at odds with the 
following desirable-seeming invariant:

     record Foo(int x, int y) { }
     assert new Foo(1, 2).x() == 1;
     assert new Foo(1, 2).y() == 2;

That is, if we normalize any fields in the ctor, then the relationship of “the 
constructor argument x and the accessor x() are referring to the same state” 
appears to be severed.
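
Concretely, with the Rational sketch above (which relies on a gcd helper that 
is assumed, not shown):

     Rational r = new Rational(4, 2);
     assert r.num() == 2;      // not 4: the constructor normalized the state
     assert r.denom() == 1;    // not 2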


> 
> 
> A Fraction library can’t satisfy both Fran and Peter. It has to choose a 
> place to do this normalization, or else decline to do it at all - but this is 
> no solution, as now the class has very sharp edges, really no more useful 
> than a Pair<Integer, Integer>.

That’s true, but what Peter really _wants_ is an IntIntPair class!  His goal 
is simply to hold the pair and do no extra computation (and commit to no 
additional semantic requirements).  And he can easily write one.  (Or, he 
could get over his micro-performance obsession and use Fran’s class.)  

So, let’s wrap up normalization before we get to derived fields.  I don’t mind 
the Peter/Fran tension here, but I am mildly uncomfortable with the fact that 
“new Foo(x, y).x() == x” doesn’t always hold, because it complicates an 
attractive invariant.  

The actual invariant you get with normalization is slightly weaker: there is a 
projection-embedding pair between the constructor arguments and the 
representation.  If we write both the ctor arguments and the state as tuples, 
then ctor-then-dtor is not an identity (the ctor may collapse several argument 
tuples onto the same state), but going around the other way, dtor-then-ctor, 
is an identity, as long as the normalization is well-defined and consistently 
applied.  This is a tradeoff of simplicity vs usefulness; overall it seems a 
fair balance.  
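
Expressed as a check against the same Rational sketch (again, just an 
illustration of the property, not library code):

     // ctor-then-dtor is lossy: (4, 2) comes back out as (2, 1).
     // dtor-then-ctor, however, is an identity on normalized values:
     Rational r = new Rational(4, 2);
     Rational s = new Rational(r.num(), r.denom());
     assert r.equals(s);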

> 
> There are two possible solutions I see to this. The first is to permit some 
> kind of derived-field mechanism, preferably lazy. Then, Fraction’s 
> constructor would save a thunk for producing the reduced form, and refer to 
> that thunk in the numerator() and denominator() accessors, but ignore it in 
> the #mul method so that we don’t pay the cost of reducing unless we want it 
> (here, imagine reducing a Fraction is more expensive than allocating a thunk).

The stricture against derived fields was probably the hardest choice here.  On 
the one hand, strictly derived fields are safe and don’t undermine the 
invariants; on the other, without more help from the language or runtime, we 
can’t enforce that additional fields are actually derived, *and* it will be 
ultra-super-duper-tempting to make them not so.  (I don’t see remotely as much 
temptation to implement maliciously nonconformant accessors or equals methods.) 
 If we allowed additional fields, we would surely have to lock down 
equals/hashCode.  

We’re exploring the notion of lazy final fields; I think that would shift the 
balance on allowing additional fields, since the mechanism would push pretty 
hard toward making them truly derived from the record state.  
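
In the meantime, a purely derived value can be exposed as an ordinary method 
that recomputes from the record state; the following is just a sketch of that 
workaround (with a gcd helper included to keep it self-contained), not a 
proposed language mechanism:

     record Fraction(int num, int denom) {
          // derived value: recomputed on demand rather than stored in a
          // field, so it cannot drift out of sync with the record state
          // (sign and zero handling elided for brevity)
          Fraction reduced() {
               int g = gcd(num, denom);
               return new Fraction(num / g, denom / g);
          }

          private static int gcd(int a, int b) {
               return b == 0 ? Math.abs(a) : gcd(b, a % b);
          }
     }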

> 
> The second is to simply say that Fraction is a bad candidate for a record, 
> because it wants to decouple its interface from its implementation. I think 
> this is actually the right approach, but it may be unconvincing because of 
> how “obvious” it is that a Fraction is just a pair with some extra 
> calculations to perform based on its components. If we say that Fraction is a 
> bad record, I worry that many more bad records like it will be built, and 
> their subtle problems discovered only after their APIs have been published 
> and committed to. Further, if this is indeed a bad record, I can’t think of 
> any other good use case for overriding an accessor method (my first remark).

I’m sympathetic to both sides of this argument.  On the one hand, we want the 
feature to be useful; on the other, we want it to have a clear, unambiguous 
user model.  

A third explanation is that Peter’s expectations are either unreasonable or 
inconsistent with the idea of using someone else’s library class.  


> 
> On Fri, Mar 1, 2019 at 12:28 PM Brian Goetz <brian.go...@oracle.com> wrote:
> I've updated the document on data classes here:
> 
>      http://cr.openjdk.java.net/~briangoetz/amber/datum.html
> 
> (older versions of the document are retained in the same directory for 
> historical comparison.)
> 
> While the previous version was mostly about tradeoffs, this version 
> takes a much more opinionated interpretation of the feature, offering 
> more examples of use cases of where it is intended to be used (and not 
> used).  Many of the "under consideration" flexibilities (extension, 
> mutability, additional fields) have collapsed to their more restrictive 
> form; while some people will be disappointed because it doesn't solve 
> the worst of their boilerplate problems, our conclusion is: records are 
> a powerful feature, but they're not necessarily the delivery vehicle for 
> easing all the (often self-inflicted) pain of JavaBeans.  We can 
> continue to explore relief for these situations too as separate 
> features, but trying to be all things to all classes has delayed the 
> records train long enough, and I'm convinced they're separate problems 
> that want separate solutions.  Time to let the records train roll.
> 
> I've also combined the information on sealed types in this document, as 
> the two are so tightly related.
> 
> Comments welcome.
