On 4/8/19 1:40 PM, Aleksey Shipilev wrote:
On 4/8/19 1:28 PM, Peter Levart wrote:
The reasoning is very similar to the case with just one field. With one field (hash) the thread sees either the default value (0) or a non-zero value calculated either by this thread sometime before or by a concurrent thread that has already stored it. Regardless of ordering, the thread either uses the non-zero value or (re)calculates it (again). The value calculation is deterministic and uses immutable published state (the array), so it always calculates the same value for the same object. Idempotence is guaranteed.

The same reasoning can be extended to the general case where many fields are used for caching a state calculated from some immutable published state. The constraint is that the calculation must be deterministic and must also deterministically choose which of the many caching fields is to be modified. Only one field may be modified, never more than one. The thread therefore sees either the default values of all fields, or the default values of all fields but one, which has been set either by this thread sometime before or by a concurrent thread. Regardless of ordering, the thread either uses the state combined from the default values of all fields but one plus the non-default value of that single field, or (re)calculates the non-default value of the single field. The value calculation is deterministic, uses immutable published state and deterministically chooses the field to modify, so it always calculates the same "next" state for the object. Idempotence is guaranteed.
Thank you, the mere existence of this wall of text solidifies my argument: the need to invoke an argument like that is exactly the cognitive complexity I've been talking about, and it speaks to maintainability/risk cost, while the benefits are still around the machine epsilon.

I tried to write the two descriptions side by side to show that the 2nd is not more complex than the 1st. It's just using longer "nouns". The sentences are otherwise equivalent and there's additional text that describes the "nouns". I could have done a better job though...

So here's a 2nd try:

The String hash code caching (as it is written today) is an example of a benign data race that can be described as caching of lazily calculated state derived from immutable published state, both modeled in the same object. A data race is benign if:

- the published state used as input to the calculation is immutable
- the calculation is deterministic
- threads observe the cached calculated state of the object to be updated just once, atomically. This means there are only two observable states of the object: the "initial" state, where the cached calculated data is not set, and the "updated" state, where the cached calculated data is set.

Java fields up to 32 bits wide (plus reference fields, regardless of pointer width) are updated atomically.

So if the update of the object state (the transition from the "initial" to the "updated" state) is performed by a write of a deterministically calculated value to a single, deterministically chosen field no more than 32 bits wide (or a reference field), the whole object state is observed to change atomically and the data race is benign.
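
For illustration, here is a minimal sketch of the current single-field pattern (the names are illustrative, not the actual java.lang.String source): the cache lives in one 32-bit field, so the racy publication is a single atomic store and the race is benign.

final class CachedHash {
    private final byte[] value;   // immutable published state, input of the calculation
    private int hash;             // cache field; 0 means "not yet calculated"

    CachedHash(byte[] value) {
        this.value = value.clone();
    }

    @Override
    public int hashCode() {
        int h = hash;                   // racy read: either 0 or a previously stored result
        if (h == 0 && value.length > 0) {
            for (byte b : value) {
                h = 31 * h + b;         // deterministic function of the immutable state
            }
            hash = h;                   // single 32-bit write: object state changes atomically
        }
        return h;
    }
}

Note that, as with today's String, a value whose hash happens to be 0 is recalculated on every call, which is exactly what a second caching field can avoid.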

Current and proposed caching differ only in the number of fields used for caching the calculated state, but both adhere to the above rules.

So the reasoning stays the same as with the current code. It only takes a moment to realize that it's all about a single field being updated, while the presence of other fields (zero or more) doesn't change the picture, since they stay constant for the whole lifetime of the object.
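
To make that concrete, here is a hedged sketch of a two-field variant (the field names are mine, not necessarily those of the proposed patch): for any given object exactly one of the two cache fields is ever written, each write is a single atomic store, and the other field stays at its default for the whole lifetime of the object.

final class CachedHash2 {
    private final byte[] value;     // immutable published state
    private int hash;               // written only when the calculated hash is non-zero
    private boolean hashIsZero;     // written only when the calculated hash is zero

    CachedHash2(byte[] value) {
        this.value = value.clone();
    }

    @Override
    public int hashCode() {
        int h = hash;                       // racy read of one cache field
        if (h == 0 && !hashIsZero) {        // racy read of the other cache field
            for (byte b : value) {
                h = 31 * h + b;             // same deterministic calculation
            }
            if (h == 0) {
                hashIsZero = true;          // the single field modified for this object
            } else {
                hash = h;                   // the single field modified for this object
            }
        }
        return h;
    }
}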

If you're afraid that a future maintainer of that code would not realize that, then a simple comment put into the String.hashCode method and the java_lang_String::set_hash C++ method, saying something like the following:

// only a single field may be modified so that the object state is updated atomically

...is surely going to help him/her keep the String free from bugs...

Regards, Peter

