Hi Gil,
The algorithm complexity has been on my mind too, and I've been thinking
about how to maintain a key (or its address) -> ephemeron mapping most
efficiently. As I wrote in a reply to Mandy, if we say that an
individual phase2 of processing the normal References has complexity
O(N), where N is the number of discovered references, then the equivalent
phase2 processing of Ephemerons has complexity O(N*d), where N
is the number of discovered Ephemerons and d is the maximum number of
"hops" in the object graph; a "hop" is a point in a chain
where the value of some ephemeron refers directly or indirectly to the
key of some other ephemeron, and the value of that other ephemeron
continues the chain. Let's hope that in practical data structures d is
not large. The maximum value of d is N, as in one of the tests. The
algorithm in the prototype tries to reduce the number of outer
iterations to much less than d in many situations by alternating the
direction of scanning in each pass. The value(ephemeronX) ->
key(ephemeronY) links could be arranged in a zig-zag pattern relative to
the order of discovered references in the list, though, which would be the
worst case for the algorithm, but I think the chances of that happening
in a real-world data structure are low. Still, we have to think of possible
denial-of-service attacks too, so I agree that the worst-case scenario has
to be improved.
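For illustration, the d == N worst case (a chain where each ephemeron's value is the next ephemeron's key) might be built like this. The Ephemeron record below is just a placeholder pair to show the shape of the chain; the prototype's real class needs VM support and is not modeled here:

```java
import java.util.ArrayList;
import java.util.List;

public class EphemeronChain {
    // Placeholder for the prototype's Ephemeron<K, V>; it only models the
    // key/value shape, not the GC-maintained reachability semantics.
    record Ephemeron<K, V>(K key, V value) {}

    public static void main(String[] args) {
        int n = 5;
        List<Object> keys = new ArrayList<>();
        for (int i = 0; i <= n; i++) keys.add(new Object());

        // value(ephemeron i) is the key of ephemeron i+1, so discovering
        // that key 0 is live forces n sequential "hops" before the whole
        // chain is known live -- the d == N worst case described above.
        List<Ephemeron<Object, Object>> chain = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            chain.add(new Ephemeron<>(keys.get(i), keys.get(i + 1)));
        }
        System.out.println("chain length (max hops d): " + chain.size());
    }
}
```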
Presently, Reference objects are discovered and hooked onto singly-linked
_discoveredXXX lists (using the Reference.discovered field to point to the
next in the list). Each reference type has its own set of lists. Multiple
lists in a set enable parallel (concurrent?) discovery without
contention (a list per thread).
So I was thinking that one possibility would be for ephemerons to have a
bigger table of discovered lists - not just one per discovery thread,
but considerably bigger - and that the index of the list to which a
discovered ephemeron is added would be computed from the ephemeron's key. A
classical hash table with buckets. A big enough table would minimize
contention when discovery is parallelized and enable the following
algorithm (which could also be parallelized):
Let each bucket (each head of the list) have 2 boolean flags: include &
pending.
At the start, set include flags of all buckets to true and then enter
the loop:
do {
    reset pending flags of all buckets to false;
    for each bucket with (include == true), do {
        scan the ephemerons and, for those with live keys, mark the value
        and its transitive closure as live;
        while marking the value and transitive closure as live, for each
        object that was newly marked alive:
            compute the index into the ephemeron buckets as though such an
            object were an ephemeron's key, and set the pending flag of
            that bucket;
    }
    set the include flags of all buckets from the pending flags of the
    buckets and count the # of pending buckets;
} while (# of pending buckets > 0);
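The bucketed loop above might be sketched, very schematically, in Java. Everything here is a stand-in: the Ephemeron record, the bucketOf hash, and the live set model the collector's discovered lists and mark state, not the actual HotSpot data structures:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class EphemeronBuckets {
    // Toy model: each bucket holds the ephemerons whose key hashes to
    // that index; include/pending are the per-bucket flags.
    static final int BUCKETS = 8;

    record Ephemeron(Object key, Object value) {}

    static int bucketOf(Object key) {
        return (System.identityHashCode(key) & 0x7fffffff) % BUCKETS;
    }

    public static void main(String[] args) {
        // Small chain: value of one ephemeron is the key of the next.
        Object k0 = new Object(), k1 = new Object(), k2 = new Object();
        List<List<Ephemeron>> buckets = new ArrayList<>();
        for (int i = 0; i < BUCKETS; i++) buckets.add(new ArrayList<>());
        for (Ephemeron e : List.of(new Ephemeron(k0, k1), new Ephemeron(k1, k2))) {
            buckets.get(bucketOf(e.key())).add(e);
        }

        Set<Object> live = new HashSet<>(List.of(k0)); // only k0 strongly reachable
        boolean[] include = new boolean[BUCKETS], pending = new boolean[BUCKETS];
        Arrays.fill(include, true);

        int pendingCount;
        do {
            Arrays.fill(pending, false);
            for (int b = 0; b < BUCKETS; b++) {
                if (!include[b]) continue;
                for (Ephemeron e : buckets.get(b)) {
                    // live key => mark the value live ("transitive closure"
                    // is trivial here since values have no fields)
                    if (live.contains(e.key()) && live.add(e.value())) {
                        // a newly live object may be another ephemeron's key:
                        pending[bucketOf(e.value())] = true;
                    }
                }
            }
            pendingCount = 0;
            for (int b = 0; b < BUCKETS; b++) {
                include[b] = pending[b];
                if (pending[b]) pendingCount++;
            }
        } while (pendingCount > 0);

        System.out.println("live objects: " + live.size());
    }
}
```

With only the head of the chain strongly reachable, the loop converges with all three objects marked live, and only the pending buckets are rescanned in later passes.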
Hm, ....
Regards, Peter
On 01/24/2016 06:43 PM, Gil Tene wrote:
A note on implementation possibilities:
If I read the implementation correctly, a "weakness" of the current
implementation approach for making sure value-referents (and their
transitively reachable) objects are kept alive if key referents are
alive is that it requires multiple passes through the discovered
Ephemeron list, with the passes terminating when the list stabilizes.
While I think that this is sound (in the sense that it will work), it
carries a potentially high cost when large sets of Ephemerons exist in
the heap. E.g. if the Ephemerons are linked in a k-v list (where the
value referent of one ephemeron is the key of another, in a chain), as
in your code example, there is an N^2 scanning thing going on. And
e.g. if a large set of ephemeron keys become weakly reachable in a
single cycle (e.g. a large cache was discarded) while other ephemerons
participate in some linked-list relationship, the entire list of
[stably] weak-keyed ephemerons has to be traversed in each pass (in
case one of them has become live in a previous pass). I'd worry that
these computational complexity issues could become prohibitive enough
in GC cost that there would be significant resistance to their
adoption.
Note that in comparison (to my understanding), current ref processing
work involved in GC handling soft/weak/final/phantom refs remains
linear to the number of refs, and does not have an O(N^2) component.
I believe that there is a relatively simple way to bring Ephemeron
processing to O(N) by establishing reverse mapping during the scan
(the below description assumes STW during the scan):
1. Start ref processing with no reverse mapping table established.
2. During ref processing, establish an EphemeronKeyReverseMapping
table (logically a Map<Address, List<Ephemeron>>) which would map
individual heap addresses to the ephemerons whose key referent points to
those addresses.
3. Note that since each heap address can show up in multiple key
referents, the map needs to return a (potentially empty) list of
Ephemerons whose keys refer to the address.
4. Specifically, starting with an empty map, and for each discovered
Ephemeron, add a reverse-mapping entry to the
EphemeronKeyReverseMapping, mapping from the key referent address to
the Ephemeron.
5. During Ephemeron processing (under the case where the referent is
found to be alive and the ephemeron then needs to keep the value
referent alive) mark down the value referent path using a
special ephemeron_keep_alive OopClosure (or a mode flag that affects
the normal keep_alive behavior) which, when reaching a
not-yet-marked-live object [in addition to marking it live and
traversing it as keep_alive would normally do] would look up the
object's address in the EphemeronKeyReverseMapping to get a list of
Ephemerons to traverse, and traverse each of the mapped Ephemerons'
value referent with the same ephemeron_keep_alive closure.
Note: doing reverse-mapping lookups on each not-yet-marked object in a
keep_alive closure will add cost, which is why
this ephemeron_keep_alive pass should probably be done after the regular
keep_alive passes have had their chance to mark objects live. This way
only paths that become newly live via ephemeron processing are subject
to the extra reverse-mapping-lookup cost.
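A schematic sketch of this reverse-mapping idea, using object identity in place of heap addresses and a plain mark stack in place of the real ephemeron_keep_alive closure (all names here are illustrative, not the actual HotSpot types):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class EphemeronReverseMap {
    record Ephemeron(Object key, Object value) {}

    public static void main(String[] args) {
        Object k0 = new Object(), k1 = new Object(), k2 = new Object();
        List<Ephemeron> discovered =
                List.of(new Ephemeron(k0, k1), new Ephemeron(k1, k2));

        // Steps 2-4: build the reverse mapping (object identity stands in
        // for the heap address used in the real collector).
        Map<Object, List<Ephemeron>> reverse = new IdentityHashMap<>();
        for (Ephemeron e : discovered) {
            reverse.computeIfAbsent(e.key(), k -> new ArrayList<>()).add(e);
        }

        // Step 5: marking an object live also traverses the value
        // referents of any ephemerons keyed by that object.
        Set<Object> live = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Object> stack = new ArrayDeque<>();
        stack.push(k0);                   // k0 is strongly reachable
        while (!stack.isEmpty()) {
            Object obj = stack.pop();
            if (!live.add(obj)) continue; // already marked live
            for (Ephemeron e : reverse.getOrDefault(obj, List.of())) {
                stack.push(e.value());    // keep the value referent alive
            }
        }
        System.out.println("live objects: " + live.size());
    }
}
```

Each ephemeron's value is pushed at most once per ephemeron, so the work stays linear in the number of discovered Ephemerons, as claimed.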
While I haven't been poking at this too long to see if it has holes, I
think it can produce a reliable result, and is O(N) on the count of
Ephemerons.
— Gil.
On Jan 24, 2016, at 2:52 AM, Peter Levart <peter.lev...@gmail.com> wrote:
Hi Gil,
I totally agree with your assessment. We should not introduce another
way of reviving the almost collectable objects and I fully support
tightening the specification so that soft and weak references to the
same referent and to other referents from which this referent is
reachable are required to be cleared together atomically.
I modified the prototype to (hopefully) adhere to this new Ephemeron
specification that Gil and I agreed upon. Anyone interested in
experimenting can find it here:
http://cr.openjdk.java.net/~plevart/misc/Ephemeron/webrev.jdk.02/
http://cr.openjdk.java.net/~plevart/misc/Ephemeron/webrev.hotspot.02/
It is rebased to current tip of jdk9-dev repositories (after the bulk
of merges for jdk-9+102), but still contains the change to remove the
Cleaner reference type as it has not yet managed to get in...
I have also added a test that is a start for verifying the functionality.
Regards, Peter
On 01/23/2016 07:25 PM, Gil Tene wrote:
On Jan 23, 2016, at 5:14 AM, Peter Levart <peter.lev...@gmail.com>
wrote:
Hi Gil, it's good to have this discussion. See comments inline...
On 01/23/2016 05:13 AM, Gil Tene wrote:
....
On Jan 22, 2016, at 2:49 PM, Peter Levart
<peter.lev...@gmail.com> wrote:
An Ephemeron always touches the definitions of at least two consecutive
strengths of reachability. The prototype says:
* <li> An object is <em>weakly reachable</em> if it is neither
* strongly nor softly reachable but can be reached by traversing a
* weak reference or by traversing an ephemeron through its value
* while the ephemeron's key is at least weakly reachable.
* <li> An object is <em>ephemerally reachable</em> if it is neither
* strongly, softly nor weakly reachable but can be reached by
* traversing an ephemeron through its key or by traversing an
* ephemeron through its value while its key is at most ephemerally
* reachable. When the ephemerons that refer to an ephemerally
* reachable key object are cleared, the key object becomes eligible
* for finalization.
Looking into this a bit more, I don't think the above is quite
right. Specifically, if an ephemeron's key is either strongly or
softly reachable, you want the value to remain appropriately
strongly/softly reachable. Without this quality, Ephemeron value
referents can (and will) be prematurely collected and finalized
while the keys are not. This (IMO) needed quality is not provided by
the behavior you specify…
This is not quite true. While an ephemeron's value is weakly or even
ephemerally reachable, it is not finalizable, because
ephemerally reachable is stronger than finally reachable. After the
ephemeron's key becomes ephemerally reachable, the ephemeron is
cleared by GC, which sets its key *and* value to null atomically.
The lives of key and value at that moment become untangled. Either
of them can have a finalizer or not, and both of them will
eventually be collected if not revived by their finalize() methods.
But it can never happen that the ephemeron's value is finalized or
collected while its key is still reachable through the ephemeron
(while the ephemeron is not cleared yet).
But I agree that it would be desirable for the ephemeron's value to
follow the reachability of its key. In the above specification, if the
key is strongly reachable, the value is weakly reachable, so any
WeakReferences or SoftReferences pointing at the Ephemeron's value
can already be cleared while the key is still strongly reachable.
This is arguably no different from the current specification of Soft
vs. Weak references. A SoftReference can already be cleared while
its referent is still reachable through a WeakReference,
We seem to agree about the cleaner behavior specification (in both
of our texts below), so these next paragraphs are really about
arguing for why this is an important design choice if/when adding
Ephemerons to Java:
It is true the [current] spec allows for soft references to an
object to be cleared while weak references to the same object are
not: the "determines" in "Suppose that the garbage collector
determines at a certain point in time that an object is RRRR
reachable..." part [for RRRR = {soft, weak}] does not have to happen
at the same "certain point in time".
However, to my knowledge all current implementations present as if
this determination is happening at the same "point in time" for all
weakly and softly reachable objects combined. Specifically [in
implementations]: if soft reachability is determined for an object
at some point in time, then weak reachability for that object is
determined at the same point in time. And the weak reachability
determination for an object depends on whether the collector chose
to clear existing soft references to that object at that same point
in time, with the appearance of the choice to clear (or not to
clear) soft references to a given object atomically affecting the
determination of its weak reachability. Since the collector is
*required* to act on a weak determination when it is made, while it
*may* act on a soft determination when it is made, making the
combined determination at the same "point in time" eliminates an
obviously confusing situation that is not prohibited by the spec: if
the determination for weak and soft reachability was not done at the
same point in time, then an object that was softly reachable and had
its soft references cleared and queued could later become strongly
reachable, and even softly reachable again. When reference
processing is done as a STW thing, this "combined determination"
effect is a trivial side-effect of STW. When it is done concurrently
(or incrementally?), implementations still work to maintain the
appearance of combined atomic determination of soft and weak
reachability. I know ours does. In our case, we do it because we had
no desire to be the ones to argue "I know that all implementations
did this atomically because they were STW, but the spec allows us to
add this bug to your program…".
So in actual implementations (to my knowledge), finalization is
currently the only mechanism that can create this "strange
situation" where an object was no longer strongly reachable, had
actions triggered as a result from loss of strong reachability (i.e.
actually observed by the program as "known to not be strongly
reachable"), and later became strongly reachable again. E.g. a
finalizer can propagate a strong reference to a previously
non-strongly reachable object ('this' in the finalizer, or anything
that 'this' transitively refers to and was not otherwise reachable when
the finalizer was called). This is one of those "undesired" things
that the introduction of Reference types was meant to deal with
(Reference types were introduced in 1.2, after finalization was
unfortunately already included and spec'ed. And phantom refs were
meant to allow for a cleaner form that could replace finalization).
And while the specifications of SoftReference and WeakReference do
not prohibit it, implementations are not required to allow it, and
in practice none of them do (I think), as doing so would most likely
expose some "interesting" spec-allowed-but-extremely-surprising
things/bugs that none of us want to have to defend...
In this context, it would be a "highly undesirable" design choice to
introduce Ephemerons in a way that would allow them to return a strong
reference to an object that has previously been determined to no
longer be strongly reachable. Structuring the spec to prohibit this
is a better design choice.
To highlight the design choice here, let me describe a specific
problem scenario for which the previous (above) spec would cause
"re-strengthening" behavior that would break assumptions that are
allowed under the current spec: in the above/previously specified
behavior, an object V that is known to have no finalizers, but has
e.g. 3 WeakReference objects that refer to it, can become weakly
reachable while both the key referent object K and some ephemeron E
whose value referent is V remain strongly reachable. At such a
point (V is weakly reachable, K and E are strongly reachable), the
collector may determine weak reachability for V, [atomically] clear
all weak references to V, and enqueue those weak reference objects
on their respective queues. While V is still ephemerally reachable
under your previous definition, there are no references to it
anywhere other than in ephemeron value referent fields, and weak
references that did refer to it have been cleared and queued. Since
the ephemeron is still there, and the key is still there, and the
ephemeron has not been cleared, an Ephemeron.getValue() call would
create a strong reference to an object that was previously
determined to not be weakly reachable. Re-creating a strong
reference to V after the point where weak references to V were
cleared and the weak refs to it were enqueued would be "surprising"
to current weak reference based code (the only thing that could
cause this under the current spec would be a finalizer), so allowing
that (in the spec) is likely to break all kinds of logic that
depends on currently spec'ed weak reference behaviors.
The spec'ed behavior we seem to be agreeing on (below) would
prohibit this loophole and would [I think] maintain any
reachability-based expectations that current weak-ref based logic
can make under the current spec. Maintaining this continuity is an
important design choice for adding Ephemerons into the current set
of Reference behaviors.
And since I suspect that all implementations will continue to choose
to do the "determination" of soft and weak reachability at the same
"point in time", this will fit well with how people would build this
stuff anyway.
Separate note: It would be separately interesting to consider
narrowing the SoftRef spec to require JVM implementations to
atomically clear all soft *and* weak references to an object at the
same time. I.e. if the garbage collector chooses to clear a soft
reference to an object that would become weakly reachable as a
result, then all weak references to that object must be [atomically]
cleared at the same time. Since I suspect that all current JVM
implementations actually adhere to this stronger requirement
already, this would not "hurt" anything or require extra work to
comply with. [Anyone from Metronome or some other non-STW reference
processing implementations want to chime in?].
but for Ephemeron's value this might be confusing. The easier to
understand conceptual model for Ephemerons might be a pair of
(WeakReference<K>, WeakReference<V>) where the key has a virtual
strong reference to the value. And this is what we get if we say
that reachability of the value follows reachability of the key.
For a correctly specified behavior, I think all strengths (from
strong down) need to be affected by key/value Ephemeron
relationships, but without adding an "ephemerally reachable"
strength. E.g. I think you fundamentally need something like this:
- "An object is <em>strongly reachable</em> if it can be reached
by (a) some thread without traversing any reference objects, or by
(b) traversing the value of an Ephemeron whose key is strongly
reachable. A newly-created object is strongly reachable by the
thread that created it"
- "An object is <em>softly reachable</em> if it is not
strongly reachable but can be reached by (a) traversing a soft
reference or by (b) traversing the value of an Ephemeron whose key
is softly reachable.
- "An object is <em>weakly reachable</em> if it is neither
strongly nor softly reachable but can be reached by (a) traversing
a weak reference or by (b) traversing the value of an ephemeron
whose key is weakly reachable.
...and that's where we stop, because when we make Ephemeron just a
special kind of WeakReference, the next thing that happens is:
* <p> Suppose that the garbage collector determines at a certain point
* in time that an object is
* <a href="package-summary.html#reachability">weakly reachable</a>.
* At that time it will atomically clear all weak references to that
* object and all weak references to any other weakly-reachable objects
* from which that object is reachable through a chain of strong and
* soft references. At the same time it will declare all of the formerly
* weakly-reachable objects to be finalizable. At the same time or at
* some later time it will enqueue those newly-cleared weak references
* that are registered with reference queues.
...where "clearing of the WeakReference" means resetting the key
*and* value to null in case it is an Ephemeron; and
"all weak references to some object" means Ephemerons that have
that object as a key (but not those that only have it as a value!)
in case of ephemerons
...
I still think that Ephemeron<K, V> should extend WeakReference<K>,
since that places already established rules and expectation on (a)
when it will be enqueued, (b) when the collector will clear it
(when the the collector encounters the <K> key being weakly
reachable), and (c) that clearing of all Ephemeron *and*
WeakReference instances who share an identical key value is done
atomically, along with (d) all weak references to any other
weakly-reachable objects from which that object is reachable
through a chain of strong and soft references. These last (c, d)
parts are critically captured since an Ephemeron *is a*
WeakReference, and the statement in WeakReference that says that
"… it will atomically clear all weak references to that object and
all weak references to any other weakly-reachable objects from
which that object is reachable through a chain of strong and soft
references." has a clear application.
Here are some suggested edits to the JavaDoc to go with this
suggested spec'ed behavior:
/**
* Ephemeron<K, V> objects are a special kind of WeakReference<K>
* objects, which hold two referents (a key referent and a value
* referent) and do not prevent their referents from being made
* finalizable, finalized, and then reclaimed.
* In addition to the key referent, which adheres to the referent
* behavior of a WeakReference<K>, an ephemeron also holds a value
* referent whose reachability strength is affected by the
* reachability strength of the key referent:
* The value referent of an Ephemeron instance is considered:
* (a) strongly reachable if the key referent of the same Ephemeron
* object is strongly reachable, or if the value referent is
* otherwise strongly reachable.
* (b) softly reachable if it is not strongly reachable, and (i) the
* key referent of the same Ephemeron object is softly reachable, or
* (ii) the value referent is otherwise softly reachable.
* (c) weakly reachable if it is not strongly or softly reachable,
* and (i) the key referent of the same Ephemeron object is weakly
* reachable, or (ii) the value referent is otherwise weakly
* reachable.
* <p> When the collector clears an Ephemeron object instance
* (according to the rules expressed for clearing WeakReference
* object instances), the Ephemeron instance's key referent and
* value referent are simultaneously and atomically cleared.
* <p> For convenience, the Ephemeron's referent is also called
* the key, and can be obtained either by invoking {@link #get} or
* {@link #getKey}, while the value can be obtained by invoking the
* {@link #getValue} method.
*...
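A pure-Java skeleton of the surface API being discussed might look as follows. This is only a sketch of the accessor shape (get/getKey/getValue): it holds the value strongly and cannot reproduce the atomic GC clearing or the key-driven reachability semantics, which require collector support as in the prototype webrevs:

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Sketch only: the key is the WeakReference referent, per the
// "Ephemeron *is a* WeakReference" design above; the value field would
// be GC-special in a real implementation, not an ordinary strong field.
public class Ephemeron<K, V> extends WeakReference<K> {
    private V value;

    public Ephemeron(K key, V value, ReferenceQueue<? super K> queue) {
        super(key, queue);
        this.value = value;
    }

    public K getKey() { return get(); } // key == the WeakReference referent

    public V getValue() { return value; }

    @Override
    public void clear() {               // clears key and value together
        super.clear();
        value = null;
    }

    public static void main(String[] args) {
        Object key = new Object();
        Ephemeron<Object, String> e = new Ephemeron<>(key, "payload", null);
        System.out.println("key present: " + (e.getKey() == key)
                + ", value: " + e.getValue());
    }
}
```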
Thanks, this is very nice. I do like this behavior more.
Let me see what it takes to implement this strategy...
Regards, Peter