Hi Gil,
The algorithm complexity has been on my mind too, and I've been thinking
about how to maintain a key (or its address) -> ephemeron mapping most
efficiently. As I wrote in a reply to Mandy, if we say that an
individual phase2 of processing the normal References has complexity
O(N), where N is the number of discovered references, then the equivalent
phase2 processing of Ephemerons has complexity O(N*d), where N
is the number of discovered Ephemerons and d is the maximum number of
"hops" in the object graph; a "hop" is a point in a chain
where the value of some ephemeron refers directly or indirectly to the
key of some other ephemeron, and the value of that other ephemeron
continues the chain. Let's hope that in practical data structures d is
not large. The maximum value of d is N, as in one of the tests. The
algorithm in the prototype tries to reduce the number of outer
iterations to much less than d in many situations by alternating the
direction of scanning in each pass. The value(ephemeronX) ->
key(ephemeronY) links could be arranged in a zig-zag pattern relative to
the order of discovered references in the list, though, which would be the
worst case for the algorithm, but I think the chances of that happening
in a real-world data structure are low. Still, we have to think of possible
denial-of-service attacks too, so I agree that the worst-case scenario has
to be improved.
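For illustration, the d == N worst case (a chain where each ephemeron's value is the next ephemeron's key) might be built like this. The Ephemeron record below is just a placeholder pair to show the shape of the chain; the prototype's real class needs VM support and is not modeled here:

```java
import java.util.ArrayList;
import java.util.List;

public class EphemeronChain {
    // Placeholder for the prototype's Ephemeron<K, V>; it only models the
    // key/value shape, not the GC-maintained reachability semantics.
    record Ephemeron<K, V>(K key, V value) {}

    public static void main(String[] args) {
        int n = 5;
        List<Object> keys = new ArrayList<>();
        for (int i = 0; i <= n; i++) keys.add(new Object());

        // value(ephemeron i) is the key of ephemeron i+1, so discovering
        // that key 0 is live forces n sequential "hops" before the whole
        // chain is known live -- the d == N worst case described above.
        List<Ephemeron<Object, Object>> chain = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            chain.add(new Ephemeron<>(keys.get(i), keys.get(i + 1)));
        }
        System.out.println("chain length (max hops d): " + chain.size());
    }
}
```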
Presently, Reference objects are discovered and hooked onto singly-linked
_discoveredXXX lists (using the Reference.discovered field to point to the
next in the list). Each reference type has its own set of lists. Multiple
lists in a set enable parallel (concurrent?) discovery without
contention (a list per thread).
So I was thinking that one possibility would be for ephemerons to have a
bigger table of discovered lists - not just one per discovery thread,
but considerably bigger - and that the index of the list to which a
discovered ephemeron is added would be computed from the ephemeron's key. A
classical hash table with buckets. A big enough table would minimize
contention when discovery is parallelized and enable the following
algorithm (which could also be parallelized):
Let each bucket (each head of the list) have 2 boolean flags: include &
pending.
At the start, set include flags of all buckets to true and then enter
the loop:
do {
    reset pending flags of all buckets to false;
    for each bucket with (include == true), do {
        scan the ephemerons and, for those with live keys, mark the value
        and its transitive closure as live;
        while marking the value and transitive closure as live, for each
        object that was newly marked alive:
            compute the index into the ephemeron buckets as though such an
            object were an ephemeron's key, and set the pending flag of
            that bucket;
    }
    set the include flags of all buckets from the pending flags of the
    buckets and count the # of pending buckets;
} while (# of pending buckets > 0);
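The bucketed loop above might be sketched, very schematically, in Java. Everything here is a stand-in: the Ephemeron record, the bucketOf hash, and the live set model the collector's discovered lists and mark state, not the actual HotSpot data structures:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class EphemeronBuckets {
    // Toy model: each bucket holds the ephemerons whose key hashes to
    // that index; include/pending are the per-bucket flags.
    static final int BUCKETS = 8;

    record Ephemeron(Object key, Object value) {}

    static int bucketOf(Object key) {
        return (System.identityHashCode(key) & 0x7fffffff) % BUCKETS;
    }

    public static void main(String[] args) {
        // Small chain: value of one ephemeron is the key of the next.
        Object k0 = new Object(), k1 = new Object(), k2 = new Object();
        List<List<Ephemeron>> buckets = new ArrayList<>();
        for (int i = 0; i < BUCKETS; i++) buckets.add(new ArrayList<>());
        for (Ephemeron e : List.of(new Ephemeron(k0, k1), new Ephemeron(k1, k2))) {
            buckets.get(bucketOf(e.key())).add(e);
        }

        Set<Object> live = new HashSet<>(List.of(k0)); // only k0 strongly reachable
        boolean[] include = new boolean[BUCKETS], pending = new boolean[BUCKETS];
        Arrays.fill(include, true);

        int pendingCount;
        do {
            Arrays.fill(pending, false);
            for (int b = 0; b < BUCKETS; b++) {
                if (!include[b]) continue;
                for (Ephemeron e : buckets.get(b)) {
                    // live key => mark the value live ("transitive closure"
                    // is trivial here since values have no fields)
                    if (live.contains(e.key()) && live.add(e.value())) {
                        // a newly live object may be another ephemeron's key:
                        pending[bucketOf(e.value())] = true;
                    }
                }
            }
            pendingCount = 0;
            for (int b = 0; b < BUCKETS; b++) {
                include[b] = pending[b];
                if (pending[b]) pendingCount++;
            }
        } while (pendingCount > 0);

        System.out.println("live objects: " + live.size());
    }
}
```

With only the head of the chain strongly reachable, the loop converges with all three objects marked live, and only the pending buckets are rescanned in later passes.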
Hm, ....
Regards, Peter
On 01/24/2016 06:43 PM, Gil Tene wrote:
A note on implementation possibilities:
If I read the implementation correctly, a "weakness" of the current
implementation approach for making sure value-referents (and their
transitively reachable) objects are kept alive if key referents are
alive is that it requires multiple passes through the discovered
Ephemeron list, with the passes terminating when the list stabilizes.
While I think that this is sound (in the sense that it will work), it
carries a potentially high cost when large sets of Ephemerons exist in
the heap. E.g. if the Ephemerons are linked in a k-v list (where the
value referent of one ephemeron is the key of another, in a chain), as
in your code example, there is an N^2 scanning thing going on. And
e.g. if a large set of ephemeron keys become weakly reachable in a
single cycle (e.g. a large cache was discarded) while other ephemerons
participate in some linked-list relationship, the entire list of
[stably] weak-keyed ephemerons has to be traversed in each pass (in
case one of them has become live in a previous pass). I'd worry that
these computational complexity issues could become prohibitive enough
in GC cost that there would be significant resistance to their
adoption.
Note that in comparison (to my understanding), current ref processing
work involved in GC handling soft/weak/final/phantom refs remains
linear to the number of refs, and does not have an O(N^2) component.
I believe that there is a relatively simple way to bring Ephemeron
processing to O(N) by establishing reverse mapping during the scan
(the below description assumes STW during the scan):
1. Start ref processing with no reverse mapping table established.
2. During ref processing, establish an EphemeronKeyReverseMapping
table (logically a Map<Address, List<Ephemeron>>) which would map
individual heap addresses to the ephemerons whose key referent points to
those addresses.
3. Note that since each heap address can show up in multiple key
referents, the map needs to return a (potentially empty) list of
Ephemerons whose keys refer to the address.
4. Specifically, starting with an empty map, and for each discovered
Ephemeron, add a reverse-mapping entry to the
EphemeronKeyReverseMapping, mapping from the key referent address to
the Ephemeron.
5. During Ephemeron processing (under the case where the referent is
found to be alive and the ephemeron then needs to keep the value
referent alive) mark down the value referent path using a
special ephemeron_keep_alive OopClosure (or a mode flag that affects
the normal keep_alive behavior) which, when reaching a
not-yet-marked-live object [in addition to marking it live and
traversing it as keep_alive would normally do] would look up the
object's address in the EphemeronKeyReverseMapping to get a list of
Ephemerons to traverse, and traverse each of the mapped Ephemerons'
value referent with the same ephemeron_keep_alive closure.
Note: doing reverse-mapping lookups on each not-yet-marked object in a
keep_alive closure will add cost, which is why
this ephemeron_keep_alive pass should probably be done after the regular
keep_alive passes have had their chance to mark objects live. This way
only paths that become newly live via ephemeron processing are subject
to the extra reverse-mapping-lookup cost.
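A schematic sketch of this reverse-mapping idea, using object identity in place of heap addresses and a plain mark stack in place of the real ephemeron_keep_alive closure (all names here are illustrative, not the actual HotSpot types):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class EphemeronReverseMap {
    record Ephemeron(Object key, Object value) {}

    public static void main(String[] args) {
        Object k0 = new Object(), k1 = new Object(), k2 = new Object();
        List<Ephemeron> discovered =
                List.of(new Ephemeron(k0, k1), new Ephemeron(k1, k2));

        // Steps 2-4: build the reverse mapping (object identity stands in
        // for the heap address used in the real collector).
        Map<Object, List<Ephemeron>> reverse = new IdentityHashMap<>();
        for (Ephemeron e : discovered) {
            reverse.computeIfAbsent(e.key(), k -> new ArrayList<>()).add(e);
        }

        // Step 5: marking an object live also traverses the value
        // referents of any ephemerons keyed by that object.
        Set<Object> live = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Object> stack = new ArrayDeque<>();
        stack.push(k0);                   // k0 is strongly reachable
        while (!stack.isEmpty()) {
            Object obj = stack.pop();
            if (!live.add(obj)) continue; // already marked live
            for (Ephemeron e : reverse.getOrDefault(obj, List.of())) {
                stack.push(e.value());    // keep the value referent alive
            }
        }
        System.out.println("live objects: " + live.size());
    }
}
```

Each ephemeron's value is pushed at most once per ephemeron, so the work stays linear in the number of discovered Ephemerons, as claimed.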
While I haven't been poking at this too long to see if it has holes, I
think it can produce a reliable result, and is O(N) on the count of
Ephemerons.
— Gil.
On Jan 24, 2016, at 2:52 AM, Peter Levart <peter.lev...@gmail.com> wrote:
Hi Gil,
I totally agree with your assessment. We should not introduce another
way of reviving the almost collectable objects and I fully support
tightening the specification so that soft and weak references to the
same referent and to other referents from which this referent is
reachable are required to be cleared together atomically.
I modified the prototype to (hopefully) adhere to this new Ephemeron
specification that Gil and I agreed upon. Anyone interested in
experimenting can find it here:
http://cr.openjdk.java.net/~plevart/misc/Ephemeron/webrev.jdk.02/
http://cr.openjdk.java.net/~plevart/misc/Ephemeron/webrev.hotspot.02/
It is rebased to current tip of jdk9-dev repositories (after the bulk
of merges for jdk-9+102), but still contains the change to remove the
Cleaner reference type as it has not yet managed to get in...
I have also added a test that is a start for verifying the functionality.
Regards, Peter
On 01/23/2016 07:25 PM, Gil Tene wrote:
On Jan 23, 2016, at 5:14 AM, Peter Levart <peter.lev...@gmail.com>
wrote:
Hi Gil, it's good to have this discussion. See comments inline...
On 01/23/2016 05:13 AM, Gil Tene wrote:
....
On Jan 22, 2016, at 2:49 PM, Peter Levart
<peter.lev...@gmail.com> wrote:
An Ephemeron always touches the definitions of at least two consecutive
strengths of reachability. The prototype says:
* <li> An object is <em>weakly reachable</em> if it is neither
* strongly nor softly reachable but can be reached by traversing a
* weak reference or by traversing an ephemeron through its value
* while the ephemeron's key is at least weakly reachable.
* <li> An object is <em>ephemerally reachable</em> if it is neither
* strongly, softly nor weakly reachable but can be reached by
* traversing an ephemeron through its key or by traversing an
* ephemeron through its value while its key is at most ephemerally
* reachable. When the ephemerons that refer to an ephemerally
* reachable key object are cleared, the key object becomes eligible
* for finalization.
Looking into this a bit more, I don't think the above is quite
right. Specifically, if an ephemeron's key is either strongly or
softly reachable, you want the value to remain appropriately
strongly/softly reachable. Without this quality, Ephemeron value
referents can (and will) be prematurely collected and finalized
while the keys are not. This (IMO) needed quality is not provided by
the behavior you specify…
This is not quite true. While an ephemeron's value is weakly or even
ephemerally reachable, it is not finalizable, because
ephemerally reachable is stronger than finally reachable. After the
ephemeron's key becomes ephemerally reachable, the ephemeron is
cleared by GC, which sets its key *and* value to null atomically.
The lives of key and value at that moment become untangled. Either
of them can have a finalizer or not, and both of them will
eventually be collected if not revived by their finalize() methods.
But it can never happen that the ephemeron's value is finalized or
collected while its key is still reachable through the ephemeron
(while the ephemeron is not cleared yet).
But I agree that it would be desirable for the ephemeron's value to
follow the reachability of its key. In the above specification, if the
key is strongly reachable, the value is weakly reachable, so any
WeakReferences or SoftReferences pointing at the Ephemeron's value
can already be cleared while the key is still strongly reachable.
This is arguably no different from the current specification of Soft
vs. Weak references. A SoftReference can already be cleared while
its referent is still reachable through a WeakReference,
We seem to agree about the cleaner behavior specification (in both
of our texts below), so these next paragraphs are really about
arguing for why this is an important design choice if/when adding
Ephemerons to Java:
It is true the [current] spec allows for soft references to an
object to be cleared while weak references to the same object are
not: the "determines" in "Suppose that the garbage collector
determines at a certain point in time that an object is RRRR
reachable..." part [for RRRR = {soft, weak}] does not have to happen
at the same "certain point in time".
However, to my knowledge all current implementations present as if
this determination is happening at the same "point in time" for all
weakly and softly reachable objects combined. Specifically [in
implementations]: if soft reachability is determined for an object
at some point in time, then weak reachability for that object is
determined at the same point in time. And the weak reachability
determination for an object depends on whether the collector chose
to clear existing soft references to that object at that same point
in time, with the appearance of the choice to clear (or not to
clear) soft references to a given object atomically affecting the
determination of its weak reachability. Since the collector is
*required* to act on a weak determination when it is made, while it
*may* act on a soft determination when it is made, making the
combined determination at the same "point in time" eliminates an
obviously confusing situation that is not prohibited by the spec: if
the determination for weak and soft reachability was not done at the
same point in time, then an object that was softly reachable and had
its soft references cleared and queued could later become strongly
reachable, and even softly reachable again. When reference
processing is done as a STW thing, this "combined determination"
effect is a trivial side-effect of STW. When it is done concurrently
(or incrementally?), implementations still work to maintain the
appearance of combined atomic determination of soft and weak
reachability. I know ours does. In our case, we do it because we had
no desire to be the ones to argue "I know that all implementations
did this atomically because they were STW, but the spec allows us to
add this bug to your program…".
So in actual implementations (to my knowledge), finalization is
currently the only mechanism that can create this "strange
situation" where an object was no longer strongly reachable, had
actions triggered as a result from loss of strong reachability (i.e.
actually observed by the program as "known to not be strongly
reachable"), and later became strongly reachable again. E.g. a
finalizer can propagate a strong reference to a previously
non-strongly reachable object ('this' in the finalizer, or anything
that 'this' transitively refers to and was not otherwise reachable when
the finalizer was called). This is one of those "undesired" things
that the introduction of Reference types was meant to deal with
(Reference types were introduced in 1.2, after finalization was
unfortunately already included and spec'ed. And phantom refs were
meant to allow for a cleaner form that could replace finalization).
And while the specifications of SoftReference and WeakReference do
not prohibit it, implementations are not required to allow it, and
in practice none of them do (I think), as doing so would most likely
expose some "interesting" spec-allowed-but-extremely-surprising
things/bugs that none of us want to have to defend...
In this context, it would be a "highly undesirable" design choice to
introduce Ephemerons in a way that would allow them to return a strong
reference to an object that has previously been determined to no
longer be strongly reachable. Structuring the spec to prohibit this
is a better design choice.
To highlight the design choice here, let me describe a specific
problem scenario for which the previous (above) spec would cause
"re-strengthening" behavior that would break assumptions that are
allowed under the current spec: in the above/previously specified
behavior, an object V that is known to have no finalizers, but has
e.g. 3 WeakReference objects that refer to it, can become weakly
reachable while both the key referent object K and some ephemeron E
whose value referent is V remain strongly reachable. At such a
point (V is weakly reachable, K and E are strongly reachable), the
collector may determine weak reachability for V, [atomically] clear
all weak references to V, and enqueue those weak reference objects
on their respective queues. While V is still ephemerally reachable
under your previous definition, there are no references to it
anywhere other than in ephemeron value referent fields, and weak
references that did refer to it have been cleared and queued. Since
the ephemeron is still there, and the key is still there, and the
ephemeron has not been cleared, an Ephemeron.getValue() call would
create a strong reference to an object that was previously
determined to not be weakly reachable. Re-creating a strong
reference to V after the point where weak references to V were
cleared and the weak refs to it were enqueued would be "surprising"
to current weak reference based code (the only thing that could
cause this under the current spec would be a finalizer), so allowing
that (in the spec) is likely to break all kinds of logic that
depends on currently spec'ed weak reference behaviors.
The spec'ed behavior we seem to be agreeing on (below) would
prohibit this loophole and would [I think] maintain any
reachability-based expectations that current weak-ref based logic
can make under the current spec. Maintaining this continuity is an
important design choice for adding Ephemerons into the current set
of Reference behaviors.
And since I suspect that all implementations will continue to choose
to do the "determination" of soft and weak reachability at the same
"point in time", this will fit well with how people would build this
stuff anyway.
Separate note: It would be separately interesting to consider
narrowing the SoftRef spec to require JVM implementations to
atomically clear all soft *and* weak references to an object at the
same time. I.e. if the garbage collector chooses to clear a soft
reference to an object that would become weakly reachable as a
result, then all weak references to that object must be [atomically]
cleared at the same time. Since I suspect that all current JVM
implementations actually adhere to this stronger requirement
already, this would not "hurt" anything or require extra work to
comply with. [Anyone from Metronome or some other non-STW reference
processing implementations want to chime in?].
but for Ephemeron's value this might be confusing. The easier to
understand conceptual model for Ephemerons might be a pair of
(WeakReference<K>, WeakReference<V>) where the key has a virtual
strong reference to the value. And this is what we get if we say
that reachability of the value follows reachability of the key.
For a correctly specified behavior, I think all strengths (from
strong down) need to be affected by key/value Ephemeron
relationships, but without adding an "ephemerally reachable"
strength. E.g. I think you fundamentally need something like this:
- "An object is <em>strongly reachable</em> if it can be reached
by (a) some thread without traversing any reference objects, or by
(b) traversing the value of an Ephemeron whose key is strongly
reachable. A newly-created object is strongly reachable by the
thread that created it"
- "An object is <em>softly reachable</em> if it is not
strongly reachable but can be reached by (a) traversing a soft
reference or by (b) traversing the value of an Ephemeron whose key
is softly reachable.
- "An object is <em>weakly reachable</em> if it is neither
strongly nor softly reachable but can be reached by (a) traversing
a weak reference or by (b) traversing the value of an ephemeron
whose key is weakly reachable.
...and that's where we stop, because when we make Ephemeron just a
special kind of WeakReference, the next thing that happens is:
* <p> Suppose that the garbage collector determines at a certain point
* in time that an object is
* <a href="package-summary.html#reachability">weakly reachable</a>.
* At that time it will atomically clear all weak references to that
* object and all weak references to any other weakly-reachable objects
* from which that object is reachable through a chain of strong and
* soft references. At the same time it will declare all of the formerly
* weakly-reachable objects to be finalizable. At the same time or at
* some later time it will enqueue those newly-cleared weak references
* that are registered with reference queues.
...where "clearing of the WeakReference" means resetting the key
*and* value to null in case it is an Ephemeron; and
"all weak references to some object" means Ephemerons that have
that object as a key (but not those that only have it as a value!)
in case of ephemerons
...
I still think that Ephemeron<K, V> should extend WeakReference<K>,
since that places already established rules and expectation on (a)
when it will be enqueued, (b) when the collector will clear it
(when the the collector encounters the <K> key being weakly
reachable), and (c) that clearing of all Ephemeron *and*
WeakReference instances who share an identical key value is done
atomically, along with (d) all weak references to any other
weakly-reachable objects from which that object is reachable
through a chain of strong and soft references. These last (c, d)
parts are critically captured since an Ephemeron *is a*
WeakReference, and the statement in WeakReference that says that
"… it will atomically clear all weak references to that object and
all weak references to any other weakly-reachable objects from
which that object is reachable through a chain of strong and soft
references." has a clear application.
Here are some suggested edits to the JavaDoc to go with this
suggested spec'ed behavior:
/**
* Ephemeron<K, V> objects are a special kind of WeakReference<K>
* objects, which hold two referents (a key referent and a value
* referent) and do not prevent their referents from being made
* finalizable, finalized, and then reclaimed.
* In addition to the key referent, which adheres to the referent
* behavior of a WeakReference<K>, an ephemeron also holds a value
* referent whose reachability strength is affected by the
* reachability strength of the key referent:
* The value referent of an Ephemeron instance is considered:
* (a) strongly reachable if the key referent of the same Ephemeron
* object is strongly reachable, or if the value referent is
* otherwise strongly reachable.
* (b) softly reachable if it is not strongly reachable, and (i) the
* key referent of the same Ephemeron object is softly reachable, or
* (ii) the value referent is otherwise softly reachable.
* (c) weakly reachable if it is not strongly or softly reachable,
* and (i) the key referent of the same Ephemeron object is weakly
* reachable, or (ii) the value referent is otherwise weakly
* reachable.
* <p> When the collector clears an Ephemeron object instance
* (according to the rules expressed for clearing WeakReference
* object instances), the Ephemeron instance's key referent and
* value referent are simultaneously and atomically cleared.
* <p> For convenience, the Ephemeron's referent is also called
* the key, and can be obtained either by invoking {@link #get} or
* {@link #getKey}, while the value can be obtained by invoking the
* {@link #getValue} method.
*...
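A pure-Java skeleton of the surface API being discussed might look as follows. This is only a sketch of the accessor shape (get/getKey/getValue): it holds the value strongly and cannot reproduce the atomic GC clearing or the key-driven reachability semantics, which require collector support as in the prototype webrevs:

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Sketch only: the key is the WeakReference referent, per the
// "Ephemeron *is a* WeakReference" design above; the value field would
// be GC-special in a real implementation, not an ordinary strong field.
public class Ephemeron<K, V> extends WeakReference<K> {
    private V value;

    public Ephemeron(K key, V value, ReferenceQueue<? super K> queue) {
        super(key, queue);
        this.value = value;
    }

    public K getKey() { return get(); } // key == the WeakReference referent

    public V getValue() { return value; }

    @Override
    public void clear() {               // clears key and value together
        super.clear();
        value = null;
    }

    public static void main(String[] args) {
        Object key = new Object();
        Ephemeron<Object, String> e = new Ephemeron<>(key, "payload", null);
        System.out.println("key present: " + (e.getKey() == key)
                + ", value: " + e.getValue());
    }
}
```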
Thanks, this is very nice. I do like this behavior more.
Let me see what it takes to implement this strategy...
Regards, Peter