Cole Greer created TINKERPOP-3071:
-------------------------------------
Summary: barrier() reliant on undefined HashSet behavior
Key: TINKERPOP-3071
URL: https://issues.apache.org/jira/browse/TINKERPOP-3071
Project: TinkerPop
Issue Type: Improvement
Components: process
Affects Versions: 3.7.2, 3.6.7
Reporter: Cole Greer
Certain traversals can force the barrier step into relying on undefined
behavior in Java. The barrier step operates by storing all incoming traversers
into a HashSet, and only once the input is exhausted, will it release
traversers to the next step. The contract of HashSet requires that users do not
modify objects in any way which will change their hash code while it is stored
in the HashSet. It is undefined how a HashSet will behave if the hash codes of
a contained object changes.
Some very specific types of traversals can force barrier() into such behavior.
Consider the following traversal:
{code:java}
g.V().as("label").aggregate(local,"x").select("x").barrier().select("label")
{code}
The aggregate step will attach a BulkSet to each traverser, as "x". It is key
here that each traverser is given a reference to the same BulkSet. The
execution of this traverser will go as follows:
1. The first traverser moves through the traversal, is given the BulkSet
(currently size 1) as "x"
2. The first traverser is stored in the barrier. At this time,
hashcode(Traverser1) = hashcode(BulkSetSize1).
3. The second traverser moves through the traversal, as it passes through the
aggregate step, it is added to the BulkSet, the BulkSet now has a size of 2.
4. The hashcode of Traverser1 has now changed. Now, hashcode(Traverser1) =
hashcode(BulkSetSize2).
What happens next will depend on the version of Java being used.
Currently in Java 8, the HashSet can be iterated and Traverser1 can be read,
but it can never be deleted as any attempt to find Traverser1 in the set by
it's current hash code will fail. The result of this is that the iterator will
get stuck on the first element. Repeated calls to .next() on the HashSet
iterator will always return Traverser1. This causes the traversal to get stuck
in a loop until it runs out of resources.
In Java 9 and up, HashSet has been
[patched|https://bugs.openjdk.org/browse/JDK-8170733] such that elements with
changed hash codes can be deleted, which allows the above traversal to run as
expected. We are still using HashSet wrong, but we aren't getting punished for
it in Java 9+.
There isn't a clear obvious solution for such traversals. The fix with the
lowest impact that I'm currently aware of would be for barrier step to "box"
each incoming traverser in some wrapper which computes the hash code of the
internal object once, and stores it. Successive calls to hashcode() on the box,
will always return the cached value regardless of if the internal object has
been modified. This would prevent the hashcode from changing (from the
perspective of the HashSet) while the traversers are being stored in the
barrier.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)