On 5/06/2013 3:21 AM, Gregg Wonderly wrote:
Did you use a concurrent Set implementation?
On this occasion I used ConcurrentSkipListSet, although a synchronized
HashSet might be more appropriate. The HashSet would be faster but
blocking, while the ConcurrentSkipListSet is slower but non-blocking.
When it triggered the bug, I left it with the ConcurrentSkipListSet.
Now that the bug's solved, I'll probably leave it as it is. The code
didn't appear to have any issues with Hashtable and Vector, so I almost
didn't replace them, but now I'm glad I did.
The concurrent code is actually simpler to understand and easier to read
on this occasion.
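
For readers comparing the two options, here is a minimal sketch of both
Set choices. The class and method names (SetChoices, nonBlocking,
synchronizedHash) are made up for illustration and are not taken from
the River/Jini source.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical helper class, for illustration only.
class SetChoices {

    // Non-blocking and sorted; add/contains/remove cost O(log n) on average.
    static Set<String> nonBlocking() {
        return new ConcurrentSkipListSet<String>();
    }

    // Hash-based with O(1) expected cost, but every call takes the
    // wrapper's lock, so threads block each other.
    static Set<String> synchronizedHash() {
        return Collections.synchronizedSet(new HashSet<String>());
    }

    public static void main(String[] args) {
        Set<String> skip = nonBlocking();
        Set<String> sync = synchronizedHash();
        skip.add("b");
        skip.add("a");
        sync.add("b");
        sync.add("a");

        // The skip list set can be iterated while other threads modify it.
        for (String s : skip) {
            System.out.println("skip list: " + s);
        }

        // The synchronized wrapper must be locked manually during iteration.
        synchronized (sync) {
            for (String s : sync) {
                System.out.println("synchronized: " + s);
            }
        }
    }
}
```

Note that the skip list set iterates in sorted order and needs
Comparable elements (or a Comparator), which String satisfies.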
The "synchronized" nature of "Hashtable" and "Vector" will,
unfortunately, "fix" a number of concurrency issues by causing
constant cache line updates to occur. This of course has a huge
performance impact on non-concurrent code, and hence "HashMap",
"HashSet" and "ArrayList" make things go a lot faster there.
Perhaps your change in classes inadvertently removed synchronization,
and thus the happens-before ordering which was keeping the JIT out of
the mix?
Quite possibly, although I'd seen the failure at least once previously
on Arm, but assumed it was fixed because it hadn't reappeared. I guess
it was still there, just lurking.
Gregg Wonderly
On 6/3/2013 5:13 PM, Peter Firmstone wrote:
Found a beaut bug. This time it relates to
com.sun.jini.outrigger.EntryRep; this is what I think is occurring on
the client side.
During construction arrays were created, written to volatile
variables, then populated with values.
Now, EntryRep uses default serialization, which isn't synchronized
with construction if the object is marshalled by a different thread,
and an EntryRep is created for every Entry written into the space.
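
The pattern described above can be sketched roughly as follows. These
are hypothetical stand-in classes (UnsafeRep, SafeRep, PublicationDemo),
not the actual EntryRep code or the committed fix. The point is the
ordering: in the first class the array reference is stored before the
elements, so a thread that reaches the object without synchronization
may observe an array that is still empty. In the second class the array
is fully populated before it is assigned to a final field, which the
Java memory model guarantees is visible as constructed, provided "this"
does not escape the constructor.

```java
// Hypothetical classes for illustration; not the real EntryRep.
class PublicationDemo {
    public static void main(String[] args) {
        String[] src = { "className", "fieldA", "fieldB" };
        SafeRep rep = new SafeRep(src);
        for (int i = 0; i < rep.size(); i++) {
            System.out.println(rep.value(i));
        }
    }
}

// Risky: the reference is published first, the contents afterwards.
class UnsafeRep {
    private volatile String[] values;

    UnsafeRep(String[] src) {
        values = new String[src.length];       // volatile write of an empty array
        for (int i = 0; i < src.length; i++) {
            values[i] = src[i];                // element stores come after it
        }
    }

    String value(int i) { return values[i]; }
}

// Safer: populate a local array, then assign it once to a final field.
class SafeRep {
    private final String[] values;

    SafeRep(String[] src) {
        String[] tmp = new String[src.length];
        System.arraycopy(src, 0, tmp, 0, src.length);
        values = tmp;                          // single assignment, contents complete
    }

    int size() { return values.length; }

    String value(int i) { return values[i]; }
}
```

A single-threaded run cannot show the failure; the race only appears
when another thread reads the object while or just after it is built,
which fits the failure being intermittent and platform dependent.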
Previously I'd only seen similar test failures on Arm, but now I
could observe it on Windows, the platform so far least affected by
concurrency issues.
com/sun/jini/test/impl/outrigger/leasing/UseNotifyLeaseTest.td
How did I find it?
An unrelated class, com.sun.jini.outrigger.TypeTree, used the data
structure Hashtable<String,Vector<String>> internally to cache all
subclasses. I replaced the data structure with
ConcurrentMap<String,Set<String>>, which simplified the code
somewhat. This also allowed the unrelated EntryRep to fail on
Windows, where previously the failure wasn't evident.
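
As a rough illustration of the replacement data structure, a subclass
cache built on ConcurrentMap<String,Set<String>> might look like the
sketch below. SubclassCache and its methods are invented names and this
is not the TypeTree implementation; it only shows the putIfAbsent idiom
that lets the map be updated without external locking.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical cache, for illustration only.
class SubclassCache {
    private final ConcurrentMap<String, Set<String>> subclasses =
            new ConcurrentHashMap<String, Set<String>>();

    // Record that 'subclass' extends 'superclass'.
    void addSubclass(String superclass, String subclass) {
        Set<String> set = subclasses.get(superclass);
        if (set == null) {
            Set<String> fresh = new ConcurrentSkipListSet<String>();
            // If another thread won the race, use the set it installed.
            set = subclasses.putIfAbsent(superclass, fresh);
            if (set == null) {
                set = fresh;
            }
        }
        set.add(subclass);
    }

    // Read-only view of the known subclasses; empty if none are recorded.
    Set<String> subclassesOf(String superclass) {
        Set<String> set = subclasses.get(superclass);
        if (set == null) {
            return Collections.<String>emptySet();
        }
        return Collections.unmodifiableSet(set);
    }

    public static void main(String[] args) {
        SubclassCache cache = new SubclassCache();
        cache.addSubclass("net.jini.core.entry.Entry", "example.MyEntry");
        cache.addSubclass("net.jini.core.entry.Entry", "example.OtherEntry");
        System.out.println(cache.subclassesOf("net.jini.core.entry.Entry"));
    }
}
```

Unlike Hashtable and Vector, none of these calls take a lock, so they
no longer provide incidental synchronization for unrelated code that
happens to run between them.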
The good news is, it even failed while being observed with visualvm.
It appears that the test was running well until HotSpot optimised
reflective method invocation. After that, the EntryRep array contents
went missing and the test subsequently failed because the Watchers no
longer matched the EntryRep and didn't send any more event
notifications.
I've just committed the fix, feel free to reverse the changes to
EntryRep and play around with unsafe publication.
Has anyone seen any strange behaviour writing Entries to the space in
deployment? E.g. entries going missing, not matching, or update
notifications not occurring? It's likely this could have been
confused with network failure, which the Jini infrastructure handles
quite well.
Regards,
Peter.