On 5/06/2013 3:21 AM, Gregg Wonderly wrote:
Did you use a concurrent Set implementation?

On this occasion I used ConcurrentSkipListSet, although a synchronized HashSet might be more appropriate. The HashSet would be faster but blocking, while the ConcurrentSkipListSet is slower but non-blocking. The bug was triggered with the ConcurrentSkipListSet in place, so I left it there. Now that the bug is solved, I'll probably leave it as it is. The code didn't appear to have any issues with Hashtable and Vector, so I almost didn't replace them, but now I'm glad I did.
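To illustrate the trade-off, here is a minimal sketch of the two options side by side. The class and method names (SetChoices, skipListSet, synchronizedHashSet) are made up for this example and are not taken from the River code.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical helper, for illustration only.
class SetChoices {

    // Non-blocking and sorted; add/remove/contains cost O(log n).
    // Elements must be Comparable (or a Comparator must be supplied).
    static Set<String> skipListSet() {
        return new ConcurrentSkipListSet<String>();
    }

    // Hash-based with expected constant-time operations,
    // but every call acquires the wrapper's lock.
    static Set<String> synchronizedHashSet() {
        return Collections.synchronizedSet(new HashSet<String>());
    }

    public static void main(String[] args) {
        Set<String> skip = skipListSet();
        Set<String> sync = synchronizedHashSet();
        for (String s : new String[] {"b", "a", "c"}) {
            skip.add(s);
            sync.add(s);
        }

        // Iterating the skip list set is safe without locking and
        // visits elements in sorted order.
        for (String s : skip) {
            System.out.println("skip list: " + s);
        }

        // Iterating the synchronized wrapper must be done while
        // holding its lock, as its Javadoc requires.
        synchronized (sync) {
            for (String s : sync) {
                System.out.println("synchronized: " + s);
            }
        }
    }
}
```

Which one is actually faster depends on contention and set size, so this is worth measuring rather than assuming.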

The concurrent code is actually simpler to understand and easier to read on this occasion.

The "synchronized" nature of "Hashtable" and "Vector" will, unfortunately, "fix" a number of concurrency issues by causing constant cache line updates to occur. This of course has a large performance cost for non-concurrent code, which is why "HashMap", "HashSet" and "ArrayList" make things go a lot faster there.

Perhaps your change in classes inadvertently removed synchronization, and thus the happens-before ordering that was keeping the JIT out of the mix?

Quite possibly, although I'd seen the failure at least once previously on ARM and assumed it was fixed because it hadn't reappeared. I guess it was still there, just lurking.


Gregg Wonderly

On 6/3/2013 5:13 PM, Peter Firmstone wrote:
Found a beaut bug; this time it relates to com.sun.jini.outrigger.EntryRep. This is what I think is occurring on the client side.

During construction arrays were created, written to volatile variables, then populated with values.

Now, EntryRep uses default serialization, which isn't synchronized if the object is marshalled by a different thread, and an EntryRep is created for every Entry written into the space.
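The pattern described above can be sketched roughly as follows. UnsafeRep and SafeRep are hypothetical stand-ins, not the real EntryRep, and the "safe" variant shows one common remedy rather than the fix that was actually committed.

```java
import java.util.Arrays;

// Hypothetical stand-in for the problematic pattern; NOT the real EntryRep.
class UnsafeRep {
    private volatile String[] superclasses;

    UnsafeRep(String[] names) {
        // The empty array is written to the volatile field first...
        superclasses = new String[names.length];
        // ...and only then populated. These element writes come after the
        // volatile write, so a thread that reads the field is not guaranteed
        // to see them and may observe null elements.
        for (int i = 0; i < names.length; i++) {
            superclasses[i] = names[i];
        }
    }

    String[] superclasses() {
        return superclasses.clone();
    }
}

// One common remedy (an assumption, not necessarily the committed fix):
// populate a local array completely, then assign it to a final field.
class SafeRep {
    private final String[] superclasses;

    SafeRep(String[] names) {
        String[] copy = new String[names.length];
        for (int i = 0; i < names.length; i++) {
            copy[i] = names[i];
        }
        // Final field semantics make the populated contents visible to any
        // thread that sees this object after the constructor completes,
        // provided "this" does not escape during construction.
        superclasses = copy;
    }

    String[] superclasses() {
        return superclasses.clone();
    }

    public static void main(String[] args) {
        String[] names = {"java.lang.Object", "net.jini.entry.AbstractEntry"};
        // Single-threaded, both variants print the same thing; the
        // difference only shows up when another thread reads the object.
        System.out.println(Arrays.toString(new UnsafeRep(names).superclasses()));
        System.out.println(Arrays.toString(new SafeRep(names).superclasses()));
    }
}
```

Both classes behave identically in a single thread, which is part of why this kind of bug can stay hidden until the JIT or the hardware reorders things.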

Previously I'd only seen similar test failures on ARM, but now I could observe it on Windows, the platform least affected by concurrency issues so far.

com/sun/jini/test/impl/outrigger/leasing/UseNotifyLeaseTest.td

How did I find it?

An unrelated class, com.sun.jini.outrigger.TypeTree, used the data structure Hashtable<String,Vector<String>> internally to cache all subclasses. I replaced it with ConcurrentMap<String,Set<String>>, which simplified the code somewhat. This also allowed the unrelated EntryRep to fail on Windows, where previously the bug wasn't evident.
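For anyone curious what that kind of replacement looks like, here is a minimal sketch. SubclassCache and its methods are hypothetical names for illustration and do not reflect the real TypeTree code; the choice of ConcurrentSkipListSet for the values is also just an assumption.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical sketch of a subclass cache; NOT the real TypeTree.
class SubclassCache {
    private final ConcurrentMap<String, Set<String>> subclasses =
            new ConcurrentHashMap<String, Set<String>>();

    // Records that 'subclass' is a known subclass of 'superclass'.
    void addSubclass(String superclass, String subclass) {
        Set<String> known = subclasses.get(superclass);
        if (known == null) {
            Set<String> fresh = new ConcurrentSkipListSet<String>();
            // putIfAbsent returns the existing value if another thread
            // won the race, or null if our new set was installed.
            known = subclasses.putIfAbsent(superclass, fresh);
            if (known == null) {
                known = fresh;
            }
        }
        known.add(subclass);
    }

    // Returns a read-only view of the known subclasses (empty if none).
    Set<String> subclassesOf(String superclass) {
        Set<String> known = subclasses.get(superclass);
        return known == null
                ? Collections.<String>emptySet()
                : Collections.unmodifiableSet(known);
    }

    public static void main(String[] args) {
        SubclassCache cache = new SubclassCache();
        cache.addSubclass("Animal", "Dog");
        cache.addSubclass("Animal", "Cat");
        System.out.println(cache.subclassesOf("Animal"));
    }
}
```

No explicit locking is needed here, because both the map and the value sets handle concurrent access themselves.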

The good news is that it even failed while being observed with VisualVM. It appears that the test was running well until HotSpot optimised reflective method invocation. After that, the EntryRep array contents went missing and the test subsequently failed, because the Watchers no longer matched the EntryRep and didn't send any more event notifications.

I've just committed the fix, feel free to reverse the changes to EntryRep and play around with unsafe publication.

Has anyone seen any strange behaviour writing entries to the space in deployment, e.g. entries going missing, not matching, or update notifications not occurring? It's likely this could have been mistaken for network failure, which the Jini infrastructure handles quite well.

Regards,

Peter.






