Vincent, No need to send more info. I just reproduced it on Windows Server 2008 R2 Enterprise running the latest, 'non-business', but not supported as of Nov 2009, Sun JRE 1.5.0_22. The production servers are all running JDK6 and I could not test it on a Mac.
The bug is indeed related to the concurrent hash map iterators and is fixed on JDK6, see http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6312056 I will update the patch and I hope a committer will check it. Alex Giotis On Mar 1, 2012, at 3:13 AM, Alexios Giotis wrote: > Hi Vincent, > > The unit tests where added by Mehdi but I could happily update them to Junit > 4 and create the separate methods. > > Related to the NullPointerException that you found, I tried many times but > unfortunately I can't reproduce them on my Macbook running MacOS X 10.7.3 > with any JRE and have not seen it in any heavily loaded server. What is the > environment that you tested it ? I could test it on different servers/OS or > try to write a test of ConcurrentHashMap that can trigger it. > > > Related to what is happening, this is tricky. The ConcurrentHashMap does not > allow putting null values, so one would not expect to get back null values. > But in the cleanReclaimedMapEntries() we are getting the values using an > iterator over the entrySet. As the javadoc says, the view's iterator are > "weakly consistent", never throw ConcurrentModificationException and they are > actually designed to be used by only one thread at a time. So this could > explain why a null value could be returned. > > Depending on how this is reproduced, we could either avoid this by just > checking for a null value or we could put the iteration in a block > synchronized on the map instance. This loop is done once per 10000 items > added, so it should not be slower. But before making this change, I need a > way to reproduce it so that we are sure about it. > > > Alex Giotis > > > > > On Feb 29, 2012, at 11:42 PM, Vincent Hennebert wrote: > >> Sorry for the delay about this. I had a look at the patch some time ago >> but didn’t have enough time to apply it. >> >> I noticed that a NPE occurs when adding the following piece of code to >> PropertyCacheTestCase and running it: >> >> private final PropertyCache<Integer> cache = new PropertyCache<Integer>(); >> >> private class CacheFiller implements Runnable { >> >> private final int start; >> >> CacheFiller(int start) { >> this.start = start; >> } >> >> public void run() { >> for (int i = 0; i < 1000000; i++) { >> cache.fetch(new Integer(start + i)); >> } >> } >> } >> >> public void testCleanUp() throws InterruptedException { >> Thread t1 = new Thread(new CacheFiller(0)); >> Thread t2 = new Thread(new CacheFiller(10000)); >> t1.start(); >> t2.start(); >> t1.join(); >> t2.join(); >> } >> >> Here’s the stacktrace: >> Exception in thread "Thread-1" java.lang.NullPointerException >> at >> org.apache.fop.fo.properties.PropertyCache.cleanReclaimedMapEntries(PropertyCache.java:153) >> at >> org.apache.fop.fo.properties.PropertyCache.fetch(PropertyCache.java:105) >> at >> org.apache.fop.fo.properties.PropertyCacheTestCase$CacheFiller.run(PropertyCacheTestCase.java:68) >> at java.lang.Thread.run(Thread.java:595) >> >> It happens when running the test with a Sun 1.5 JRE but apparently not >> with OpenJDK 1.6. I’m still unsure about what’s happening. >> >> It would be good to update the test cases to JUnit 4. Also, there are >> some clean-ups and improvements to bring. Among other things, symmetry >> is not properly tested in FObjectTest (symmetric.equals(getSut()) should >> also be run); the tests for reflexivity, symmetry and transitivity >> should be put in separate methods; testFailureCases is misnamed; etc. >> Little things but that take some time to fix. >> >> Vincent >> >> >> On 28/02/12 17:09, mehdi houshmand wrote: >>> Hi Guys, >>> >>> My apologies for the lack of transparency on this issue, but I didn't >>> actually review the changes you made here, in fact, I barely looked at >>> what PropertyCache actually does. I had some free time, and added a >>> bunch of unit tests. >>> >>> The reason this hasn't been committed yet was because Vincent said he >>> had some questions about the patch. That's as far as I know, maybe he >>> could give some feedback on the issue. >>> >>> Let me reiterate my apologies again on this, it's not fair that this >>> has been ignored. I'll endeavour to make the process more transparent >>> in future, I hope this doesn't prevent you or any other contributors >>> from submitting patches. >>> >>> Mehdi >>> >>> >>> On 28 February 2012 16:52, Glenn Adams <[email protected]> wrote: >>>> I support committing this patch, however I don't see an ICLA listed at [1] >>>> for Alexios. Alexios, if you have not submitted an ICLA [2], please do so. >>>> >>>> I would be happy to apply the patch (if Mehdi doesn't have the time). >>>> >>>> [1] http://people.apache.org/committer-index.html#unlistedclas >>>> [2] http://www.apache.org/licenses/icla.txt >>>> >>>> >>>> On Tue, Feb 28, 2012 at 6:27 AM, Alexios Giotis <[email protected]> >>>> wrote: >>>>> >>>>> Hi, >>>>> >>>>> About 6 months ago, I had a deadlock issue that regularly stopped >>>>> production servers. While I was opening a bugzilla ticket, I found that >>>>> this >>>>> was already reported back in 2009. This issue is still opened as it was >>>>> difficult to reproduce. On that issue, I added: >>>>> >>>>> [1] An explanation of why a deadlock is possible. >>>>> [1] Stacktraces of deadlocked threads from a production server. >>>>> [2] A small unit test that adds a Thread.sleep() to the PropertyCache to >>>>> make it always reproducable. >>>>> [3] A patch solving this issue. >>>>> [4] Explanations of why the patch rewrites the existing PropertyCache >>>>> class. >>>>> >>>>> This was then reviewed and unit tests were added [5]. On top of this, I >>>>> have committed the fix in my private branch and it works well on several >>>>> big >>>>> production systems. This is as far as I can go before a FOP committer >>>>> takes >>>>> it further. I am writing this because: >>>>> >>>>> - Deadlocks should be fixed. When they occur, there is no way around them. >>>>> - The trunk is moving, the patch is aging and it will be more difficult to >>>>> apply it over time. >>>>> - It is discouraging for submitting more patches. >>>>> >>>>> >>>>> Alexios Giotis >>>>> >>>>> >>>>> >>>>> >>>>> [1] https://issues.apache.org/bugzilla/show_bug.cgi?id=46962#c3 >>>>> [2] https://issues.apache.org/bugzilla/attachment.cgi?id=27342 >>>>> [3] https://issues.apache.org/bugzilla/attachment.cgi?id=27477&action=diff >>>>> [4] https://issues.apache.org/bugzilla/show_bug.cgi?id=46962#c7 >>>>> [5] https://issues.apache.org/bugzilla/show_bug.cgi?id=46962#c9 >>>>> >>>> >
