Patch updated, see https://issues.apache.org/bugzilla/show_bug.cgi?id=46962#c15
Alex Giotis On Mar 1, 2012, at 2:20 PM, Alexios Giotis wrote: > Vincent, > > No need to send more info. I just reproduced it on Windows Server 2008 R2 > Enterprise running the latest, 'non-business', but not supported as of Nov > 2009, Sun JRE 1.5.0_22. The production servers are all running JDK6 and I > could not test it on a Mac. > > The bug is indeed related to the concurrent hash map iterators and is fixed > on JDK6, see > http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6312056 > > I will update the patch and I hope a committer will check it. > > Alex Giotis > > > > > On Mar 1, 2012, at 3:13 AM, Alexios Giotis wrote: > >> Hi Vincent, >> >> The unit tests where added by Mehdi but I could happily update them to Junit >> 4 and create the separate methods. >> >> Related to the NullPointerException that you found, I tried many times but >> unfortunately I can't reproduce them on my Macbook running MacOS X 10.7.3 >> with any JRE and have not seen it in any heavily loaded server. What is the >> environment that you tested it ? I could test it on different servers/OS or >> try to write a test of ConcurrentHashMap that can trigger it. >> >> >> Related to what is happening, this is tricky. The ConcurrentHashMap does not >> allow putting null values, so one would not expect to get back null values. >> But in the cleanReclaimedMapEntries() we are getting the values using an >> iterator over the entrySet. As the javadoc says, the view's iterator are >> "weakly consistent", never throw ConcurrentModificationException and they >> are actually designed to be used by only one thread at a time. So this could >> explain why a null value could be returned. >> >> Depending on how this is reproduced, we could either avoid this by just >> checking for a null value or we could put the iteration in a block >> synchronized on the map instance. This loop is done once per 10000 items >> added, so it should not be slower. But before making this change, I need a >> way to reproduce it so that we are sure about it. >> >> >> Alex Giotis >> >> >> >> >> On Feb 29, 2012, at 11:42 PM, Vincent Hennebert wrote: >> >>> Sorry for the delay about this. I had a look at the patch some time ago >>> but didn’t have enough time to apply it. >>> >>> I noticed that a NPE occurs when adding the following piece of code to >>> PropertyCacheTestCase and running it: >>> >>> private final PropertyCache<Integer> cache = new PropertyCache<Integer>(); >>> >>> private class CacheFiller implements Runnable { >>> >>> private final int start; >>> >>> CacheFiller(int start) { >>> this.start = start; >>> } >>> >>> public void run() { >>> for (int i = 0; i < 1000000; i++) { >>> cache.fetch(new Integer(start + i)); >>> } >>> } >>> } >>> >>> public void testCleanUp() throws InterruptedException { >>> Thread t1 = new Thread(new CacheFiller(0)); >>> Thread t2 = new Thread(new CacheFiller(10000)); >>> t1.start(); >>> t2.start(); >>> t1.join(); >>> t2.join(); >>> } >>> >>> Here’s the stacktrace: >>> Exception in thread "Thread-1" java.lang.NullPointerException >>> at >>> org.apache.fop.fo.properties.PropertyCache.cleanReclaimedMapEntries(PropertyCache.java:153) >>> at >>> org.apache.fop.fo.properties.PropertyCache.fetch(PropertyCache.java:105) >>> at >>> org.apache.fop.fo.properties.PropertyCacheTestCase$CacheFiller.run(PropertyCacheTestCase.java:68) >>> at java.lang.Thread.run(Thread.java:595) >>> >>> It happens when running the test with a Sun 1.5 JRE but apparently not >>> with OpenJDK 1.6. I’m still unsure about what’s happening. >>> >>> It would be good to update the test cases to JUnit 4. Also, there are >>> some clean-ups and improvements to bring. Among other things, symmetry >>> is not properly tested in FObjectTest (symmetric.equals(getSut()) should >>> also be run); the tests for reflexivity, symmetry and transitivity >>> should be put in separate methods; testFailureCases is misnamed; etc. >>> Little things but that take some time to fix. >>> >>> Vincent >>> >>> >>> On 28/02/12 17:09, mehdi houshmand wrote: >>>> Hi Guys, >>>> >>>> My apologies for the lack of transparency on this issue, but I didn't >>>> actually review the changes you made here, in fact, I barely looked at >>>> what PropertyCache actually does. I had some free time, and added a >>>> bunch of unit tests. >>>> >>>> The reason this hasn't been committed yet was because Vincent said he >>>> had some questions about the patch. That's as far as I know, maybe he >>>> could give some feedback on the issue. >>>> >>>> Let me reiterate my apologies again on this, it's not fair that this >>>> has been ignored. I'll endeavour to make the process more transparent >>>> in future, I hope this doesn't prevent you or any other contributors >>>> from submitting patches. >>>> >>>> Mehdi >>>> >>>> >>>> On 28 February 2012 16:52, Glenn Adams <[email protected]> wrote: >>>>> I support committing this patch, however I don't see an ICLA listed at [1] >>>>> for Alexios. Alexios, if you have not submitted an ICLA [2], please do so. >>>>> >>>>> I would be happy to apply the patch (if Mehdi doesn't have the time). >>>>> >>>>> [1] http://people.apache.org/committer-index.html#unlistedclas >>>>> [2] http://www.apache.org/licenses/icla.txt >>>>> >>>>> >>>>> On Tue, Feb 28, 2012 at 6:27 AM, Alexios Giotis <[email protected]> >>>>> wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> About 6 months ago, I had a deadlock issue that regularly stopped >>>>>> production servers. While I was opening a bugzilla ticket, I found that >>>>>> this >>>>>> was already reported back in 2009. This issue is still opened as it was >>>>>> difficult to reproduce. On that issue, I added: >>>>>> >>>>>> [1] An explanation of why a deadlock is possible. >>>>>> [1] Stacktraces of deadlocked threads from a production server. >>>>>> [2] A small unit test that adds a Thread.sleep() to the PropertyCache to >>>>>> make it always reproducable. >>>>>> [3] A patch solving this issue. >>>>>> [4] Explanations of why the patch rewrites the existing PropertyCache >>>>>> class. >>>>>> >>>>>> This was then reviewed and unit tests were added [5]. On top of this, I >>>>>> have committed the fix in my private branch and it works well on several >>>>>> big >>>>>> production systems. This is as far as I can go before a FOP committer >>>>>> takes >>>>>> it further. I am writing this because: >>>>>> >>>>>> - Deadlocks should be fixed. When they occur, there is no way around >>>>>> them. >>>>>> - The trunk is moving, the patch is aging and it will be more difficult >>>>>> to >>>>>> apply it over time. >>>>>> - It is discouraging for submitting more patches. >>>>>> >>>>>> >>>>>> Alexios Giotis >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> [1] https://issues.apache.org/bugzilla/show_bug.cgi?id=46962#c3 >>>>>> [2] https://issues.apache.org/bugzilla/attachment.cgi?id=27342 >>>>>> [3] >>>>>> https://issues.apache.org/bugzilla/attachment.cgi?id=27477&action=diff >>>>>> [4] https://issues.apache.org/bugzilla/show_bug.cgi?id=46962#c7 >>>>>> [5] https://issues.apache.org/bugzilla/show_bug.cgi?id=46962#c9 >>>>>> >>>>> >> >
