Re: Bugzilla #46962 - Deadlock in PropertyCache

Alexios Giotis Thu, 01 Mar 2012 04:21:04 -0800

Vincent, 

No need to send more info. I just reproduced it on Windows Server 2008 R2 
Enterprise running the latest, 'non-business', but not supported as of Nov 
2009, Sun JRE 1.5.0_22. The production servers are all running JDK6 and I could 
not test it on a Mac.


The bug is indeed related to the concurrent hash map iterators and is fixed on 
JDK6, see
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6312056

I will update the patch and I hope a committer will check it.

Alex Giotis




On Mar 1, 2012, at 3:13 AM, Alexios Giotis wrote:

> Hi Vincent,
> 
> The unit tests where added by Mehdi but I could happily update them to Junit 
> 4 and create the separate methods.
> 
> Related to the NullPointerException that you found, I tried many times but 
> unfortunately I can't reproduce them on my Macbook running MacOS X 10.7.3  
> with any JRE and have not seen it in any heavily loaded server. What is the 
> environment that you tested it ? I could test it on different servers/OS or 
> try to write a test of ConcurrentHashMap that can trigger it.
> 
> 
> Related to what is happening, this is tricky. The ConcurrentHashMap does not 
> allow putting null values, so one would not expect to get back null values. 
> But in the cleanReclaimedMapEntries() we are getting the values using an 
> iterator over the entrySet. As the javadoc says, the view's iterator are 
> "weakly consistent", never throw ConcurrentModificationException and they are 
> actually designed to be used by only one thread at a time. So this could 
> explain why a null value could be returned. 
> 
> Depending on how this is reproduced, we could either avoid this by just 
> checking for a null value or we could put the iteration in a block 
> synchronized on the map instance. This loop is done once per 10000 items 
> added, so it should not be slower. But before making this change, I need a 
> way to reproduce it so that we are sure about it.
> 
> 
> Alex Giotis
> 
> 
> 
> 
> On Feb 29, 2012, at 11:42 PM, Vincent Hennebert wrote:
> 
>> Sorry for the delay about this. I had a look at the patch some time ago
>> but didn’t have enough time to apply it.
>> 
>> I noticed that a NPE occurs when adding the following piece of code to
>> PropertyCacheTestCase and running it:
>> 
>>   private final PropertyCache<Integer> cache = new PropertyCache<Integer>();
>> 
>>   private class CacheFiller implements Runnable {
>> 
>>       private final int start;
>> 
>>       CacheFiller(int start) {
>>           this.start = start;
>>       }
>> 
>>       public void run() {
>>           for (int i = 0; i < 1000000; i++) {
>>               cache.fetch(new Integer(start + i));
>>           }
>>       }
>>   }
>> 
>>   public void testCleanUp() throws InterruptedException {
>>       Thread t1 = new Thread(new CacheFiller(0));
>>       Thread t2 = new Thread(new CacheFiller(10000));
>>       t1.start();
>>       t2.start();
>>       t1.join();
>>       t2.join();
>>   }
>> 
>> Here’s the stacktrace:
>> Exception in thread "Thread-1" java.lang.NullPointerException
>>      at
>> org.apache.fop.fo.properties.PropertyCache.cleanReclaimedMapEntries(PropertyCache.java:153)
>>      at 
>> org.apache.fop.fo.properties.PropertyCache.fetch(PropertyCache.java:105)
>>      at
>> org.apache.fop.fo.properties.PropertyCacheTestCase$CacheFiller.run(PropertyCacheTestCase.java:68)
>>      at java.lang.Thread.run(Thread.java:595)
>> 
>> It happens when running the test with a Sun 1.5 JRE but apparently not
>> with OpenJDK 1.6. I’m still unsure about what’s happening.
>> 
>> It would be good to update the test cases to JUnit 4. Also, there are
>> some clean-ups and improvements to bring. Among other things, symmetry
>> is not properly tested in FObjectTest (symmetric.equals(getSut()) should
>> also be run); the tests for reflexivity, symmetry and transitivity
>> should be put in separate methods; testFailureCases is misnamed; etc.
>> Little things but that take some time to fix.
>> 
>> Vincent
>> 
>> 
>> On 28/02/12 17:09, mehdi houshmand wrote:
>>> Hi Guys,
>>> 
>>> My apologies for the lack of transparency on this issue, but I didn't
>>> actually review the changes you made here, in fact, I barely looked at
>>> what PropertyCache actually does. I had some free time, and added a
>>> bunch of unit tests.
>>> 
>>> The reason this hasn't been committed yet was because Vincent said he
>>> had some questions about the patch. That's as far as I know, maybe he
>>> could give some feedback on the issue.
>>> 
>>> Let me reiterate my apologies again on this, it's not fair that this
>>> has been ignored. I'll endeavour to make the process more transparent
>>> in future, I hope this doesn't prevent you or any other contributors
>>> from submitting patches.
>>> 
>>> Mehdi
>>> 
>>> 
>>> On 28 February 2012 16:52, Glenn Adams <[email protected]> wrote:
>>>> I support committing this patch, however I don't see an ICLA listed at [1]
>>>> for Alexios. Alexios, if you have not submitted an ICLA [2], please do so.
>>>> 
>>>> I would be happy to apply the patch (if Mehdi doesn't have the time).
>>>> 
>>>> [1] http://people.apache.org/committer-index.html#unlistedclas
>>>> [2] http://www.apache.org/licenses/icla.txt
>>>> 
>>>> 
>>>> On Tue, Feb 28, 2012 at 6:27 AM, Alexios Giotis <[email protected]>
>>>> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> About 6 months ago, I had a deadlock issue that regularly stopped
>>>>> production servers. While I was opening a bugzilla ticket, I found that 
>>>>> this
>>>>> was already reported back in 2009. This issue is still opened as it was
>>>>> difficult to reproduce. On that issue, I added:
>>>>> 
>>>>> [1] An explanation of why a deadlock is possible.
>>>>> [1] Stacktraces of deadlocked threads from a production server.
>>>>> [2] A small unit test that adds a Thread.sleep() to the PropertyCache to
>>>>> make it always reproducable.
>>>>> [3] A patch solving this issue.
>>>>> [4] Explanations of why the patch rewrites the existing PropertyCache
>>>>> class.
>>>>> 
>>>>> This was then reviewed and unit tests were added [5]. On top of this, I
>>>>> have committed the fix in my private branch and it works well on several 
>>>>> big
>>>>> production systems. This is as far as I can go before a FOP committer 
>>>>> takes
>>>>> it further. I am writing this because:
>>>>> 
>>>>> - Deadlocks should be fixed. When they occur, there is no way around them.
>>>>> - The trunk is moving, the patch is aging and it will be more difficult to
>>>>> apply it over time.
>>>>> - It is discouraging for submitting more patches.
>>>>> 
>>>>> 
>>>>> Alexios Giotis
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> [1] https://issues.apache.org/bugzilla/show_bug.cgi?id=46962#c3
>>>>> [2] https://issues.apache.org/bugzilla/attachment.cgi?id=27342
>>>>> [3] https://issues.apache.org/bugzilla/attachment.cgi?id=27477&action=diff
>>>>> [4] https://issues.apache.org/bugzilla/show_bug.cgi?id=46962#c7
>>>>> [5] https://issues.apache.org/bugzilla/show_bug.cgi?id=46962#c9
>>>>> 
>>>> 
>

Re: Bugzilla #46962 - Deadlock in PropertyCache

Reply via email to