[
https://issues.apache.org/jira/browse/BOOKKEEPER-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105219#comment-13105219
]
Flavio Junqueira commented on BOOKKEEPER-67:
--------------------------------------------
Let me give some context. Netty has a known issue when shutting down: it
hangs when we call releaseExternalResources() on the channel factory while
there are still connections open. In the client, we try to open connections to
bookies (if necessary) when we create ledgers, and a race was leaving us with
pending connections that caused the client to hang while shutting down.
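To make the race concrete, here is a minimal sketch in plain Java of one way to guard against it. It is not the actual BookKeeper or Netty code: the class ConnectionTracker and its methods are hypothetical names used only for illustration. The idea is that registering a connection and starting shutdown happen under the same lock, so a connection that completes after shutdown has begun is rejected instead of being left open.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; names are illustrative, not BookKeeper or Netty API. */
public class ConnectionTracker {
    private final List<AutoCloseable> open = new ArrayList<>();
    private boolean closed = false;

    /**
     * Registers a connection unless shutdown has already started.
     * Returns false if rejected; the caller must then close it itself.
     */
    public synchronized boolean tryRegister(AutoCloseable conn) {
        if (closed) {
            return false;
        }
        open.add(conn);
        return true;
    }

    /**
     * Marks the tracker as closed and closes every registered connection.
     * Returns the number of connections it attempted to close.
     */
    public synchronized int closeAll() {
        closed = true;
        int count = open.size();
        for (AutoCloseable conn : open) {
            try {
                conn.close();
            } catch (Exception e) {
                // Keep going: one failed close must not leave the others open.
            }
        }
        open.clear();
        return count;
    }
}
```

With something along these lines, the factory's resources would only be released after closeAll() returns, at which point no tracked connection is still open.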
We have only been able to reproduce the problem reliably by creating a large
number of ledgers, hence the magic number 10k. I don't know where the sweet
spot is, so it is possible that we could reproduce it reliably with fewer
ledgers, but I'm not sure how to pick a smaller number and still guarantee that
the problem will show up if it is not fixed.
One issue is that we have apparently fixed the problem, so if you bring it down
to 100, it should work, but if we change the value, it would be good to make
sure that we will still be able to catch the bug in the future in case some
patch reintroduces it.
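Whatever number of ledgers we settle on, a hang during shutdown can at least be turned into a test failure rather than a stuck build. Here is a rough sketch using only java.util.concurrent; the class ShutdownWatchdog and the method completesWithin are hypothetical names, not something that exists in the test suite today. It does not make the race any more likely to trigger, it only bounds how long we wait for it.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: reports whether a task finishes within a time limit. */
public class ShutdownWatchdog {

    public static boolean completesWithin(Runnable task, long timeoutMillis) {
        ExecutorService exec = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "shutdown-watchdog");
            t.setDaemon(true); // a hung task must not keep the JVM alive
            return t;
        });
        Future<?> result = exec.submit(task);
        try {
            result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            result.cancel(true); // best effort: interrupt the stuck task
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            // The task itself failed; surface the cause to the caller.
            throw new RuntimeException(e.getCause());
        } finally {
            exec.shutdownNow();
        }
    }
}
```

A test could then wrap the client shutdown call in completesWithin(...) with a generous limit and fail if it returns false.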
> BookieReadWriteTest gets blocked and never finishes
> ---------------------------------------------------
>
> Key: BOOKKEEPER-67
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-67
> Project: Bookkeeper
> Issue Type: Bug
> Environment: RHEL4.8 and Debian 6
> Reporter: Matthieu Morel
> Attachments: BookieReadWriteTest-RHEL4.8.log,
> ShowFileDescriptorsInfo.java
>
>
> I systematically reproduce this behaviour on the linux boxes I tested with.
> The test gets stuck acquiring permits from a semaphore, normally used for
> throttling:
> "main" prio=10 tid=0x08058c00 nid=0x588d waiting on condition [0xf723c000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0xb5619728> (a java.util.concurrent.Semaphore$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
> at java.util.concurrent.Semaphore.acquire(Semaphore.java:286)
> at org.apache.bookkeeper.client.LedgerHandle.asyncAddEntry(LedgerHandle.java:394)
> at org.apache.bookkeeper.client.LedgerHandle.asyncAddEntry(LedgerHandle.java:366)
> at org.apache.bookkeeper.test.BookieReadWriteTest.testShutdown(BookieReadWriteTest.java:815)
> The issue might come from the synchronization mechanism used in the test
> itself.