[ 
https://issues.apache.org/jira/browse/AVRO-747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992016#comment-12992016
 ] 

Bruno Dumon commented on AVRO-747:
----------------------------------

A problem with my previous patch is that when it releases the semaphore, 
NettyTransceiver.transceive() returns null. I thought this was fine, since 
the same already happens for the other caught exceptions. However, this 
causes problems further on (an NPE in ByteBufferInputStream), since apparently 
the transceive method is not supposed to return null.

I'll attach a newer patch that throws an IOException instead.

BTW, releaseSemaphore is also called by NettyClientAvroHandler.exceptionCaught; 
we should probably pass on the exception reported there.
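
To make the idea concrete, here is a rough sketch of a future that wakes its 
waiter on failure and throws an IOException instead of returning null. This is 
not the attached patch: CallFutureSketch, setResponse and fail are hypothetical 
names used only for illustration, and the real CallFuture in NettyTransceiver 
may be structured differently.

```java
import java.io.IOException;
import java.util.concurrent.Semaphore;

// Hypothetical stand-in for NettyTransceiver.CallFuture; not the real Avro class.
class CallFutureSketch<T> {
    private final Semaphore sem = new Semaphore(0);
    private T response;      // set when a response arrives
    private Throwable error; // set when the call fails or the transceiver is closed

    // Called when the response arrives: store it and wake the waiting caller.
    void setResponse(T response) {
        this.response = response;
        sem.release();
    }

    // Called from close() or exceptionCaught(): record the cause and wake the caller.
    void fail(Throwable cause) {
        this.error = cause;
        sem.release();
    }

    // Blocks until a response or a failure is recorded.
    // On failure it throws rather than handing null back to the caller.
    T get() throws IOException, InterruptedException {
        sem.acquire();
        if (error != null) {
            throw new IOException("Call failed: " + error.getMessage(), error);
        }
        return response;
    }
}
```

With something like this, close() and exceptionCaught() would both call fail() 
with whatever cause they have, so the blocked client thread sees the original 
exception wrapped in the IOException.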

It wouldn't harm if someone who actually knows Netty had a look at this.

> NettyTransceiver: release semaphores on close so that clients are not blocked.
> ------------------------------------------------------------------------------
>
>                 Key: AVRO-747
>                 URL: https://issues.apache.org/jira/browse/AVRO-747
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Bruno Dumon
>         Attachments: netty-transceiver-release-semaphores-on-close-patch.txt
>
>
> I use Avro RPC with the NettyTransceiver.
> When I kill the server, often the client hangs, jstack shows the following:
> {noformat}
> "pool-6-thread-1" prio=10 tid=0x09fef000 nid=0x3382 waiting on condition 
> [0x76fc7000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0xa1df2e40> (a 
> java.util.concurrent.Semaphore$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
>         at java.util.concurrent.Semaphore.acquire(Semaphore.java:286)
>         at 
> org.apache.avro.ipc.NettyTransceiver$CallFuture.get(NettyTransceiver.java:207)
>         at 
> org.apache.avro.ipc.NettyTransceiver.transceive(NettyTransceiver.java:137)
>         at org.apache.avro.ipc.Requestor.request(Requestor.java:123)
>         - locked <0xa20986c0> (a org.apache.avro.specific.SpecificRequestor)
>         at 
> org.apache.avro.specific.SpecificRequestor.invoke(SpecificRequestor.java:52)
> ...
> {noformat}
> Not that this matters much, but the client application is written such that 
> it discovers the available servers via ZooKeeper. When a server disappears, 
> it calls close on the corresponding NettyTransceiver.
> I have adjusted the NettyTransceiver.close() method to release any remaining 
> semaphores, the same as is done in the exceptionCaught method of the 
> UpstreamHandler. This solves the problem for me.
> Alternatively, we could handle channel close events in handleUpstream(), but 
> I'm not sure whether Netty automatically reconnects if the server re-appears, 
> in which case this wouldn't be a good idea. OTOH, if the server never comes 
> back, client threads could hang forever.
> Patch in attachment, against svn r1064125.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
