[ 
https://issues.apache.org/jira/browse/HADOOP-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901521#comment-13901521
 ] 

Daryn Sharp commented on HADOOP-10278:
--------------------------------------

I think the use of an atomic reference is more desirable than additional 
locking.  Queue swapping will either occur rarely or perhaps never, so it's 
unreasonable to impact normal operation.  I believe handling the race during 
swap is quite manageable.  In case I've overlooked something, let's walk 
through the logic.

I'll use "tight race" to mean what occurs when the atomic ref is swapped: a 
thread may get the pre-swap value just as it's being swapped, then operate on 
it.

*handlers/consumers*
During the swap, handlers might already be blocked on an empty queue, or might 
block on it during the tight race.  To solve that, using {{q.poll}} with a 
timeout instead of {{q.take}} will cause the handlers to time out and switch 
over to the new queue.  Handlers that consume 1 more call from the old queue 
during the tight race are fine.
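
A minimal sketch of the handler side, assuming the current queue is held in an {{AtomicReference<BlockingQueue<E>>}}.  The class name, the {{takeFollowingSwaps}} helper and the 100 ms timeout are made up for illustration and are not taken from any of the attached patches.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class HandlerPollSketch {
    /** Hypothetical helper: returns the next call, following queue swaps. */
    static <E> E takeFollowingSwaps(AtomicReference<BlockingQueue<E>> queueRef,
                                    long pollTimeoutMs) throws InterruptedException {
        while (true) {
            // Re-read the reference on every pass so that a swap is noticed
            // within at most one poll timeout.
            BlockingQueue<E> q = queueRef.get();
            E call = q.poll(pollTimeoutMs, TimeUnit.MILLISECONDS);
            if (call != null) {
                return call;
            }
            // Timed out: loop and pick up whichever queue is current now.
        }
    }

    public static void main(String[] args) throws Exception {
        final AtomicReference<BlockingQueue<String>> ref =
            new AtomicReference<BlockingQueue<String>>(new LinkedBlockingQueue<String>());
        Thread handler = new Thread(() -> {
            try {
                System.out.println("handled " + takeFollowingSwaps(ref, 100));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        handler.start();
        // Swap while the handler may be blocked on the old, empty queue.
        BlockingQueue<String> newQueue = new LinkedBlockingQueue<String>();
        ref.set(newQueue);
        newQueue.put("call-1");
        handler.join(); // prints "handled call-1"
    }
}
```

The cost in normal operation is one extra volatile read per call and a periodic wakeup on an idle queue; a handler stuck on the old queue is delayed by at most one poll timeout.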

*readers/producers*
I'm not sure the readers need to use {{q.offer}} instead of {{q.put}}.  If the 
reader is blocked on a {{put}} then the queue being swapped out is already 
full.  When the old queue is "drained", these blocked readers' puts will 
immediately unblock and succeed into the old queue.  At most 1 call per reader 
will be added to the queue post-swap.  Likewise during the tight race, some 
readers may put at most 1 call into the old queue.  I believe this is 
manageable:

*swapping queues*
The thread that swaps the queues already needs to drain the old queue into the 
new queue.  This thread will race with readers that might insert 1 more call 
during the tight race.  A drain using poll with a couple second timeout until 
null is returned should catch those readers that might insert 1 more call.

The logic that attempts to fall back to the old queue probably isn't required.  
The swapping thread should just block until it has added all calls to the new 
queue.  Losing or dropping calls under any condition is not desirable: a 
client may be left waiting indefinitely for the lost call's response.
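
The swap and drain described above could look roughly like the following.  This is a sketch under the same assumption of an {{AtomicReference<BlockingQueue<E>>}}; the class name, the {{swapAndDrain}} helper and the timeout value are hypothetical and are not the code in the attached patches.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class QueueSwapSketch {
    /**
     * Hypothetical helper: publishes newQueue, then moves every call left in
     * the old queue into it. Returns the number of calls moved.
     */
    static <E> int swapAndDrain(AtomicReference<BlockingQueue<E>> queueRef,
                                BlockingQueue<E> newQueue,
                                long drainTimeoutMs) throws InterruptedException {
        // From here on, readers and handlers that read the ref see newQueue.
        BlockingQueue<E> oldQueue = queueRef.getAndSet(newQueue);
        int moved = 0;
        E call;
        // Poll with a timeout until null: the wait gives readers that grabbed
        // the old reference during the tight race time to finish their put.
        while ((call = oldQueue.poll(drainTimeoutMs, TimeUnit.MILLISECONDS)) != null) {
            // put blocks while newQueue is full, so no call is dropped.
            newQueue.put(call);
            moved++;
        }
        return moved;
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> oldQueue = new LinkedBlockingQueue<String>();
        oldQueue.put("call-1");
        oldQueue.put("call-2");
        AtomicReference<BlockingQueue<String>> ref =
            new AtomicReference<BlockingQueue<String>>(oldQueue);
        BlockingQueue<String> newQueue = new LinkedBlockingQueue<String>();
        int moved = swapAndDrain(ref, newQueue, 200);
        System.out.println(moved + " calls moved, new queue: " + newQueue);
    }
}
```

One caveat with this sketch: calls moved from the old queue land behind any calls that readers have already put into the new queue after the swap, so strict arrival order is not preserved across a swap.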

> Refactor to make CallQueue pluggable
> ------------------------------------
>
>                 Key: HADOOP-10278
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10278
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: ipc
>            Reporter: Chris Li
>         Attachments: HADOOP-10278-atomicref-adapter.patch, 
> HADOOP-10278-atomicref-rwlock.patch, HADOOP-10278-atomicref.patch, 
> HADOOP-10278-atomicref.patch, HADOOP-10278-atomicref.patch, 
> HADOOP-10278-atomicref.patch, HADOOP-10278.patch, HADOOP-10278.patch
>
>
> * Refactor CallQueue into an interface, base, and default implementation that 
> matches today's behavior
> * Make the call queue impl configurable, keyed on port so that we minimize 
> coupling



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
