[
https://issues.apache.org/jira/browse/HADOOP-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901521#comment-13901521
]
Daryn Sharp commented on HADOOP-10278:
--------------------------------------
I think the use of an atomic reference is more desirable than additional
locking. Queue swapping will either occur rarely or perhaps never, so it's
unreasonable to impact normal operation. I believe handling the race during
swap is quite manageable. In case I've overlooked something, let's walk
through the logic.
I'll use "tight race" to mean the window in which the atomic ref is swapped: a
thread may read the pre-swap value just as it's being swapped, then operate on
the old queue.
*handlers/consumers*
During the swap, handlers might already be blocked on an empty old queue, or
may block on it during the tight race. To solve that, using a timed {{q.poll}}
instead of {{q.take}} will cause the handlers to time out and switch over to the
new queue. Handlers that consume 1 more call from the old queue during the tight
race are fine.
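A minimal sketch of the handler side, assuming the current queue is held in an {{AtomicReference}}. The class name, the {{queueRef}} field, the {{String}} call type, and the {{takeNext}} helper are hypothetical and are not taken from the patch:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class HandlerPollSketch {
  // Hypothetical holder for the current call queue; not the actual Hadoop field.
  static final AtomicReference<BlockingQueue<String>> queueRef =
      new AtomicReference<>(new LinkedBlockingQueue<String>());

  /**
   * Returns the next call, or null if the handler thread is interrupted.
   * The reference is re-read after every poll timeout, so a handler parked
   * on a swapped-out queue moves to the new queue within one timeout period.
   */
  static String takeNext(long pollMillis) {
    try {
      while (true) {
        BlockingQueue<String> q = queueRef.get();
        String call = q.poll(pollMillis, TimeUnit.MILLISECONDS);
        if (call != null) {
          return call; // may be 1 last call from the old queue, which is fine
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    }
  }
}
```

The poll timeout bounds how long a handler can stay parked on the old queue after a swap, so it trades wake-up overhead against switch-over latency.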
*readers/producers*
I'm not sure the readers need to use {{q.offer}} instead of {{q.put}}. If a
reader is blocked on a {{put}}, then the queue being swapped out is already
full. When the old queue is "drained", these blocked readers' puts will
immediately unblock and succeed into the old queue. At most 1 call per reader
will be added to the old queue post-swap. Likewise during the tight race, some
readers may put at most 1 call into the old queue. I believe this is
manageable, as described under swapping queues.
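To illustrate the blocked-reader case, here is a small sketch under assumed names ({{ReaderPutSketch}}, {{queueRef}}, {{enqueue}}, and {{demoLateArrival}} are all hypothetical). A reader blocks on a full capacity-1 queue, the reference is swapped, and draining the old queue lets the pending {{put}} complete into the old queue rather than the new one:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class ReaderPutSketch {
  // Hypothetical holder for the current call queue; capacity 1 forces blocking.
  static final AtomicReference<BlockingQueue<String>> queueRef =
      new AtomicReference<>(new ArrayBlockingQueue<String>(1));

  /** Reader side: a plain blocking put on whichever queue the reference holds now. */
  static void enqueue(String call) {
    try {
      queueRef.get().put(call);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Returns the call that lands in the old queue after the swap, or null on failure. */
  static String demoLateArrival() {
    try {
      BlockingQueue<String> oldQ = new ArrayBlockingQueue<>(1);
      BlockingQueue<String> newQ = new ArrayBlockingQueue<>(1);
      oldQ.put("a");                     // the old queue is now full
      queueRef.set(oldQ);
      Thread reader = new Thread(() -> enqueue("b"));
      reader.start();
      // Wait until the reader is parked inside put() on the old queue.
      while (reader.isAlive() && reader.getState() != Thread.State.WAITING) {
        Thread.sleep(1);
      }
      queueRef.set(newQ);                // swap while the reader is blocked
      oldQ.take();                       // draining "a" frees a slot ...
      reader.join();                     // ... so the blocked put completes
      // The late call is in the old queue; the new queue never saw it.
      return newQ.isEmpty() ? oldQ.poll(1, TimeUnit.SECONDS) : null;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    }
  }
}
```

The late call ends up in the old queue, which is why the swapping thread has to keep draining for a short while after the swap.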
*swapping queues*
The thread that swaps the queues already needs to drain the old queue into the
new queue. This thread will race with readers that might insert 1 more call
during the tight race. A drain that polls with a timeout of a couple of seconds
until null is returned should catch those readers that might insert 1 more call.
The logic that attempts to fall back to the old queue probably isn't required.
The thread swapping should just block until it adds all calls to the new queue.
Losing or dropping calls under any condition is not desirable. A client may
be left waiting indefinitely for the lost call's response.
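The swap-and-drain step might look roughly like the following. This is a sketch under stated assumptions, not the patch itself: {{QueueSwapSketch}}, {{queueRef}}, {{swapAndDrain}}, and the {{String}} call type are hypothetical names chosen for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class QueueSwapSketch {
  // Hypothetical holder for the current call queue; not the actual Hadoop field.
  static final AtomicReference<BlockingQueue<String>> queueRef =
      new AtomicReference<>(new LinkedBlockingQueue<String>());

  /**
   * Publishes newQ, then moves every call left in the old queue into it.
   * Returns the number of calls moved.
   */
  static int swapAndDrain(BlockingQueue<String> newQ, long drainTimeoutMillis) {
    BlockingQueue<String> oldQ = queueRef.getAndSet(newQ);
    int moved = 0;
    try {
      String call;
      // Poll with a timeout until null, to catch readers that insert
      // 1 more call into the old queue during the tight race.
      while ((call = oldQ.poll(drainTimeoutMillis, TimeUnit.MILLISECONDS)) != null) {
        newQ.put(call); // block rather than drop the call if newQ is full
        moved++;
      }
    } catch (InterruptedException e) {
      // Assumption: restore the flag and stop; a real implementation would
      // need to decide how to finish the drain so no call is lost.
      Thread.currentThread().interrupt();
    }
    return moved;
  }
}
```

Using a blocking {{put}} on the new queue matches the point above: the swapping thread waits instead of dropping a call or falling back to the old queue.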
> Refactor to make CallQueue pluggable
> ------------------------------------
>
> Key: HADOOP-10278
> URL: https://issues.apache.org/jira/browse/HADOOP-10278
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: ipc
> Reporter: Chris Li
> Attachments: HADOOP-10278-atomicref-adapter.patch,
> HADOOP-10278-atomicref-rwlock.patch, HADOOP-10278-atomicref.patch,
> HADOOP-10278-atomicref.patch, HADOOP-10278-atomicref.patch,
> HADOOP-10278-atomicref.patch, HADOOP-10278.patch, HADOOP-10278.patch
>
>
> * Refactor CallQueue into an interface, base, and default implementation that
> matches today's behavior
> * Make the call queue impl configurable, keyed on port so that we minimize
> coupling