[ 
https://issues.apache.org/jira/browse/HADOOP-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HADOOP-14035:
---------------------------------
    Attachment: HADOOP-14035.patch

Wrapped the rpc server exception and the retriable exception in a 
CallQueueOverflowException.  It is an IllegalStateException so that it 
conforms to the BlockingQueue api.

CallQueueManager now conforms to the BlockingQueue interface.  Backoff logic 
is pushed down from the ipc server into CQM.  CQM's put decides whether to 
call the managed queue's put or add based on backoff.
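
A minimal Java sketch of the put/add choice described above.  
SketchCallQueueManager is a hypothetical stand-in, not the class in the 
patch.  It assumes the managed queue is any java.util.concurrent.BlockingQueue 
and that backoff is a fixed boolean.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical stand-in for the call queue manager; names are illustrative.
class SketchCallQueueManager<E> {
    private final BlockingQueue<E> managedQueue;
    private final boolean backoffEnabled;

    SketchCallQueueManager(BlockingQueue<E> managedQueue, boolean backoffEnabled) {
        this.managedQueue = managedQueue;
        this.backoffEnabled = backoffEnabled;
    }

    // With backoff enabled, use the non-blocking add(), which throws an
    // IllegalStateException when the queue is full.  Otherwise block in put()
    // until space is available.
    void put(E call) throws InterruptedException {
        if (backoffEnabled) {
            managedQueue.add(call);
        } else {
            managedQueue.put(call);
        }
    }

    int size() {
        return managedQueue.size();
    }
}
```

With backoff enabled and a full ArrayBlockingQueue, put fails fast with an 
IllegalStateException instead of blocking the caller.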

Server simply calls CQM.put, catches overflow exceptions, and unwraps the 
RpcServerException/RetriableException.  It rethrows the unwrapped exception to 
leverage prior changes to the ipc layer that selectively close connections.
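
The catch/unwrap/rethrow step might look roughly like the sketch below.  All 
names here (SketchOverflowException, SketchServer, CallSink, queueCall) are 
hypothetical, and a plain IOException stands in for the wrapped 
RpcServerException/RetriableException.

```java
import java.io.IOException;

// Hypothetical overflow exception: unchecked so it can be thrown from a
// BlockingQueue-style add(), with the real exception carried as the cause.
class SketchOverflowException extends IllegalStateException {
    SketchOverflowException(IOException cause) {
        super(cause);
    }
}

class SketchServer {
    // Hypothetical view of the call queue manager's put().
    interface CallSink {
        void put(String call) throws InterruptedException;
    }

    // Queue the call; on overflow, unwrap the cause and rethrow it so the
    // layer above can decide what to do with the connection.
    static void queueCall(CallSink queue, String call)
            throws IOException, InterruptedException {
        try {
            queue.put(call);
        } catch (SketchOverflowException e) {
            throw (IOException) e.getCause();
        }
    }
}
```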

FCQ put remains unchanged.  Add, which CQM calls if backoff is enabled, will 
offer to all queues; upon overflow it throws an overflow exception.  For 
lowest priority calls, the overflow retriable closes the connection.  For 
non-lowest priority calls, the overflow retriable leaves the connection open.
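
A rough sketch of that add behavior, under two assumptions not stated above: 
priority 0 is the highest level, and "offer to all queues" means trying the 
call's own level and then each lower-priority level.  SketchFairQueue and its 
Overflow exception are hypothetical names, not the patch's classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class SketchFairQueue {
    // Hypothetical overflow exception; the flag tells the caller whether
    // the connection should be closed or kept open.
    static class Overflow extends IllegalStateException {
        final boolean closeConnection;

        Overflow(boolean closeConnection) {
            super("call queue overflow");
            this.closeConnection = closeConnection;
        }
    }

    private final List<BlockingQueue<String>> levels = new ArrayList<>();

    SketchFairQueue(int numLevels, int capacityPerLevel) {
        for (int i = 0; i < numLevels; i++) {
            levels.add(new ArrayBlockingQueue<>(capacityPerLevel));
        }
    }

    // Assumption: 0 is the highest priority, levels.size() - 1 the lowest.
    void add(int priority, String call) {
        // Try the call's own level first, then spill to lower-priority levels.
        for (int i = priority; i < levels.size(); i++) {
            if (levels.get(i).offer(call)) {
                return;
            }
        }
        // Every candidate level is full: only lowest priority calls ask for
        // the connection to be closed.
        boolean lowest = priority == levels.size() - 1;
        throw new Overflow(lowest);
    }
}
```

The intent is that only callers already demoted to the lowest level are 
disconnected; other callers get a retriable error on an open connection.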

> Reduce fair call queue backoff's impact on clients
> --------------------------------------------------
>
>                 Key: HADOOP-14035
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14035
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: ipc
>    Affects Versions: 2.7.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: HADOOP-14035.patch
>
>
> When fcq backoff is enabled and an abusive client overflows the call queue, 
> its connection is closed, as well as subsequent good client connections.   
> Disconnects are very disruptive, esp. to multi-threaded clients with multiple 
> outstanding requests, or clients w/o a retry proxy (ex. datanodes).
> Until the abusive user is downgraded to a lower priority queue, 
> disconnect/reconnect mayhem occurs which significantly degrades performance.  
> Server metrics look good despite horrible client latency.
> The fcq should utilize selective ipc disconnects to avoid pushback 
> disconnecting good clients.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
