[ https://issues.apache.org/jira/browse/HADOOP-10099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822511#comment-13822511 ]

Daryn Sharp commented on HADOOP-10099:
--------------------------------------

This isn't an easy problem to solve correctly.  Luckily, Hadoop clients are
"well behaved" in the sense that only one socket is opened per ugi, and
reconnects are delayed.  Malicious clients are the real problem.

Throttling by equating a client with a host is a trivial way to identify a
client, but it's undesirable in many cases.  One errant MR task spamming
connections to the NN will trigger a DoS for other tasks running on that node.
Likewise, a task spamming connections to the RM or NM will DoS the AMs on that
node that need to make container requests or launches.  Admittedly, that's
still better than a total DoS.
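To make the host-as-client problem concrete, here is a minimal sketch of a
per-host connection cap.  The names (`PerHostConnectionLimiter`, `try_accept`,
`release`) and the limit of 4 are hypothetical illustrations, not Hadoop's ipc
code.  It shows one errant task exhausting its node's budget, so that a
well-behaved AM on the same node is refused as well.

```python
from collections import Counter


class PerHostConnectionLimiter:
    """Hypothetical per-host cap: the client identity is just the remote host."""

    def __init__(self, max_per_host: int) -> None:
        self.max_per_host = max_per_host
        self.open_by_host = Counter()  # host -> currently open connections

    def try_accept(self, host: str) -> bool:
        # Reject once this host already holds max_per_host open connections.
        if self.open_by_host[host] >= self.max_per_host:
            return False
        self.open_by_host[host] += 1
        return True

    def release(self, host: str) -> None:
        # Called when a connection from `host` is closed.
        if self.open_by_host[host] > 0:
            self.open_by_host[host] -= 1


limiter = PerHostConnectionLimiter(max_per_host=4)
# One errant task on node1 opens connections in a loop until it is throttled...
errant = [limiter.try_accept("node1") for _ in range(10)]
# ...after which a well-behaved AM on the same node is refused too,
am_on_node1 = limiter.try_accept("node1")
# while clients on other nodes are unaffected.
am_on_node2 = limiter.try_accept("node2")
print(errant.count(True), am_on_node1, am_on_node2)  # 4 False True
```

The shared per-host budget is exactly what turns one misbehaving task into a
node-wide outage; it is only "better than a total DoS".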

A combination of host + ugi would be a better identifier for throttling, but
that's infeasible because a client can cause a DoS simply by spamming socket
opens, or even just socket open/closes, in which case the ugi isn't known yet.
A client can also open many sockets and then trickle 1 byte at a time every
2*idle to keep each connection from idling out.
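The trickle attack works against any idle check that only looks at the time of
the last received byte.  The sketch below is a simplified model, assuming a
server that closes connections idle for longer than a fixed limit (`MAX_IDLE`,
standing in for the 2*idle window above); `Connection`, `reap_idle`, and the
15-second trickle interval are hypothetical.  In this model the trickling
connection stays open indefinitely while its user is still unknown, so a
host + ugi key would have nothing to throttle on.

```python
class Connection:
    """Hypothetical server-side view of one client socket."""

    def __init__(self, opened_at: float) -> None:
        self.last_activity = opened_at
        # Unknown until the client sends its connection header / authenticates.
        self.authenticated_user = None

    def on_bytes(self, now: float) -> None:
        # Any received byte, even a single one, counts as activity.
        self.last_activity = now


def reap_idle(conns, now, max_idle):
    """Keep only connections with activity within the last max_idle seconds."""
    return [c for c in conns if now - c.last_activity <= max_idle]


MAX_IDLE = 20.0  # hypothetical server-side idle limit, in seconds
silent = Connection(opened_at=0.0)
trickler = Connection(opened_at=0.0)

open_conns = [silent, trickler]
now = 0.0
while now < 100.0:
    now += 5.0
    if now % 15.0 == 0:  # trickler sends 1 byte every 15 s, under MAX_IDLE
        trickler.on_bytes(now)
    open_conns = reap_idle(open_conns, now, MAX_IDLE)

print(silent in open_conns, trickler in open_conns, trickler.authenticated_user)
# False True None
```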

----

Perhaps the KISS approach is to log a warning to the authorization log when a
host exceeds a given connections/sec rate or a total connections-per-host
watermark.
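The KISS approach could look roughly like the following sketch: it tracks
per-host connects in a sliding one-second window plus a count of open
connections per host, and only logs a warning when either watermark is
exceeded, never rejecting anything.  The logger name `SecurityLogger.ipc`, the
class `ConnectionWatermarkMonitor`, and the thresholds are assumptions for
illustration only.

```python
import logging
from collections import defaultdict, deque

# Hypothetical logger name standing in for the server's authorization log.
AUTH_LOG = logging.getLogger("SecurityLogger.ipc")


class ConnectionWatermarkMonitor:
    """Warn (but never reject) when a host exceeds a rate or open-count watermark."""

    def __init__(self, max_per_sec: int, max_open: int, window: float = 1.0) -> None:
        self.max_per_sec = max_per_sec
        self.max_open = max_open
        self.window = window
        self.recent = defaultdict(deque)  # host -> timestamps of recent connects
        self.open = defaultdict(int)      # host -> currently open connections

    def on_connect(self, host: str, now: float) -> list:
        stamps = self.recent[host]
        stamps.append(now)
        # Drop connect timestamps that have fallen out of the sliding window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        self.open[host] += 1
        warnings = []
        if len(stamps) > self.max_per_sec * self.window:
            warnings.append(f"{host}: {len(stamps)} connects in {self.window}s")
        if self.open[host] > self.max_open:
            warnings.append(f"{host}: {self.open[host]} open connections")
        for msg in warnings:
            AUTH_LOG.warning("Connection watermark exceeded: %s", msg)
        return warnings

    def on_close(self, host: str) -> None:
        self.open[host] = max(0, self.open[host] - 1)


logging.basicConfig(level=logging.WARNING)
monitor = ConnectionWatermarkMonitor(max_per_sec=5, max_open=8)
# Ten connects from node1 within 0.1 s: the 6th onward trips the rate check,
# and the 9th onward also trips the open-connection watermark.
burst = [monitor.on_connect("node1", now=0.01 * i) for i in range(10)]
print([len(w) for w in burst])  # [0, 0, 0, 0, 0, 1, 1, 1, 2, 2]
```

Because it only warns, a misconfigured threshold can't lock out legitimate
clients; it just gives admins a record of which host is misbehaving.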

I filed this jira in response to comments on my other RPC performance jiras.
I think this is a minor issue, since Hadoop has survived thus far with no DoS
protection.

> Reduce chance for RPC denial of service
> ---------------------------------------
>
>                 Key: HADOOP-10099
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10099
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ipc
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Priority: Minor
>
> An RPC server may accept an unlimited number of connections unless indirectly
> bounded by a blocking operation in the RPC handler threads.  The NN's
> namespace locking happens to cause this blocking, but other RPC servers, such
> as YARN's, generate async events, which allow unbridled connection acceptance.
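The indirect bound described in the issue can be illustrated with a
deliberately simplified model, assuming the listener only accepts a connection
when it can hand work to a bounded queue.  This is not the actual structure of
Hadoop's ipc Server; `simulate_accepts` and the queue size are hypothetical.  A
handler that blocks (as on the NN's namespace lock) never drains the queue, so
acceptance stops at the queue size; a handler that just posts an async event
frees its slot immediately, so every attempt is accepted.

```python
import queue


def simulate_accepts(attempts: int, queue_size: int, handler_blocks: bool) -> int:
    """Return how many connection attempts are accepted in this toy model."""
    pending = queue.Queue(maxsize=queue_size)
    accepted = 0
    for _ in range(attempts):
        try:
            pending.put_nowait("call")
        except queue.Full:
            continue  # back-pressure: this connection is not accepted
        accepted += 1
        if not handler_blocks:
            pending.get_nowait()  # async handler: dispatch event, free the slot
    return accepted


print(simulate_accepts(1000, queue_size=100, handler_blocks=True))   # 100
print(simulate_accepts(1000, queue_size=100, handler_blocks=False))  # 1000
```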



--
This message was sent by Atlassian JIRA
(v6.1#6144)
