[ 
https://issues.apache.org/jira/browse/HDFS-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16718354#comment-16718354
 ] 

Konstantin Shvachko commented on HDFS-13873:
--------------------------------------------

Slightly modified patch.
# Fixed exception handling. Thanks [~xkrogen] for debugging this.
# Used maxIdleTime from the Server class rather than digging it out of the 
configuration in NNRPCServer.
# Fixed long lines in GSI contexts.
# {{TestHdfsConfigFields}} is failing because of HDFS-14017. [~vagarychen] 
plans to fix it in the next jira. The other failing test passes locally.

We had an internal discussion. My patch fixes the problem of a client waiting 
too long for Observer to catch up. We also looked at some remaining attack 
vectors open to malicious clients.
* As Erik suggested, one can send requests to Observer with a state id 
somewhat larger than what was actually seen on ANN, but not large enough for 
Observer to reject it right off the bat. This will increase load on Observer, 
since such malicious calls will stay in the queue longer than necessary. 
To me, though, this doesn't look too different from the regular DDoS attacks 
by friendly clients, which we observe on a daily basis, when they just send 
too many {{getListings()}} to ANN.
* A variant of this attack can happen on a read-only (no writes) cluster. Then 
those fake calls with a slightly higher stateId can stay in the queue forever, 
since the stateId on the Observer is not progressing. Although I've never seen 
a read-only HDFS cluster, I think there should be general logic at the RPC 
level to clean up connections that are idle or about to time out. If a client 
fails after sending the request, or the request has been in the queue too 
long, there is no reason for the server to execute it, since the client will 
never receive the response (and has most probably already retried). I will 
look into the code more, but if we don't have this now we should introduce it. 
This should also let Observer recover from such unfriendly attacks.
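
To make the two cases above concrete, here is a minimal Java sketch of how 
queued calls could be triaged. It is only an illustration under stated 
assumptions: the class ObserverCallTriage, its Decision enum, and both 
thresholds are hypothetical names and are not part of the patch or of the 
Hadoop RPC code. State ids are assumed to be non-negative.

```java
/** Hypothetical sketch; none of these names exist in Hadoop. */
final class ObserverCallTriage {
  enum Decision { PROCESS, REQUEUE, REJECT, DROP }

  // Assumed threshold: how far ahead of the server a client state id may be.
  private final long maxStateIdLag;
  // Assumed limit on queue time, e.g. derived from the client RPC timeout.
  private final long maxQueueTimeMs;

  ObserverCallTriage(long maxStateIdLag, long maxQueueTimeMs) {
    this.maxStateIdLag = maxStateIdLag;
    this.maxQueueTimeMs = maxQueueTimeMs;
  }

  Decision triage(long clientStateId, long serverStateId,
                  long enqueuedAtMs, long nowMs) {
    // The call waited too long: the client has most likely timed out or
    // retried, so executing it would be wasted work.
    if (nowMs - enqueuedAtMs > maxQueueTimeMs) {
      return Decision.DROP;
    }
    // The server has already caught up with what the client has seen.
    if (clientStateId <= serverStateId) {
      return Decision.PROCESS;
    }
    // Here clientStateId > serverStateId >= 0, so the subtraction is safe.
    long lag = clientStateId - serverStateId;
    // Too far behind (or the client sent an inflated state id): fail fast.
    if (lag > maxStateIdLag) {
      return Decision.REJECT;
    }
    // Slightly behind: keep the call queued until the server catches up.
    return Decision.REQUEUE;
  }
}
```

In this sketch a call with a slightly inflated state id on a read-only 
cluster is requeued only until maxQueueTimeMs elapses and is then dropped, so 
it cannot stay in the queue forever.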

> ObserverNode should reject read requests when it is too far behind.
> -------------------------------------------------------------------
>
>                 Key: HDFS-13873
>                 URL: https://issues.apache.org/jira/browse/HDFS-13873
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client, namenode
>    Affects Versions: HDFS-12943
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>            Priority: Major
>         Attachments: HDFS-13873-HDFS-12943.001.patch, 
> HDFS-13873-HDFS-12943.002.patch
>
>
> Add a server-side threshold for ObserverNode to reject read requests when it 
> is too far behind.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
