[
https://issues.apache.org/jira/browse/HDFS-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16718354#comment-16718354
]
Konstantin Shvachko commented on HDFS-13873:
--------------------------------------------
Slightly modified patch.
# Fixed exception handling. Thanks [~xkrogen] for debugging this.
# Used maxIdleTime from the Server class rather than digging it out of the
configuration in NNRPCServer.
# Fixed long lines in GSI contexts.
# {{TestHdfsConfigFields}} is failing because of HDFS-14017. [~vagarychen]
plans to fix it in the next jira. The other failing test passes locally.
We had an internal discussion. My patch fixes the problem of a client
waiting too long for the Observer to catch up. We also looked at some remaining
attack vectors open to malicious clients.
* As Erik suggested, one can send requests to the Observer with a state id
somewhat larger than the one actually seen on the ANN, but not large enough for
the Observer to reject it right off the bat. This will increase load on the
Observer, since such malicious calls will stay in the queue longer than
necessary. To me, though, this doesn't look too different from the regular DDoS
attacks by friendly clients, which we observe on a daily basis, when they just
send too many {{getListings()}} calls to the ANN.
* A variant of this attack can happen on a read-only (no writes) cluster. Then
those fake calls with a slightly higher stateId can stay in the queue forever,
since the stateId on the Observer is not progressing. Apart from the fact that
I've never seen a read-only HDFS cluster, I think there should be general logic
at the RPC level to clean up idle or about-to-time-out connections. If a client
fails after sending the request, or the request has been in the queue too long,
there is no reason for the server to execute it, since the client will never
receive the response (and has most probably already retried). I will keep
looking at the code, but if we don't have this now we should introduce it. This
should also let the Observer recover from such unfriendly attacks.
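The two cases above can be illustrated with a small sketch. This is not code from the patch: the class {{ObserverCallTriage}}, the {{Decision}} enum, and both limits ({{maxStateIdLag}}, {{maxQueueTimeMs}}) are hypothetical names chosen for illustration only. It assumes the server knows the stateId sent by the client, the Observer's current stateId, and the time the call was enqueued.

```java
/**
 * Hypothetical sketch of per-call triage on an Observer.
 * None of these names exist in Hadoop; they only illustrate the discussion.
 */
class ObserverCallTriage {

  enum Decision { PROCESS, REQUEUE, REJECT, DROP }

  // Assumed limit: how far ahead of the Observer a client stateId may be
  // before the call is rejected outright.
  private final long maxStateIdLag;

  // Assumed limit: how long a call may sit in the queue before the client
  // is presumed to have timed out or retried elsewhere.
  private final long maxQueueTimeMs;

  ObserverCallTriage(long maxStateIdLag, long maxQueueTimeMs) {
    this.maxStateIdLag = maxStateIdLag;
    this.maxQueueTimeMs = maxQueueTimeMs;
  }

  Decision triage(long clientStateId, long observerStateId,
                  long enqueuedAtMs, long nowMs) {
    long lag = clientStateId - observerStateId;

    // Observer is too far behind the state the client claims to have seen:
    // reject immediately so the client can go elsewhere.
    if (lag > maxStateIdLag) {
      return Decision.REJECT;
    }

    // The call has waited too long; the client will not read the response,
    // so executing it is wasted work. This also bounds the "read-only
    // cluster" case, where the Observer stateId never advances.
    if (nowMs - enqueuedAtMs > maxQueueTimeMs) {
      return Decision.DROP;
    }

    // Observer is slightly behind: keep the call queued until it catches up.
    if (lag > 0) {
      return Decision.REQUEUE;
    }

    // Observer has caught up to the client's stateId.
    return Decision.PROCESS;
  }
}
```

With limits of, say, 100 transactions and 1000 ms, a call whose stateId is 10 ahead of the Observer is requeued at first and dropped once it has waited more than a second, even if the Observer's stateId never moves. A call that is 450 ahead is rejected on arrival.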
> ObserverNode should reject read requests when it is too far behind.
> -------------------------------------------------------------------
>
> Key: HDFS-13873
> URL: https://issues.apache.org/jira/browse/HDFS-13873
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client, namenode
> Affects Versions: HDFS-12943
> Reporter: Konstantin Shvachko
> Assignee: Konstantin Shvachko
> Priority: Major
> Attachments: HDFS-13873-HDFS-12943.001.patch,
> HDFS-13873-HDFS-12943.002.patch
>
>
> Add a server-side threshold for ObserverNode to reject read requests when it
> is too far behind.