[ https://issues.apache.org/jira/browse/HDFS-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16718354#comment-16718354 ]
Konstantin Shvachko commented on HDFS-13873:
--------------------------------------------

Slightly modified patch.
# Fixed exception handling. Thanks [~xkrogen] for debugging this.
# Used maxIdleTime from the Server class rather than digging it out of the configuration in NNRPCServer.
# Fixed long lines in GSI contexts.
# {{TestHdfsConfigFields}} is failing because of HDFS-14017. [~vagarychen] plans to fix it in the next jira. The other failure passes locally.

We had an internal discussion. My patch fixes the problem of a client waiting too long for the Observer to catch up. We also looked at some remaining attack vectors open to malicious clients.
* As Erik suggested, one can send requests to the Observer with a state id somewhat larger than the one actually seen on the ANN, but not large enough for the Observer to reject it right off the bat. This will increase load on the Observer, since such malicious calls will stay in the queue longer than necessary. That said, it doesn't look too different to me from the regular DDoS attacks by friendly clients, which we observe on a daily basis, when they just send too many {{getListings()}} to the ANN.
* A variant of this attack can happen on a read-only (no writes) cluster. Then those fake calls with a slightly higher stateId can stay in the queue forever, since the stateId on the Observer is not progressing.

Besides the fact that I've never seen a read-only HDFS cluster, I think there should be general logic at the RPC level to clean up idle or about-to-time-out connections. If a client fails after sending the request, or the request has been in the queue too long, there is no reason for the server to execute it, since the client will never receive the response (and has most probably already retried). I will look into the code more, but if we don't have this now we should introduce it. This should also let the Observer recover from such unfriendly attacks.

> ObserverNode should reject read requests when it is too far behind.
> -------------------------------------------------------------------
>
> Key: HDFS-13873
> URL: https://issues.apache.org/jira/browse/HDFS-13873
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client, namenode
> Affects Versions: HDFS-12943
> Reporter: Konstantin Shvachko
> Assignee: Konstantin Shvachko
> Priority: Major
> Attachments: HDFS-13873-HDFS-12943.001.patch, HDFS-13873-HDFS-12943.002.patch
>
>
> Add a server-side threshold for ObserverNode to reject read requests when it is too far behind.
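
For illustration only, the checks discussed in the comment above (reject a call whose state id is too far ahead of the Observer, and skip a call whose client is gone or that has sat in the queue too long) could be sketched roughly as follows. This is not Hadoop's RPC server code: the class {{StaleCallFilter}}, its fields, the {{Decision}} values, and both thresholds are hypothetical names assumed for the example.

```java
/**
 * Hypothetical sketch of a per-call admission check on an Observer.
 * None of these names exist in Hadoop; they are assumptions for illustration.
 */
final class StaleCallFilter {

  /** A queued read call with the state id the client claims to have seen. */
  static final class Call {
    final long clientStateId;
    final long enqueuedAtMs;
    final boolean connectionOpen;

    Call(long clientStateId, long enqueuedAtMs, boolean connectionOpen) {
      this.clientStateId = clientStateId;
      this.enqueuedAtMs = enqueuedAtMs;
      this.connectionOpen = connectionOpen;
    }
  }

  enum Decision { EXECUTE, REQUEUE, REJECT, DROP }

  private final long maxQueueTimeMs;  // assumed limit on time spent in the queue
  private final long maxStateIdLag;   // assumed limit on how far behind the server may be

  StaleCallFilter(long maxQueueTimeMs, long maxStateIdLag) {
    this.maxQueueTimeMs = maxQueueTimeMs;
    this.maxStateIdLag = maxStateIdLag;
  }

  Decision decide(Call call, long serverStateId, long nowMs) {
    // The client is gone, so nobody would receive the response.
    if (!call.connectionOpen) {
      return Decision.DROP;
    }
    // The call waited too long; the client has most probably retried already.
    // This also bounds the wait on a cluster whose state id never advances.
    if (nowMs - call.enqueuedAtMs > maxQueueTimeMs) {
      return Decision.DROP;
    }
    // The server is too far behind the claimed state id: reject right away.
    if (call.clientStateId - serverStateId > maxStateIdLag) {
      return Decision.REJECT;
    }
    // The server is slightly behind: keep the call queued until it catches up.
    if (call.clientStateId > serverStateId) {
      return Decision.REQUEUE;
    }
    return Decision.EXECUTE;
  }
}
```

In this sketch a call carrying a slightly inflated state id is requeued only until the queue-time limit expires, which is what would let the Observer shed such calls even when its own state id is not progressing.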