[ 
https://issues.apache.org/jira/browse/HDFS-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895614#comment-16895614
 ] 

Erik Krogen commented on HDFS-14370:
------------------------------------

Hey [~ayushtkn], thanks for taking a look. You raise great points. I am 
thinking we have two options:

# Set the default value of the maximum time to be -1. If this value is 
encountered, disable the backoff by setting the maximum to be equal to the 
minimum time. I think the drawback here is that -1 could also be interpreted as 
"no maximum," so this behavior may be misleading to some users.
# Change the backoff config to be a multiplier of the minimum sleep time. Set 
the default to be 1, effectively disabling backoff. This has the advantage of 
more consistent behavior, but determining a reasonable value may be a bit more 
difficult (more math involved).
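
To make the two options concrete, here is a rough sketch of how each one could resolve the effective maximum sleep time. All class and method names are hypothetical and not taken from the attached patch. It reads option 2 as maximum = minimum * multiplier, which is only one possible interpretation, and the doubling schedule is likewise an assumption chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names come from the HDFS-14370 patch. */
public class TailerBackoffSketch {

  /** Option 1: -1 is a sentinel for "backoff disabled", so the max collapses to the min. */
  public static long maxFromSentinel(long minSleepMs, long configuredMaxMs) {
    if (configuredMaxMs == -1) {
      return minSleepMs;
    }
    // Never allow a maximum below the minimum.
    return Math.max(minSleepMs, configuredMaxMs);
  }

  /** Option 2 (as read here): max = min * multiplier; a multiplier of 1 disables backoff. */
  public static long maxFromMultiplier(long minSleepMs, long multiplier) {
    if (multiplier < 1) {
      throw new IllegalArgumentException("multiplier must be >= 1");
    }
    return minSleepMs * multiplier;
  }

  /**
   * Sleep times used after each consecutive empty response.
   * Doubling per empty response is an assumed policy, capped at maxSleepMs.
   */
  public static List<Long> sleepSchedule(long minSleepMs, long maxSleepMs, int emptyResponses) {
    List<Long> schedule = new ArrayList<>();
    long sleepMs = minSleepMs;
    for (int i = 0; i < emptyResponses; i++) {
      schedule.add(sleepMs);
      sleepMs = Math.min(maxSleepMs, sleepMs * 2);
    }
    return schedule;
  }

  public static void main(String[] args) {
    long minSleepMs = 100;
    // Both defaults (-1 and 1) yield max == min, i.e. no backoff.
    System.out.println(sleepSchedule(minSleepMs, maxFromSentinel(minSleepMs, -1), 3));
    System.out.println(sleepSchedule(minSleepMs, maxFromMultiplier(minSleepMs, 1), 3));
    // With backoff enabled, a multiplier of 8 matches an explicit maximum of 800 ms.
    System.out.println(sleepSchedule(minSleepMs, maxFromMultiplier(minSleepMs, 8), 5));
  }
}
```

Under this reading the two defaults behave identically. The difference is only in what the user has to reason about: an absolute time with a special-cased -1, or a ratio that must be multiplied out to learn the actual maximum.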

Let me know what you think.

> Edit log tailing fast-path should allow for backoff
> ---------------------------------------------------
>
>                 Key: HDFS-14370
>                 URL: https://issues.apache.org/jira/browse/HDFS-14370
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode, qjm
>    Affects Versions: 3.3.0
>            Reporter: Erik Krogen
>            Assignee: Erik Krogen
>            Priority: Major
>         Attachments: HDFS-14370.000.patch
>
>
> As part of HDFS-13150, in-progress edit log tailing was changed to use an 
> RPC-based mechanism, thus allowing the edit log tailing frequency to be 
> turned way down, and allowing standby/observer NameNodes to be only a few 
> milliseconds stale as compared to the Active NameNode.
> When there is a high volume of transactions on the system, each RPC fetches 
> transactions and takes some time to process them, self-rate-limiting how 
> frequently an RPC is submitted. In a lightly loaded cluster, however, most of 
> these RPCs return an empty set of transactions, incurring high 
> (de)serialization overhead for very little benefit. This was reported by 
> [~jojochuang] in HDFS-14276 and I have also seen it on a test cluster where 
> the SbNN was submitting 8000 RPCs per second that returned empty.
> I propose we add some sort of backoff to the tailing, so that if an empty 
> response is received, it will wait a longer period of time before submitting 
> a new RPC.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
