[ 
https://issues.apache.org/jira/browse/HDFS-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729693#comment-13729693
 ] 

Aaron T. Myers commented on HDFS-5060:
--------------------------------------

bq. Adding to what Kihwal said, this should be turned off by default.

I'm not so sure about that. What if it were set to a very high threshold by 
default, say 50 GB of outstanding edit logs? Your concern seems to be about not 
imposing this on operators who keep a close eye on their system, but in a 
properly-functioning, well-monitored system this threshold should never be hit, 
so having the feature off entirely seems like overkill to me. If you get to the 
point where you have tens or hundreds of gigabytes of edit logs outstanding, 
you're likely looking at a multi-hour restart if things go down.

bq. I think disrupting a running service is a big problem with the proposed 
approach. How often have you seen this issue that warrants a change like this?

Just a handful of times, but when it does occur the consequences are severe 
enough that I think it warrants doing something to address it. We shouldn't let 
people shoot themselves in the foot.

bq. Why cannot bringing up a secondary/standby be a solution?

Of course that's a solution, and the proper long-term solution to this problem. 
This is certainly not meant to replace that. The issue is that I've observed 
several times folks allowing the checkpointing node to be down for an 
inordinately long time, and even with a monitoring solution in place that's 
alerting them to this issue, folks don't fully understand the ramifications of 
stale checkpoints and a large number of outstanding uncheckpointed edits.

bq. The issue that I have seen (quite infrequently though) is, secondary not 
being able to checkpoint due to editlog corruption. I created HDFS-4923 for 
this; if an operator forgets to manually save the namespace, during shutdown 
time the system could save the namespace automatically. This solves several 
issues mentioned in the jira.

That's a fine idea as well, but obviously won't help in the event that the 
shutdown isn't clean, so it won't completely alleviate this issue.
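As a toy sketch of the HDFS-4923 idea referenced above (not its actual patch; 
the class name and the saveNamespace callback are made up for illustration), a 
save-on-shutdown could be wired up roughly like this. Note that a JVM shutdown 
hook only runs on a clean shutdown, which is exactly why it doesn't cover 
crashes or kill -9:

{code:java}
public class SaveNamespaceOnShutdown {

  public static void install(Runnable saveNamespaceAction) {
    // Hypothetical wiring: save the namespace as part of a clean shutdown.
    // This does not run if the process crashes or is killed forcibly.
    Runtime.getRuntime().addShutdownHook(
        new Thread(saveNamespaceAction, "save-namespace-on-shutdown"));
  }

  public static void main(String[] args) {
    install(() -> System.out.println("saving namespace before clean shutdown"));
  }
}
{code}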
                
> NN should proactively perform a saveNamespace if it has a huge number of 
> outstanding uncheckpointed transactions
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-5060
>                 URL: https://issues.apache.org/jira/browse/HDFS-5060
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.1.0-beta
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>
> In a properly-functioning HDFS system, checkpoints will be triggered 
> regularly by either the secondary NN or the standby NN, by default every hour 
> or every 1 million outstanding edit transactions, whichever comes first. 
> However, in cases where this second node is down for an extended period of 
> time, the number of outstanding transactions can grow so large as to cause a 
> restart to take an inordinately long time.
> This JIRA proposes to make the active NN monitor its number of outstanding 
> transactions and perform a proactive local saveNamespace if it grows beyond a 
> configurable threshold. I'm envisioning something like 10x the configured 
> number of transactions that, in a properly-functioning cluster, would trigger 
> a checkpoint from the secondary/standby NN. Though this would be disruptive 
> to clients while it takes place, likely for a few minutes, it seems better 
> than the alternative of a subsequent multi-hour restart and should never 
> actually occur in a properly-functioning cluster.
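As a rough illustration of the check proposed in the description, here is a 
sketch of a transaction-count-based trigger. The class, the saveNamespace 
callback, and the way the 10x multiplier is wired in are assumptions for 
illustration, not the actual NameNode code; the 1,000,000-transaction default 
corresponds to dfs.namenode.checkpoint.txns:

{code:java}
public class UncheckpointedTxnMonitor {

  private final long checkpointTxns;       // normal checkpoint trigger, e.g. 1,000,000
  private final long multiplier;           // e.g. 10x, per the description above
  private final Runnable saveNamespaceAction; // stand-in for the NN's saveNamespace

  UncheckpointedTxnMonitor(long checkpointTxns, long multiplier,
                           Runnable saveNamespaceAction) {
    this.checkpointTxns = checkpointTxns;
    this.multiplier = multiplier;
    this.saveNamespaceAction = saveNamespaceAction;
  }

  /**
   * Called periodically with the latest written and last checkpointed
   * transaction IDs; forces a proactive saveNamespace if too many edits
   * have accumulated without a checkpoint.
   */
  void check(long lastWrittenTxId, long lastCheckpointTxId) {
    long outstanding = lastWrittenTxId - lastCheckpointTxId;
    if (outstanding >= checkpointTxns * multiplier) {
      // Disruptive for a few minutes, but far cheaper than replaying tens of
      // millions of edits on the next restart.
      saveNamespaceAction.run();
    }
  }

  public static void main(String[] args) {
    UncheckpointedTxnMonitor monitor = new UncheckpointedTxnMonitor(
        1_000_000L, 10L, () -> System.out.println("saveNamespace triggered"));
    monitor.check(12_000_000L, 1_500_000L); // 10.5M outstanding edits -> triggers
  }
}
{code}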
