[jira] [Commented] (HDFS-2301) Start/stop appropriate namenode internal services during transition to active and standby

[email protected] (Commented) (JIRA) Thu, 06 Oct 2011 15:57:54 -0700

    [ https://issues.apache.org/jira/browse/HDFS-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122380#comment-13122380 ]

[email protected] commented on HDFS-2301:
-----------------------------------------------------

bq.  On 2011-10-03 18:58:13, Todd Lipcon wrote:
bq.  > just a few nits, mostly looks good. A few questions I have that aren't 
directly related to this patch:
bq.  > - is SafeMode now a replicated thing, or does each NN separately enter 
safemode? I think the latter, right?
bq.  > - when transitioning between states, what happens if the "enterState" 
fails for the new state? The state variable will then indicate it's in that 
state, when in fact it's in no state at all. How do we recover from that? We 
need some kind of rollback? (eg if you're in standby and try to transition to 
active, but find that you can't take a lock in ZK)
bq.  
bq.  Suresh Srinivas wrote:
bq.      > is SafeMode now a replicated thing, or does each NN separately enter 
safemode? I think the latter, right?
bq.      Safemode is the state of namespace(FSNamesystem), unlike active, 
standby which are the states of the namenode. Each NN separately enters 
safemode.
bq.      
bq.      > when transitioning between states, what happens if the "enterState" 
fails for the new state? The state variable will then indicate it's in that 
state, when in fact it's in no state at all. How do we recover from that? We 
need some kind of rollback? (eg if you're in standby and try to transition to 
active, but find that you can't take a lock in ZK)
bq.      This is tricky. Say enterState fails to start services because of some 
namenode process related issues. Then most likely rolling back to previous 
state, and starting services relevant to previous states will also fail. The 
particular example you are bringing up related to ZK, I think failover 
controller is the one that deals with ZK and not namenode.
bq.      
bq.      I can think of two solutions: the namenode shuts down when this happens (as 
is done during startup) or moves to a failed state.

Let's just add a TODO for now that we need to consider these situations in a 
test plan. I imagine the most likely real scenario is that you try to do a 
failover, but for some reason the standby has an IO problem trying to read the 
latest logs from the primary (eg maybe the primary barfed some bad data into 
the edit logs as it crashed, or maybe the primary crashed because the shared 
storage caught on fire).
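
To make the "move to a failed state" option concrete, here is a minimal Java sketch. Every name in it (HaTransitionSketch, State, Node, FAILED, ServiceStartException) is hypothetical and is not taken from the HDFS-2301 patch or from HAState/HAContext. It only illustrates one way to avoid the problem raised above: the state variable is updated after the new state's services have started, and a failed enter parks the node in an explicit failed state instead of attempting a rollback.

```java
/** Hypothetical sketch; none of these names come from the HDFS-2301 patch. */
final class HaTransitionSketch {

    /** Thrown when a state's services cannot be started. */
    static final class ServiceStartException extends Exception {
        ServiceStartException(String message) { super(message); }
    }

    /** A state owns the services started on entry and stopped on exit. */
    interface State {
        String name();
        void enter() throws ServiceStartException;
        void exit();
    }

    /** Terminal state (assumption): no services run until an operator steps in. */
    static final State FAILED = new State() {
        public String name() { return "failed"; }
        public void enter() { }
        public void exit() { }
    };

    static final class Node {
        private State current;

        Node(State initial) { this.current = initial; }

        String stateName() { return current.name(); }

        /**
         * Leaves the current state and enters the target. The state variable
         * only names the target once enter() has succeeded; on failure the
         * node moves to FAILED rather than rolling back.
         */
        boolean transitionTo(State target) {
            if (current == FAILED) {
                return false; // no automatic recovery from the failed state
            }
            current.exit();
            try {
                target.enter();
                current = target;
                return true;
            } catch (ServiceStartException e) {
                target.exit(); // stop anything that was partially started
                current = FAILED;
                return false;
            }
        }
    }

    /** Builds a toy state whose enter() can be made to fail. */
    static State state(final String name, final boolean failOnEnter) {
        return new State() {
            public String name() { return name; }
            public void enter() throws ServiceStartException {
                if (failOnEnter) {
                    throw new ServiceStartException("cannot start services for " + name);
                }
            }
            public void exit() { }
        };
    }

    public static void main(String[] args) {
        Node node = new Node(state("standby", false));
        boolean ok = node.transitionTo(state("active", true));
        System.out.println(ok + " " + node.stateName()); // prints "false failed"
    }
}
```

The other option from the discussion, shutting the namenode down, would replace the assignment to FAILED with a process exit. Which of the two is right is exactly what the test plan TODO needs to settle.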

bq.  On 2011-10-03 18:58:13, Todd Lipcon wrote:
bq.  > 
branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java,
 line 464
bq.  > <https://reviews.apache.org/r/2150/diff/1/?file=47529#file47529line464>
bq.  >
bq.  >     any reason that you switched the order of startHttpServer to the end 
of this function? I don't think it's a big deal, but there's some possibility 
the service plugins may want to do something with the http server, which 
wouldn't be started yet.
bq.  
bq.  Suresh Srinivas wrote:
bq.      No particular reason. Not sure who uses ServicePlugins. But the 
description says it is RPC related. But will move it back up.

Hue currently uses service plugins to expose a Thrift interface. But with 
Sanjay's recent work on protocol adapters, this may be largely unnecessary in 
the future. Nonetheless, we should leave it around :)
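
For anyone wondering why the ordering matters, here is a small Java sketch of the dependency. The types below (StartupOrderSketch, HttpServer, Plugin, RecordingPlugin) are hypothetical stand-ins, not the real NameNode or ServicePlugin code; the point is only that a plugin started before the HTTP server sees a server that is not up yet.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch; these are stand-ins, not NameNode or ServicePlugin code. */
final class StartupOrderSketch {

    /** Stand-in for the embedded HTTP server. */
    static final class HttpServer {
        private boolean started;
        void start() { started = true; }
        boolean isStarted() { return started; }
    }

    /** Stand-in for a plugin that may want to use the HTTP server. */
    interface Plugin {
        void start(HttpServer http);
    }

    /** Records whether the HTTP server was already up when the plugin started. */
    static final class RecordingPlugin implements Plugin {
        boolean sawHttpStarted;
        public void start(HttpServer http) { sawHttpStarted = http.isStarted(); }
    }

    /** HTTP server first, then plugins. */
    static void startHttpThenPlugins(HttpServer http, List<Plugin> plugins) {
        http.start();
        for (Plugin p : plugins) {
            p.start(http);
        }
    }

    /** Plugins first, HTTP server last: the ordering questioned in the review. */
    static void startPluginsThenHttp(HttpServer http, List<Plugin> plugins) {
        for (Plugin p : plugins) {
            p.start(http);
        }
        http.start();
    }

    public static void main(String[] args) {
        RecordingPlugin early = new RecordingPlugin();
        List<Plugin> earlyList = Collections.singletonList(early);
        startPluginsThenHttp(new HttpServer(), earlyList);

        RecordingPlugin late = new RecordingPlugin();
        List<Plugin> lateList = Collections.singletonList(late);
        startHttpThenPlugins(new HttpServer(), lateList);

        System.out.println(early.sawHttpStarted + " " + late.sawHttpStarted); // prints "false true"
    }
}
```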

- Todd

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2150/#review2277
-----------------------------------------------------------

On 2011-10-03 18:36:41, Todd Lipcon wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2150/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-10-03 18:36:41)
bq.  
bq.  
bq.  Review request for hadoop-hdfs and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Uploading Suresh's patch to reviewboard 
(https://issues.apache.org/jira/secure/attachment/12496953/HDFS-2301.txt from 
29/Sep/11 00:56)
bq.  
bq.  
bq.  This addresses bug HDFS-2301.
bq.      https://issues.apache.org/jira/browse/HDFS-2301
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    
branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupNode.java
 1177130 
bq.    
branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
 1177130 
bq.    
branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
 1177130 
bq.    
branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/ActiveState.java
 1177128 
bq.    
branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/HAContext.java
 PRE-CREATION 
bq.    
branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/HAState.java
 1177128 
bq.    
branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyState.java
 1177128 
bq.  
bq.  Diff: https://reviews.apache.org/r/2150/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Todd
bq.  
bq.

> Start/stop appropriate namenode internal services during transition to active 
> and standby
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-2301
>                 URL: https://issues.apache.org/jira/browse/HDFS-2301
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: name-node
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>             Fix For: HA branch (HDFS-1623)
>
>         Attachments: HDFS-2301.txt, HDFS-2301.txt, HDFS-2301.txt, 
> HDFS-2301.txt
>
>
> These changes are related to HDFS-1974 which introduced active and standby 
> states. This jira will address starting and stopping appropriate NN services 
> when entering and exiting active and standby states.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
