[ https://issues.apache.org/jira/browse/HDFS-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187168#comment-13187168 ]

Todd Lipcon commented on HDFS-2681:
-----------------------------------

bq. So if your TCP disconnect timeouts are not set insanely high (> session 
timeout) then enterSafeMode will be called before session timeout expires and 
someone else becomes a master.

This still isn't "safe". For example, imagine the NN goes into a multi-minute 
GC pause just before writing an edit to its edit log. Since the GC pause is 
longer than the session timeout, some other NN will take over. Without active 
fencing, when the first NN wakes up, it will make that mutation to the edit log 
before it finds out about the ZK timeout.
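One common form of active fencing is to have the shared storage reject writes from a writer that has already been superseded. Each new active gets a strictly increasing epoch, and the log refuses edits tagged with any other epoch. The sketch below is a minimal illustration under that assumption. `FencedEditLog`, `newEpoch` and `append` are hypothetical names and are not part of the HDFS-2681 patch. Because the check happens at the storage, a NameNode waking from a GC pause is blocked even before it learns that its ZK session expired.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical shared edit log that fences stale writers (illustration only).
 * Each writer presents the epoch it was granted. Once a newer epoch has been
 * granted, edits carrying any other epoch are rejected.
 */
class FencedEditLog {
    private long highestEpoch = 0;
    private final List<String> edits = new ArrayList<>();

    /** Called by a NameNode when it becomes active; returns its new epoch. */
    synchronized long newEpoch() {
        highestEpoch++;
        return highestEpoch;
    }

    /** Appends an edit only if the writer holds the newest granted epoch. */
    synchronized boolean append(long writerEpoch, String edit) {
        if (writerEpoch != highestEpoch) {
            return false; // stale writer: a newer active has fenced it off
        }
        edits.add(edit);
        return true;
    }

    synchronized List<String> edits() {
        return new ArrayList<>(edits);
    }

    public static void main(String[] args) {
        FencedEditLog log = new FencedEditLog();
        long nn1 = log.newEpoch();   // NN1 becomes active (epoch 1)
        log.append(nn1, "mkdir /a");
        // NN1 enters a long GC pause; its ZK session expires and NN2 takes over.
        long nn2 = log.newEpoch();   // NN2 becomes active (epoch 2)
        log.append(nn2, "mkdir /b");
        // NN1 wakes up and tries to write before noticing the ZK timeout.
        boolean accepted = log.append(nn1, "delete /b");
        System.out.println("stale write accepted? " + accepted);
        System.out.println("edits: " + log.edits());
    }
}
```

The paused NameNode's write is rejected purely by comparing epochs. It never has to observe the ZK timeout first.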

It sounds contrived, but we've had many instances of data loss bugs in HBase due 
to scenarios like this in the past. Multi-minute GC pauses are rare but do 
happen.

bq. It's public because it's a well-defined property of the class.

But it implies that external consumers of this class may want to directly 
manipulate the znode, which exposes an implementation detail 
unnecessarily.
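A minimal sketch of the alternative: keep the znode location private and let callers ask questions about the election instead of touching the znode. `ElectorSketch`, `tryBecomeActive`, `getActiveNodeInfo` and the path are hypothetical names. A `HashMap` stands in for ZooKeeper, so this only illustrates the encapsulation point and does not model real znode semantics.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical elector that keeps its znode layout private. Callers ask
 * questions about the election instead of reading znodes themselves.
 */
class ElectorSketch {
    // Implementation detail: where the lock lives. Not part of the API,
    // so it can be renamed later without breaking any caller.
    private static final String LOCK_ZNODE = "/hypothetical-parent/ActiveLock";

    // Stand-in for the ZooKeeper namespace (path -> data), illustration only.
    private final Map<String, String> fakeZk = new HashMap<>();

    /** Tries to become active; returns true if this caller won. */
    boolean tryBecomeActive(String nodeInfo) {
        return fakeZk.putIfAbsent(LOCK_ZNODE, nodeInfo) == null;
    }

    /** Public query: who is active? Callers never see the znode path. */
    String getActiveNodeInfo() {
        return fakeZk.get(LOCK_ZNODE);
    }

    public static void main(String[] args) {
        ElectorSketch elector = new ElectorSketch();
        System.out.println(elector.tryBecomeActive("nn1:8020")); // true
        System.out.println(elector.tryBecomeActive("nn2:8020")); // false
        System.out.println(elector.getActiveNodeInfo());         // nn1:8020
    }
}
```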

bq. Is the ALLCAPS on static strings a convention? You mean the member name 
should be all caps or the value?

Yes, it's a convention that constants should have all-caps names. See the Sun 
Java Code Conventions, which we more or less follow: 
http://www.oracle.com/technetwork/java/codeconventions-135099.html#367
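To make the answer concrete: the convention applies to the member name, not to the string value. A small sketch follows. `NamingExample`, `LOCK_FILENAME` and `MAX_RETRIES` are hypothetical names chosen for illustration.

```java
/**
 * Naming per the Sun/Oracle Java Code Conventions: constants
 * (static final) use UPPER_CASE_WITH_UNDERSCORES, while fields, locals and
 * methods use camelCase. All names here are hypothetical.
 */
class NamingExample {
    // Constants: the member NAME is all caps; the string VALUE is unaffected.
    static final String LOCK_FILENAME = "ActiveLock";
    static final int MAX_RETRIES = 3;

    // Ordinary mutable instance field: camelCase.
    private int retryCount = 0;

    boolean shouldRetry() {
        return ++retryCount <= MAX_RETRIES;
    }

    public static void main(String[] args) {
        NamingExample ex = new NamingExample();
        int attempts = 0;
        while (ex.shouldRetry()) {
            attempts++;
        }
        System.out.println(LOCK_FILENAME + " retried " + attempts + " times");
    }
}
```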

bq. So I need to have mock initialized before constructing the tester object. 
So I made mock a static member. But then Java complained that inner classes 
cannot have static members.

I'm not quite following: you already initialize the non-static {{mockZk}} in 
{{TestActiveStandbyElector.init()}}, don't you? Then if it's a non-static inner class, it 
can simply refer to the already-initialized member of its outer class.
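A dependency-free sketch of that layout. The outer test class owns the mock and sets it up in `init()`. A non-static inner class then reads the field through its implicit reference to the enclosing instance, so no static field is needed. `ElectorTestSketch`, `ZkClient` and `ElectorTester` are hypothetical names. A lambda stands in for the mock; a real test would more likely use a Mockito mock.

```java
/**
 * Hypothetical test layout: the outer class initialises the mock in init(),
 * and a non-static inner class uses it via the enclosing instance.
 */
class ElectorTestSketch {
    /** Stand-in for the ZooKeeper client being mocked (hypothetical). */
    interface ZkClient {
        String getData(String path);
    }

    private ZkClient mockZk; // non-static, set up before each test

    void init() {
        // A real test might use Mockito.mock(...); a lambda keeps this self-contained.
        mockZk = path -> "data-for-" + path;
    }

    /** Non-static inner class: it can read the outer instance's mockZk directly. */
    class ElectorTester {
        private final ZkClient zk;

        ElectorTester() {
            // Reads the enclosing instance's field, so init() must run first.
            this.zk = mockZk;
        }

        String readActive() {
            return zk.getData("/lock");
        }
    }

    public static void main(String[] args) {
        ElectorTestSketch test = new ElectorTestSketch();
        test.init();
        // Inner-class instances are created through an outer instance.
        ElectorTester tester = test.new ElectorTester();
        System.out.println(tester.readActive());
    }
}
```

The only ordering requirement is that `init()` runs before the tester is constructed. JUnit's `@Before` methods already guarantee this when the tester is built inside each test method.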

bq. Could you please point me to some place which explains what to log at 
different log levels?

I don't think we have any formal guidelines here; the basic assumptions I make 
are:
- ERROR: unrecoverable errors (e.g. some block is apparently lost, or a failover 
failed).
- WARN: recoverable errors (e.g. failures that will be retried, or blocks that 
have become under-replicated but can be repaired).
- INFO: normal operations proceeding as expected, but interesting enough that 
operators will want to see them.
- DEBUG: information that will be useful to developers debugging unit tests or 
running small test clusters (unit tests generally enable DEBUG, but users 
generally don't). Also handy when you have a reproducible bug on the client: 
you can ask the user to enable DEBUG and re-run, for example.
- TRACE: super-detailed trace information that will only be enabled in rare 
circumstances. We don't use this much.
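A small illustration of those guidelines applied to some hypothetical elector events. Hadoop code of this era typically logged through Apache Commons Logging. To stay dependency-free, this sketch uses `java.util.logging` instead. Its levels map roughly as SEVERE ≈ ERROR, WARNING ≈ WARN, INFO ≈ INFO, FINE ≈ DEBUG and FINEST ≈ TRACE. The event names and `levelFor` are assumptions made for the example.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Illustrative only: picks a level for a few hypothetical events following
 * the guidelines above, using java.util.logging level names.
 */
class LogLevelExample {
    private static final Logger LOG = Logger.getLogger(LogLevelExample.class.getName());

    static Level levelFor(String event) {
        switch (event) {
            case "failover-failed":     return Level.SEVERE;  // ERROR: unrecoverable
            case "zk-connect-retry":    return Level.WARNING; // WARN: will be retried
            case "became-active":       return Level.INFO;    // INFO: operators care
            case "watch-event-details": return Level.FINE;    // DEBUG: developers only
            default:                    return Level.FINEST;  // TRACE: rarely enabled
        }
    }

    public static void main(String[] args) {
        String[] events = {"failover-failed", "zk-connect-retry",
                "became-active", "watch-event-details", "raw-packet-dump"};
        for (String event : events) {
            Level level = levelFor(event);
            // Skip building the message when this level is disabled.
            if (LOG.isLoggable(level)) {
                LOG.log(level, "event: " + event);
            }
        }
    }
}
```

With the default `java.util.logging` configuration, only INFO and above are printed. This matches the expectation that users generally don't enable DEBUG.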

> Add ZK client for leader election
> ---------------------------------
>
>                 Key: HDFS-2681
>                 URL: https://issues.apache.org/jira/browse/HDFS-2681
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Suresh Srinivas
>            Assignee: Bikas Saha
>             Fix For: HA branch (HDFS-1623)
>
>         Attachments: HDFS-2681.HDFS-1623.patch, HDFS-2681.HDFS-1623.patch, 
> HDFS-2681.HDFS-1623.patch, HDFS-2681.HDFS-1623.patch, Zookeeper based Leader 
> Election and Monitoring Library.pdf
>
>
> ZKClient needs to support the following capabilities:
> # Ability to create a znode for co-ordinating leader election.
> # Ability to monitor and receive callbacks when active znode status changes.
> # Ability to get information about the active node.
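The three capabilities above can be illustrated with a small in-memory model. This is not the ZooKeeper API. A single field plays the role of the ephemeral lock znode, and registered callbacks play the role of watches. Unlike real ZooKeeper watches, these callbacks are not one-shot and run synchronously. `InMemoryElection` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * In-memory stand-in for the three ZKClient capabilities (illustration only).
 * All names are hypothetical.
 */
class InMemoryElection {
    private String activeInfo; // null means there is no active node
    private final List<Consumer<String>> watchers = new ArrayList<>();

    /** 1. Create the election node; only the first caller wins. */
    synchronized boolean tryCreateLockNode(String nodeInfo) {
        if (activeInfo != null) {
            return false; // someone else already holds the lock
        }
        activeInfo = nodeInfo;
        notifyWatchers();
        return true;
    }

    /** Simulates the active node's session expiring (ephemeral node deleted). */
    synchronized void expireActive() {
        activeInfo = null;
        notifyWatchers();
    }

    /** 2. Register a callback that fires whenever the active node changes. */
    synchronized void watch(Consumer<String> callback) {
        watchers.add(callback);
    }

    /** 3. Information about the current active node, or null if none. */
    synchronized String getActiveInfo() {
        return activeInfo;
    }

    private void notifyWatchers() {
        for (Consumer<String> w : new ArrayList<>(watchers)) {
            w.accept(activeInfo);
        }
    }

    public static void main(String[] args) {
        InMemoryElection election = new InMemoryElection();
        List<String> seen = new ArrayList<>();
        election.watch(seen::add);
        election.tryCreateLockNode("nn1"); // wins
        election.tryCreateLockNode("nn2"); // loses, no notification
        election.expireActive();           // nn1's session expires
        election.tryCreateLockNode("nn2"); // nn2 takes over
        System.out.println(seen);          // [nn1, null, nn2]
    }
}
```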
