[
https://issues.apache.org/jira/browse/HDFS-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187168#comment-13187168
]
Todd Lipcon commented on HDFS-2681:
-----------------------------------
bq. So if your TCP disconnect timeouts are not set insanely high (> session
timeout) then enterSafeMode will be called before session timeout expires and
someone else becomes a master.
This still isn't "safe". For example, imagine the NN goes into a multi-minute
GC pause just before writing an edit to its edit log. Since the GC pause is
longer than the session timeout, some other NN will take over. Without active
fencing, when the first NN wakes up, it will make that mutation to the edit log
before it finds out about the ZK timeout.
It sounds contrived but we've had many instances of data loss bugs in HBase due
to scenarios like this in the past. Multi-minute GC pauses are rare but do
happen.
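
To make the scenario concrete, here is a minimal sketch of one common form of fencing: the shared log itself rejects writes from any writer whose epoch is not the newest one. This is only an illustration under assumptions, not the fencing mechanism of the HA branch; {{FencedEditLog}}, {{startEpoch}} and {{append}} are hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical shared edit log that fences writers by epoch number. */
class FencedEditLog {
    private long highestEpoch = 0;
    private final List<String> edits = new ArrayList<>();

    /** Called by a node that has just become active; fences all older epochs. */
    synchronized void startEpoch(long epoch) {
        if (epoch <= highestEpoch) {
            throw new IllegalStateException(
                "epoch " + epoch + " is not newer than " + highestEpoch);
        }
        highestEpoch = epoch;
    }

    /** Appends an edit only if the writer holds the newest epoch. */
    synchronized boolean append(long writerEpoch, String edit) {
        if (writerEpoch != highestEpoch) {
            return false; // stale (or unknown) writer: the log refuses the write
        }
        edits.add(edit);
        return true;
    }

    /** Returns a copy of the accepted edits. */
    synchronized List<String> edits() {
        return new ArrayList<>(edits);
    }
}

class FencingDemo {
    public static void main(String[] args) {
        FencedEditLog log = new FencedEditLog();
        log.startEpoch(1);              // first NN becomes active
        log.append(1, "mkdir /a");
        // The first NN now stalls in a long GC pause; its ZK session
        // expires and a second NN takes over with a newer epoch.
        log.startEpoch(2);
        log.append(2, "mkdir /b");
        // The first NN wakes up and attempts the write it was about to make.
        boolean accepted = log.append(1, "delete /a");
        System.out.println("stale write accepted? " + accepted);
        System.out.println(log.edits());
    }
}
```

The point of the sketch is that the check happens at the storage side, so it does not depend on the paused NN noticing the ZK timeout in time.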
bq. It public because its a well defined property of the class.
But it implies that external consumers of this class may want to directly
manipulate the znode -- which is exposing an implementation detail
unnecessarily.
bq. Is the ALLCAPS on static strings a convention? You mean the member name
should be all caps or the value?
Yes, it's a convention that constants should have all-caps names. See the Sun
java coding conventions, which we more-or-less follow:
http://www.oracle.com/technetwork/java/codeconventions-135099.html#367
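
A short sketch of the convention: it is the member name that is all caps, not the value. The class and field names below are hypothetical and are not taken from the patch.

```java
/** Hypothetical class illustrating constant naming; names are made up. */
class ConstantNamingSketch {
    // A constant (static final): the NAME is all caps with underscores.
    // The value can be any string and is unaffected by the convention.
    static final String LOCK_ZNODE_NAME = "exampleLock";

    // Not a constant: an ordinary instance field keeps camelCase.
    private final String workingDir;

    ConstantNamingSketch(String workingDir) {
        this.workingDir = workingDir;
    }

    /** Builds the full path of the lock znode under the working directory. */
    String lockPath() {
        return workingDir + "/" + LOCK_ZNODE_NAME;
    }
}
```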
bq. So I need to have mock initialized before constructing the tester object.
So I made mock a static member. But then java complained that inner classes
cannot have static members.
I'm not quite following - don't you already initialize the non-static {{mockZk}} in
{{TestActiveStandbyElector.init()}}? If the tester is a non-static inner class, it
can simply refer to the already-initialized member of its outer class.
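
A minimal sketch of the idea, using plain Java and no mocking library so that it stays self-contained. {{OuterTestSketch}}, {{Tester}} and {{mockCalls}} are hypothetical stand-ins, not the actual test code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a test class; all names are illustrative. */
class OuterTestSketch {
    // Non-static member, assigned in init() before any Tester is created.
    private List<String> mockCalls;

    /** Non-static inner class: every instance is tied to an OuterTestSketch. */
    class Tester {
        void becomeActive() {
            // Reads the enclosing instance's field directly; no static needed.
            mockCalls.add("becomeActive");
        }
    }

    /** Plays the role of the test's setup method. */
    void init() {
        mockCalls = new ArrayList<>();
    }

    Tester newTester() {
        return new Tester();
    }

    List<String> calls() {
        return mockCalls;
    }
}
```

Because {{Tester}} is not declared static, it carries an implicit reference to the enclosing instance, so the field only has to be initialized before the tester object is constructed.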
bq. Could you please point me to some place which explains what to log at
different log levels?
I don't think we have any formal guidelines here. The basic assumptions I make
are:
- ERROR: unrecoverable errors (e.g. some block is apparently lost, or a failover
failed, etc)
- WARN: recoverable errors (e.g. failures that will be retried, blocks that have
become under-replicated but can be repaired, etc)
- INFO: normal operations proceeding as expected, but interesting enough that
operators will want to see it.
- DEBUG: information that will be useful to developers debugging unit tests or
running small test clusters (unit tests generally enable these, but users
generally don't). Also handy when you have a reproducible bug on the client -
you can ask the user to enable DEBUG and re-run, for example.
- TRACE: super-detailed trace information that will only be enabled in rare
circumstances. We don't use this much.
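
The list above could be sketched in code roughly as follows. To stay self-contained this uses {{java.util.logging}} rather than the logging library the project uses, and the mapping onto its levels (SEVERE, WARNING, INFO, FINE, FINEST) is an assumption made for this example; the {{Event}} names are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper mapping kinds of events to log levels. */
class LogLevelSketch {
    private static final Logger LOG =
        Logger.getLogger(LogLevelSketch.class.getName());

    /** One event kind per bullet in the list above. */
    enum Event {
        UNRECOVERABLE_ERROR,  // ERROR: lost block, failed failover
        RECOVERABLE_ERROR,    // WARN: will be retried or repaired
        NORMAL_OPERATION,     // INFO: interesting to operators
        DEVELOPER_DETAIL,     // DEBUG: unit tests, small test clusters
        FINE_GRAINED_TRACE    // TRACE: rarely enabled
    }

    /** Assumed mapping onto java.util.logging levels. */
    static Level levelFor(Event event) {
        switch (event) {
            case UNRECOVERABLE_ERROR: return Level.SEVERE;
            case RECOVERABLE_ERROR:   return Level.WARNING;
            case NORMAL_OPERATION:    return Level.INFO;
            case DEVELOPER_DETAIL:    return Level.FINE;
            case FINE_GRAINED_TRACE:  return Level.FINEST;
            default: throw new IllegalArgumentException("unknown event: " + event);
        }
    }

    /** Logs the message at the level chosen for the event kind. */
    static void log(Event event, String message) {
        LOG.log(levelFor(event), message);
    }

    public static void main(String[] args) {
        log(Event.RECOVERABLE_ERROR, "write failed, will retry");
        log(Event.NORMAL_OPERATION, "became active");
    }
}
```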
> Add ZK client for leader election
> ---------------------------------
>
> Key: HDFS-2681
> URL: https://issues.apache.org/jira/browse/HDFS-2681
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Suresh Srinivas
> Assignee: Bikas Saha
> Fix For: HA branch (HDFS-1623)
>
> Attachments: HDFS-2681.HDFS-1623.patch, HDFS-2681.HDFS-1623.patch,
> HDFS-2681.HDFS-1623.patch, HDFS-2681.HDFS-1623.patch, Zookeeper based Leader
> Election and Monitoring Library.pdf
>
>
> ZKClient needs to support the following capabilities:
> # Ability to create a znode for co-ordinating leader election.
> # Ability to monitor and receive call backs when active znode status changes.
> # Ability to get information about the active node.