http://git-wip-us.apache.org/repos/asf/zookeeper/blob/b024a3e2/src/docs/src/documentation/content/xdocs/zookeeperHierarchicalQuorums.xml ---------------------------------------------------------------------- diff --git a/src/docs/src/documentation/content/xdocs/zookeeperHierarchicalQuorums.xml b/src/docs/src/documentation/content/xdocs/zookeeperHierarchicalQuorums.xml deleted file mode 100644 index f71c4a8..0000000 --- a/src/docs/src/documentation/content/xdocs/zookeeperHierarchicalQuorums.xml +++ /dev/null @@ -1,75 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<!-- - Copyright 2002-2004 The Apache Software Foundation - - Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. - You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. ---> - -<!DOCTYPE article PUBLIC "-//OASIS//DTD Simplified DocBook XML V1.0//EN" -"http://www.oasis-open.org/docbook/xml/simple/1.0/sdocbook.dtd"> -<article id="zk_HierarchicalQuorums"> - <title>Introduction to hierarchical quorums</title> - - <articleinfo> - <legalnotice> - <para>Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. You may - obtain a copy of the License at <ulink - url="http://www.apache.org/licenses/LICENSE-2.0">http://www.apache.org/licenses/LICENSE-2.0</ulink>.</para> - - <para>Unless required by applicable law or agreed to in writing, - software distributed under the License is distributed on an "AS IS" - BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - implied. See the License for the specific language governing permissions - and limitations under the License.</para> - </legalnotice> - - <abstract> - <para>This document contains information about hierarchical quorums.</para> - </abstract> - </articleinfo> - - <para> - This document gives an example of how to use hierarchical quorums. The basic idea is - very simple. First, we split servers into groups, and add a line for each group listing - the servers that form this group. Next we have to assign a weight to each server. - </para> - - <para> - The following example shows how to configure a system with three groups of three servers - each, and we assign a weight of 1 to each server: - </para> - - <programlisting> - group.1=1:2:3 - group.2=4:5:6 - group.3=7:8:9 - - weight.1=1 - weight.2=1 - weight.3=1 - weight.4=1 - weight.5=1 - weight.6=1 - weight.7=1 - weight.8=1 - weight.9=1 - </programlisting> - - <para> - When running the system, we are able to form a quorum once we have a majority of votes from - a majority of non-zero-weight groups. Groups that have zero weight are discarded and not - considered when forming quorums. Looking at the example, we are able to form a quorum once - we have votes from at least two servers from each of two different groups. - </para> - </article> \ No newline at end of file
http://git-wip-us.apache.org/repos/asf/zookeeper/blob/b024a3e2/src/docs/src/documentation/content/xdocs/zookeeperInternals.xml ---------------------------------------------------------------------- diff --git a/src/docs/src/documentation/content/xdocs/zookeeperInternals.xml b/src/docs/src/documentation/content/xdocs/zookeeperInternals.xml deleted file mode 100644 index 7815bc1..0000000 --- a/src/docs/src/documentation/content/xdocs/zookeeperInternals.xml +++ /dev/null @@ -1,487 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<!-- - Copyright 2002-2004 The Apache Software Foundation - - Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. - You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. ---> - -<!DOCTYPE article PUBLIC "-//OASIS//DTD Simplified DocBook XML V1.0//EN" -"http://www.oasis-open.org/docbook/xml/simple/1.0/sdocbook.dtd"> -<article id="ar_ZooKeeperInternals"> - <title>ZooKeeper Internals</title> - - <articleinfo> - <legalnotice> - <para>Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. You may - obtain a copy of the License at <ulink - url="http://www.apache.org/licenses/LICENSE-2.0">http://www.apache.org/licenses/LICENSE-2.0</ulink>.</para> - - <para>Unless required by applicable law or agreed to in writing, - software distributed under the License is distributed on an "AS IS" - BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - implied. See the License for the specific language governing permissions - and limitations under the License.</para> - </legalnotice> - - <abstract> - <para>This article contains topics which discuss the inner workings of - ZooKeeper. So far, that's logging and atomic broadcast. </para> - - </abstract> - </articleinfo> - - <section id="ch_Introduction"> - <title>Introduction</title> - - <para>This document contains information on the inner workings of ZooKeeper. - So far, it discusses these topics: - </para> - -<itemizedlist> -<listitem><para><xref linkend="sc_atomicBroadcast"/></para></listitem> -<listitem><para><xref linkend="sc_logging"/></para></listitem> -</itemizedlist> - -</section> - -<section id="sc_atomicBroadcast"> -<title>Atomic Broadcast</title> - -<para> -At the heart of ZooKeeper is an atomic messaging system that keeps all of the servers in sync.</para> - -<section id="sc_guaranteesPropertiesDefinitions"><title>Guarantees, Properties, and Definitions</title> -<para> -The specific guarantees provided by the messaging system used by ZooKeeper are the following:</para> - -<variablelist> - -<varlistentry><term><emphasis >Reliable delivery</emphasis></term> -<listitem><para>If a message, m, is delivered -by one server, it will be eventually delivered by all servers.</para></listitem></varlistentry> - -<varlistentry><term><emphasis >Total order</emphasis></term> -<listitem><para> If a message is -delivered before message b by one server, a will be delivered before b by all -servers. If a and b are delivered messages, either a will be delivered before b -or b will be delivered before a.</para></listitem></varlistentry> - -<varlistentry><term><emphasis >Causal order</emphasis> </term> - -<listitem><para> -If a message b is sent after a message a has been delivered by the sender of b, -a must be ordered before b. If a sender sends c after sending b, c must be ordered after b. -</para></listitem></varlistentry> - -</variablelist> - - -<para> -The ZooKeeper messaging system also needs to be efficient, reliable, and easy to -implement and maintain. We make heavy use of messaging, so we need the system to -be able to handle thousands of requests per second. Although we can require at -least k+1 correct servers to send new messages, we must be able to recover from -correlated failures such as power outages. When we implemented the system we had -little time and few engineering resources, so we needed a protocol that is -accessible to engineers and is easy to implement. We found that our protocol -satisfied all of these goals. - -</para> - -<para> -Our protocol assumes that we can construct point-to-point FIFO channels between -the servers. While similar services usually assume message delivery that can -lose or reorder messages, our assumption of FIFO channels is very practical -given that we use TCP for communication. Specifically we rely on the following property of TCP:</para> - -<variablelist> - -<varlistentry> -<term><emphasis >Ordered delivery</emphasis></term> -<listitem><para>Data is delivered in the same order it is sent and a message m is -delivered only after all messages sent before m have been delivered. -(The corollary to this is that if message m is lost all messages after m will be lost.)</para></listitem></varlistentry> - -<varlistentry><term><emphasis >No message after close</emphasis></term> -<listitem><para>Once a FIFO channel is closed, no messages will be received from it.</para></listitem></varlistentry> - -</variablelist> - -<para> -FLP proved that consensus cannot be achieved in asynchronous distributed systems -if failures are possible. To ensure we achieve consensus in the presence of failures -we use timeouts. However, we rely on times for liveness not for correctness. So, -if timeouts stop working (clocks malfunction for example) the messaging system may -hang, but it will not violate its guarantees.</para> - -<para>When describing the ZooKeeper messaging protocol we will talk of packets, -proposals, and messages:</para> -<variablelist> -<varlistentry><term><emphasis >Packet</emphasis></term> -<listitem><para>a sequence of bytes sent through a FIFO channel</para></listitem></varlistentry><varlistentry> - -<term><emphasis >Proposal</emphasis></term> -<listitem><para>a unit of agreement. Proposals are agreed upon by exchanging packets -with a quorum of ZooKeeper servers. Most proposals contain messages, however the -NEW_LEADER proposal is an example of a proposal that does not correspond to a message.</para></listitem> -</varlistentry><varlistentry> - -<term><emphasis >Message</emphasis></term> -<listitem><para>a sequence of bytes to be atomically broadcast to all ZooKeeper -servers. A message put into a proposal and agreed upon before it is delivered.</para></listitem> -</varlistentry> - -</variablelist> - -<para> -As stated above, ZooKeeper guarantees a total order of messages, and it also -guarantees a total order of proposals. ZooKeeper exposes the total ordering using -a ZooKeeper transaction id (<emphasis>zxid</emphasis>). All proposals will be stamped with a zxid when -it is proposed and exactly reflects the total ordering. Proposals are sent to all -ZooKeeper servers and committed when a quorum of them acknowledge the proposal. -If a proposal contains a message, the message will be delivered when the proposal -is committed. Acknowledgement means the server has recorded the proposal to persistent storage. -Our quorums have the requirement that any pair of quorum must have at least one server -in common. We ensure this by requiring that all quorums have size (<emphasis>n/2+1</emphasis>) where -n is the number of servers that make up a ZooKeeper service. -</para> - -<para> -The zxid has two parts: the epoch and a counter. In our implementation the zxid -is a 64-bit number. We use the high order 32-bits for the epoch and the low order -32-bits for the counter. Because it has two parts represent the zxid both as a -number and as a pair of integers, (<emphasis>epoch, count</emphasis>). The epoch number represents a -change in leadership. Each time a new leader comes into power it will have its -own epoch number. We have a simple algorithm to assign a unique zxid to a proposal: -the leader simply increments the zxid to obtain a unique zxid for each proposal. -<emphasis>Leadership activation will ensure that only one leader uses a given epoch, so our -simple algorithm guarantees that every proposal will have a unique id.</emphasis> -</para> - -<para> -ZooKeeper messaging consists of two phases:</para> - -<variablelist> -<varlistentry><term><emphasis >Leader activation</emphasis></term> -<listitem><para>In this phase a leader establishes the correct state of the system -and gets ready to start making proposals.</para></listitem> -</varlistentry> - -<varlistentry><term><emphasis >Active messaging</emphasis></term> -<listitem><para>In this phase a leader accepts messages to propose and coordinates message delivery.</para></listitem> -</varlistentry> -</variablelist> - -<para> -ZooKeeper is a holistic protocol. We do not focus on individual proposals, rather -look at the stream of proposals as a whole. Our strict ordering allows us to do this -efficiently and greatly simplifies our protocol. Leadership activation embodies -this holistic concept. A leader becomes active only when a quorum of followers -(The leader counts as a follower as well. You can always vote for yourself ) has synced -up with the leader, they have the same state. This state consists of all of the -proposals that the leader believes have been committed and the proposal to follow -the leader, the NEW_LEADER proposal. (Hopefully you are thinking to -yourself, <emphasis>Does the set of proposals that the leader believes has been committed -included all the proposals that really have been committed?</emphasis> The answer is <emphasis>yes</emphasis>. -Below, we make clear why.) -</para> - -</section> - -<section id="sc_leaderElection"> - -<title>Leader Activation</title> -<para> -Leader activation includes leader election. We currently have two leader election -algorithms in ZooKeeper: LeaderElection and FastLeaderElection (AuthFastLeaderElection -is a variant of FastLeaderElection that uses UDP and allows servers to perform a simple -form of authentication to avoid IP spoofing). ZooKeeper messaging doesn't care about the -exact method of electing a leader has long as the following holds: -</para> - -<itemizedlist> - -<listitem><para>The leader has seen the highest zxid of all the followers.</para></listitem> -<listitem><para>A quorum of servers have committed to following the leader.</para></listitem> - -</itemizedlist> - -<para> -Of these two requirements only the first, the highest zxid amoung the followers -needs to hold for correct operation. The second requirement, a quorum of followers, -just needs to hold with high probability. We are going to recheck the second requirement, -so if a failure happens during or after the leader election and quorum is lost, -we will recover by abandoning leader activation and running another election. -</para> - -<para> -After leader election a single server will be designated as a leader and start -waiting for followers to connect. The rest of the servers will try to connect to -the leader. The leader will sync up with followers by sending any proposals they -are missing, or if a follower is missing too many proposals, it will send a full -snapshot of the state to the follower. -</para> - -<para> -There is a corner case in which a follower that has proposals, U, not seen -by a leader arrives. Proposals are seen in order, so the proposals of U will have a zxids -higher than zxids seen by the leader. The follower must have arrived after the -leader election, otherwise the follower would have been elected leader given that -it has seen a higher zxid. Since committed proposals must be seen by a quorum of -servers, and a quorum of servers that elected the leader did not see U, the proposals -of you have not been committed, so they can be discarded. When the follower connects -to the leader, the leader will tell the follower to discard U. -</para> - -<para> -A new leader establishes a zxid to start using for new proposals by getting the -epoch, e, of the highest zxid it has seen and setting the next zxid to use to be -(e+1, 0), fter the leader syncs with a follower, it will propose a NEW_LEADER -proposal. Once the NEW_LEADER proposal has been committed, the leader will activate -and start receiving and issuing proposals. -</para> - -<para> -It all sounds complicated but here are the basic rules of operation during leader -activation: -</para> - -<itemizedlist> -<listitem><para>A follower will ACK the NEW_LEADER proposal after it has synced with the leader.</para></listitem> -<listitem><para>A follower will only ACK a NEW_LEADER proposal with a given zxid from a single server.</para></listitem> -<listitem><para>A new leader will COMMIT the NEW_LEADER proposal when a quorum of followers have ACKed it.</para></listitem> -<listitem><para>A follower will commit any state it received from the leader when the NEW_LEADER proposal is COMMIT.</para></listitem> -<listitem><para>A new leader will not accept new proposals until the NEW_LEADER proposal has been COMMITED.</para></listitem> -</itemizedlist> - -<para> -If leader election terminates erroneously, we don't have a problem since the -NEW_LEADER proposal will not be committed since the leader will not have quorum. -When this happens, the leader and any remaining followers will timeout and go back -to leader election. -</para> - -</section> - -<section id="sc_activeMessaging"> -<title>Active Messaging</title> -<para> -Leader Activation does all the heavy lifting. Once the leader is coronated he can -start blasting out proposals. As long as he remains the leader no other leader can -emerge since no other leader will be able to get a quorum of followers. If a new -leader does emerge, -it means that the leader has lost quorum, and the new leader will clean up any -mess left over during her leadership activation. -</para> - -<para>ZooKeeper messaging operates similar to a classic two-phase commit.</para> - -<mediaobject id="fg_2phaseCommit" > - <imageobject> - <imagedata fileref="images/2pc.jpg"/> - </imageobject> -</mediaobject> - -<para> -All communication channels are FIFO, so everything is done in order. Specifically -the following operating constraints are observed:</para> - -<itemizedlist> - -<listitem><para>The leader sends proposals to all followers using -the same order. Moreover, this order follows the order in which requests have been -received. Because we use FIFO channels this means that followers also receive proposals in order. -</para></listitem> - -<listitem><para>Followers process messages in the order they are received. This -means that messages will be ACKed in order and the leader will receive ACKs from -followers in order, due to the FIFO channels. It also means that if message $m$ -has been written to non-volatile storage, all messages that were proposed before -$m$ have been written to non-volatile storage.</para></listitem> - -<listitem><para>The leader will issue a COMMIT to all followers as soon as a -quorum of followers have ACKed a message. Since messages are ACKed in order, -COMMITs will be sent by the leader as received by the followers in order.</para></listitem> - -<listitem><para>COMMITs are processed in order. Followers deliver a proposals -message when that proposal is committed.</para></listitem> - -</itemizedlist> - -</section> - -<section id="sc_summary"> -<title>Summary</title> -<para>So there you go. Why does it work? Specifically, why does a set of proposals -believed by a new leader always contain any proposal that has actually been committed? -First, all proposals have a unique zxid, so unlike other protocols, we never have -to worry about two different values being proposed for the same zxid; followers -(a leader is also a follower) see and record proposals in order; proposals are -committed in order; there is only one active leader at a time since followers only -follow a single leader at a time; a new leader has seen all committed proposals -from the previous epoch since it has seen the highest zxid from a quorum of servers; -any uncommited proposals from a previous epoch seen by a new leader will be committed -by that leader before it becomes active.</para></section> - -<section id="sc_comparisons"><title>Comparisons</title> -<para> -Isn't this just Multi-Paxos? No, Multi-Paxos requires some way of assuring that -there is only a single coordinator. We do not count on such assurances. Instead -we use the leader activation to recover from leadership change or old leaders -believing they are still active. -</para> - -<para> -Isn't this just Paxos? Your active messaging phase looks just like phase 2 of Paxos? -Actually, to us active messaging looks just like 2 phase commit without the need to -handle aborts. Active messaging is different from both in the sense that it has -cross proposal ordering requirements. If we do not maintain strict FIFO ordering of -all packets, it all falls apart. Also, our leader activation phase is different from -both of them. In particular, our use of epochs allows us to skip blocks of uncommitted -proposals and to not worry about duplicate proposals for a given zxid. -</para> - -</section> - -</section> - -<section id="sc_quorum"> -<title>Quorums</title> - -<para> -Atomic broadcast and leader election use the notion of quorum to guarantee a consistent -view of the system. By default, ZooKeeper uses majority quorums, which means that every -voting that happens in one of these protocols requires a majority to vote on. One example is -acknowledging a leader proposal: the leader can only commit once it receives an -acknowledgement from a quorum of servers. -</para> - -<para> -If we extract the properties that we really need from our use of majorities, we have that we only -need to guarantee that groups of processes used to validate an operation by voting (e.g., acknowledging -a leader proposal) pairwise intersect in at least one server. Using majorities guarantees such a property. -However, there are other ways of constructing quorums different from majorities. For example, we can assign -weights to the votes of servers, and say that the votes of some servers are more important. To obtain a quorum, -we get enough votes so that the sum of weights of all votes is larger than half of the total sum of all weights. -</para> - -<para> -A different construction that uses weights and is useful in wide-area deployments (co-locations) is a hierarchical -one. With this construction, we split the servers into disjoint groups and assign weights to processes. To form -a quorum, we have to get a hold of enough servers from a majority of groups G, such that for each group g in G, -the sum of votes from g is larger than half of the sum of weights in g. Interestingly, this construction enables -smaller quorums. If we have, for example, 9 servers, we split them into 3 groups, and assign a weight of 1 to each -server, then we are able to form quorums of size 4. Note that two subsets of processes composed each of a majority -of servers from each of a majority of groups necessarily have a non-empty intersection. It is reasonable to expect -that a majority of co-locations will have a majority of servers available with high probability. -</para> - -<para> -With ZooKeeper, we provide a user with the ability of configuring servers to use majority quorums, weights, or a -hierarchy of groups. -</para> -</section> - -<section id="sc_logging"> - -<title>Logging</title> -<para> -Zookeeper uses -<ulink url="http://www.slf4j.org/index.html">slf4j</ulink> as an abstraction layer for logging. -<ulink url="http://logging.apache.org/log4j">log4j</ulink> in version 1.2 is chosen as the final logging implementation for now. -For better embedding support, it is planned in the future to leave the decision of choosing the final logging implementation to the end user. -Therefore, always use the slf4j api to write log statements in the code, but configure log4j for how to log at runtime. -Note that slf4j has no FATAL level, former messages at FATAL level have been moved to ERROR level. -For information on configuring log4j for -ZooKeeper, see the <ulink url="zookeeperAdmin.html#sc_logging">Logging</ulink> section -of the <ulink url="zookeeperAdmin.html">ZooKeeper Administrator's Guide.</ulink> - -</para> - -<section id="sc_developerGuidelines"><title>Developer Guidelines</title> - -<para>Please follow the -<ulink url="http://www.slf4j.org/manual.html">slf4j manual</ulink> when creating log statements within code. -Also read the -<ulink url="http://www.slf4j.org/faq.html#logging_performance">FAQ on performance</ulink> -, when creating log statements. Patch reviewers will look for the following:</para> -<section id="sc_rightLevel"><title>Logging at the Right Level</title> -<para> -There are several levels of logging in slf4j. -It's important to pick the right one. In order of higher to lower severity:</para> -<orderedlist> - <listitem><para>ERROR level designates error events that might still allow the application to continue running.</para></listitem> - <listitem><para>WARN level designates potentially harmful situations.</para></listitem> - <listitem><para>INFO level designates informational messages that highlight the progress of the application at coarse-grained level.</para></listitem> - <listitem><para>DEBUG Level designates fine-grained informational events that are most useful to debug an application.</para></listitem> - <listitem><para>TRACE Level designates finer-grained informational events than the DEBUG.</para></listitem> -</orderedlist> - -<para> -ZooKeeper is typically run in production such that log messages of INFO level -severity and higher (more severe) are output to the log.</para> - - -</section> - -<section id="sc_slf4jIdioms"><title>Use of Standard slf4j Idioms</title> - -<para><emphasis>Static Message Logging</emphasis></para> -<programlisting> -LOG.debug("process completed successfully!"); -</programlisting> - -<para> -However when creating parameterized messages are required, use formatting anchors. -</para> - -<programlisting> -LOG.debug("got {} messages in {} minutes",new Object[]{count,time}); -</programlisting> - - -<para><emphasis>Naming</emphasis></para> - -<para> -Loggers should be named after the class in which they are used. -</para> - -<programlisting> -public class Foo { - private static final Logger LOG = LoggerFactory.getLogger(Foo.class); - .... - public Foo() { - LOG.info("constructing Foo"); -</programlisting> - -<para><emphasis>Exception handling</emphasis></para> -<programlisting> -try { - // code -} catch (XYZException e) { - // do this - LOG.error("Something bad happened", e); - // don't do this (generally) - // LOG.error(e); - // why? because "don't do" case hides the stack trace - - // continue process here as you need... recover or (re)throw -} -</programlisting> -</section> -</section> - -</section> - -</article> http://git-wip-us.apache.org/repos/asf/zookeeper/blob/b024a3e2/src/docs/src/documentation/content/xdocs/zookeeperJMX.xml ---------------------------------------------------------------------- diff --git a/src/docs/src/documentation/content/xdocs/zookeeperJMX.xml b/src/docs/src/documentation/content/xdocs/zookeeperJMX.xml deleted file mode 100644 index f0ea636..0000000 --- a/src/docs/src/documentation/content/xdocs/zookeeperJMX.xml +++ /dev/null @@ -1,236 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<!-- - Copyright 2002-2004 The Apache Software Foundation - - Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. - You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. ---> - -<!DOCTYPE article PUBLIC "-//OASIS//DTD Simplified DocBook XML V1.0//EN" -"http://www.oasis-open.org/docbook/xml/simple/1.0/sdocbook.dtd"> -<article id="bk_zookeeperjmx"> - <title>ZooKeeper JMX</title> - - <articleinfo> - <legalnotice> - <para>Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. You may - obtain a copy of the License at <ulink - url="http://www.apache.org/licenses/LICENSE-2.0">http://www.apache.org/licenses/LICENSE-2.0</ulink>.</para> - - <para>Unless required by applicable law or agreed to in writing, - software distributed under the License is distributed on an "AS IS" - BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - implied. See the License for the specific language governing permissions - and limitations under the License.</para> - </legalnotice> - - <abstract> - <para>ZooKeeper support for JMX</para> - </abstract> - </articleinfo> - - <section id="ch_jmx"> - <title>JMX</title> - <para>Apache ZooKeeper has extensive support for JMX, allowing you - to view and manage a ZooKeeper serving ensemble.</para> - - <para>This document assumes that you have basic knowledge of - JMX. See <ulink - url="http://java.sun.com/javase/technologies/core/mntr-mgmt/javamanagement/"> - Sun JMX Technology</ulink> page to get started with JMX. - </para> - - <para>See the <ulink - url="http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html"> - JMX Management Guide</ulink> for details on setting up local and - remote management of VM instances. By default the included - <emphasis>zkServer.sh</emphasis> supports only local management - - review the linked document to enable support for remote management - (beyond the scope of this document). - </para> - - </section> - - <section id="ch_starting"> - <title>Starting ZooKeeper with JMX enabled</title> - - <para>The class - <emphasis>org.apache.zookeeper.server.quorum.QuorumPeerMain</emphasis> - will start a JMX manageable ZooKeeper server. This class - registers the proper MBeans during initalization to support JMX - monitoring and management of the - instance. See <emphasis>bin/zkServer.sh</emphasis> for one - example of starting ZooKeeper using QuorumPeerMain.</para> - </section> - - <section id="ch_console"> - <title>Run a JMX console</title> - - <para>There are a number of JMX consoles available which can connect - to the running server. For this example we will use Sun's - <emphasis>jconsole</emphasis>.</para> - - <para>The Java JDK ships with a simple JMX console - named <ulink url="http://java.sun.com/developer/technicalArticles/J2SE/jconsole.html">jconsole</ulink> - which can be used to connect to ZooKeeper and inspect a running - server. Once you've started ZooKeeper using QuorumPeerMain - start <emphasis>jconsole</emphasis>, which typically resides in - <emphasis>JDK_HOME/bin/jconsole</emphasis></para> - - <para>When the "new connection" window is displayed either connect - to local process (if jconsole started on same host as Server) or - use the remote process connection.</para> - - <para>By default the "overview" tab for the VM is displayed (this - is a great way to get insight into the VM btw). Select - the "MBeans" tab.</para> - - <para>You should now see <emphasis>org.apache.ZooKeeperService</emphasis> - on the left hand side. Expand this item and depending on how you've - started the server you will be able to monitor and manage various - service related features.</para> - - <para>Also note that ZooKeeper will register log4j MBeans as - well. In the same section along the left hand side you will see - "log4j". Expand that to manage log4j through JMX. Of particular - interest is the ability to dynamically change the logging levels - used by editing the appender and root thresholds. Log4j MBean - registration can be disabled by passing - <emphasis>-Dzookeeper.jmx.log4j.disable=true</emphasis> to the JVM - when starting ZooKeeper. - </para> - - </section> - - <section id="ch_reference"> - <title>ZooKeeper MBean Reference</title> - - <para>This table details JMX for a server participating in a - replicated ZooKeeper ensemble (ie not standalone). This is the - typical case for a production environment.</para> - - <table> - <title>MBeans, their names and description</title> - - <tgroup cols='4'> - <thead> - <row> - <entry>MBean</entry> - <entry>MBean Object Name</entry> - <entry>Description</entry> - </row> - </thead> - <tbody> - <row> - <entry>Quorum</entry> - <entry>ReplicatedServer_id<#></entry> - <entry>Represents the Quorum, or Ensemble - parent of all - cluster members. Note that the object name includes the - "myid" of the server (name suffix) that your JMX agent has - connected to.</entry> - </row> - <row> - <entry>LocalPeer|RemotePeer</entry> - <entry>replica.<#></entry> - <entry>Represents a local or remote peer (ie server - participating in the ensemble). Note that the object name - includes the "myid" of the server (name suffix).</entry> - </row> - <row> - <entry>LeaderElection</entry> - <entry>LeaderElection</entry> - <entry>Represents a ZooKeeper cluster leader election which is - in progress. Provides information about the election, such as - when it started.</entry> - </row> - <row> - <entry>Leader</entry> - <entry>Leader</entry> - <entry>Indicates that the parent replica is the leader and - provides attributes/operations for that server. Note that - Leader is a subclass of ZooKeeperServer, so it provides - all of the information normally associated with a - ZooKeeperServer node.</entry> - </row> - <row> - <entry>Follower</entry> - <entry>Follower</entry> - <entry>Indicates that the parent replica is a follower and - provides attributes/operations for that server. Note that - Follower is a subclass of ZooKeeperServer, so it provides - all of the information normally associated with a - ZooKeeperServer node.</entry> - </row> - <row> - <entry>DataTree</entry> - <entry>InMemoryDataTree</entry> - <entry>Statistics on the in memory znode database, also - operations to access finer (and more computationally - intensive) statistics on the data (such as ephemeral - count). InMemoryDataTrees are children of ZooKeeperServer - nodes.</entry> - </row> - <row> - <entry>ServerCnxn</entry> - <entry><session_id></entry> - <entry>Statistics on each client connection, also - operations on those connections (such as - termination). Note the object name is the session id of - the connection in hex form.</entry> - </row> - </tbody></tgroup></table> - - <para>This table details JMX for a standalone server. Typically - standalone is only used in development situations.</para> - - <table> - <title>MBeans, their names and description</title> - - <tgroup cols='4'> - <thead> - <row> - <entry>MBean</entry> - <entry>MBean Object Name</entry> - <entry>Description</entry> - </row> - </thead> - <tbody> - <row> - <entry>ZooKeeperServer</entry> - <entry>StandaloneServer_port<#></entry> - <entry>Statistics on the running server, also operations - to reset these attributes. Note that the object name - includes the client port of the server (name - suffix).</entry> - </row> - <row> - <entry>DataTree</entry> - <entry>InMemoryDataTree</entry> - <entry>Statistics on the in memory znode database, also - operations to access finer (and more computationally - intensive) statistics on the data (such as ephemeral - count).</entry> - </row> - <row> - <entry>ServerCnxn</entry> - <entry><session_id></entry> - <entry>Statistics on each client connection, also - operations on those connections (such as - termination). Note the object name is the session id of - the connection in hex form.</entry> - </row> - </tbody></tgroup></table> - - </section> - -</article> http://git-wip-us.apache.org/repos/asf/zookeeper/blob/b024a3e2/src/docs/src/documentation/content/xdocs/zookeeperObservers.xml ---------------------------------------------------------------------- diff --git a/src/docs/src/documentation/content/xdocs/zookeeperObservers.xml b/src/docs/src/documentation/content/xdocs/zookeeperObservers.xml deleted file mode 100644 index fab6769..0000000 --- a/src/docs/src/documentation/content/xdocs/zookeeperObservers.xml +++ /dev/null @@ -1,145 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<!-- - Copyright 2002-2004 The Apache Software Foundation - - Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. - You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. ---> - -<!DOCTYPE article PUBLIC "-//OASIS//DTD Simplified DocBook XML V1.0//EN" -"http://www.oasis-open.org/docbook/xml/simple/1.0/sdocbook.dtd"> -<article id="bk_GettStartedGuide"> - <title>ZooKeeper Observers</title> - - <articleinfo> - <legalnotice> - <para>Licensed under the Apache License, Version 2.0 (the "License"); you - may not use this file except in compliance with the License. You may - obtain a copy of the License - at <ulink url="http://www.apache.org/licenses/LICENSE-2.0">http://www.apache.org/licenses/LICENSE-2.0</ulink>.</para> - - <para>Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, WITHOUT - WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the - License for the specific language governing permissions and limitations - under the License.</para> - </legalnotice> - - <abstract> - <para>This guide contains information about using non-voting servers, or - observers in your ZooKeeper ensembles.</para> - </abstract> - </articleinfo> - - <section id="ch_Introduction"> - <title>Observers: Scaling ZooKeeper Without Hurting Write Performance - </title> - <para> - Although ZooKeeper performs very well by having clients connect directly - to voting members of the ensemble, this architecture makes it hard to - scale out to huge numbers of clients. The problem is that as we add more - voting members, the write performance drops. This is due to the fact that - a write operation requires the agreement of (in general) at least half the - nodes in an ensemble and therefore the cost of a vote can increase - significantly as more voters are added. - </para> - <para> - We have introduced a new type of ZooKeeper node called - an <emphasis>Observer</emphasis> which helps address this problem and - further improves ZooKeeper's scalability. Observers are non-voting members - of an ensemble which only hear the results of votes, not the agreement - protocol that leads up to them. Other than this simple distinction, - Observers function exactly the same as Followers - clients may connect to - them and send read and write requests to them. Observers forward these - requests to the Leader like Followers do, but they then simply wait to - hear the result of the vote. Because of this, we can increase the number - of Observers as much as we like without harming the performance of votes. - </para> - <para> - Observers have other advantages. Because they do not vote, they are not a - critical part of the ZooKeeper ensemble. Therefore they can fail, or be - disconnected from the cluster, without harming the availability of the - ZooKeeper service. The benefit to the user is that Observers may connect - over less reliable network links than Followers. In fact, Observers may be - used to talk to a ZooKeeper server from another data center. Clients of - the Observer will see fast reads, as all reads are served locally, and - writes result in minimal network traffic as the number of messages - required in the absence of the vote protocol is smaller. - </para> - </section> - <section id="sc_UsingObservers"> - <title>How to use Observers</title> - <para>Setting up a ZooKeeper ensemble that uses Observers is very simple, - and requires just two changes to your config files. Firstly, in the config - file of every node that is to be an Observer, you must place this line: - </para> - <programlisting> - peerType=observer - </programlisting> - - <para> - This line tells ZooKeeper that the server is to be an Observer. Secondly, - in every server config file, you must add :observer to the server - definition line of each Observer. For example: - </para> - - <programlisting> - server.1:localhost:2181:3181:observer - </programlisting> - - <para> - This tells every other server that server.1 is an Observer, and that they - should not expect it to vote. This is all the configuration you need to do - to add an Observer to your ZooKeeper cluster. Now you can connect to it as - though it were an ordinary Follower. Try it out, by running:</para> - <programlisting> - $ bin/zkCli.sh -server localhost:2181 - </programlisting> - <para> - where localhost:2181 is the hostname and port number of the Observer as - specified in every config file. You should see a command line prompt - through which you can issue commands like <emphasis>ls</emphasis> to query - the ZooKeeper service. - </para> - </section> - - <section id="ch_UseCases"> - <title>Example use cases</title> - <para> - Two example use cases for Observers are listed below. In fact, wherever - you wish to scale the number of clients of your ZooKeeper ensemble, or - where you wish to insulate the critical part of an ensemble from the load - of dealing with client requests, Observers are a good architectural - choice. - </para> - <itemizedlist> - <listitem> - <para> As a datacenter bridge: Forming a ZK ensemble between two - datacenters is a problematic endeavour as the high variance in latency - between the datacenters could lead to false positive failure detection - and partitioning. However if the ensemble runs entirely in one - datacenter, and the second datacenter runs only Observers, partitions - aren't problematic as the ensemble remains connected. Clients of the - Observers may still see and issue proposals.</para> - </listitem> - <listitem> - <para>As a link to a message bus: Some companies have expressed an - interest in using ZK as a component of a persistent reliable message - bus. Observers would give a natural integration point for this work: a - plug-in mechanism could be used to attach the stream of proposals an - Observer sees to a publish-subscribe system, again without loading the - core ensemble. - </para> - </listitem> - </itemizedlist> - </section> -</article> http://git-wip-us.apache.org/repos/asf/zookeeper/blob/b024a3e2/src/docs/src/documentation/content/xdocs/zookeeperOtherInfo.xml ---------------------------------------------------------------------- diff --git a/src/docs/src/documentation/content/xdocs/zookeeperOtherInfo.xml b/src/docs/src/documentation/content/xdocs/zookeeperOtherInfo.xml deleted file mode 100644 index a2445b1..0000000 --- a/src/docs/src/documentation/content/xdocs/zookeeperOtherInfo.xml +++ /dev/null @@ -1,46 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<!-- - Copyright 2002-2004 The Apache Software Foundation - - Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. - You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. ---> - -<!DOCTYPE article PUBLIC "-//OASIS//DTD Simplified DocBook XML V1.0//EN" -"http://www.oasis-open.org/docbook/xml/simple/1.0/sdocbook.dtd"> -<article id="bk_OtherInfo"> - <title>ZooKeeper</title> - - <articleinfo> - <legalnotice> - <para>Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. You may - obtain a copy of the License at <ulink - url="http://www.apache.org/licenses/LICENSE-2.0">http://www.apache.org/licenses/LICENSE-2.0</ulink>.</para> - - <para>Unless required by applicable law or agreed to in writing, - software distributed under the License is distributed on an "AS IS" - BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - implied. See the License for the specific language governing permissions - and limitations under the License.</para> - </legalnotice> - - <abstract> - <para> currently empty </para> - </abstract> - </articleinfo> - - <section id="ch_placeholder"> - <title>Other Info</title> - <para> currently empty </para> - </section> -</article> http://git-wip-us.apache.org/repos/asf/zookeeper/blob/b024a3e2/src/docs/src/documentation/content/xdocs/zookeeperOver.xml ---------------------------------------------------------------------- diff --git a/src/docs/src/documentation/content/xdocs/zookeeperOver.xml b/src/docs/src/documentation/content/xdocs/zookeeperOver.xml deleted file mode 100644 index f972657..0000000 --- a/src/docs/src/documentation/content/xdocs/zookeeperOver.xml +++ /dev/null @@ -1,464 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<!-- - Copyright 2002-2004 The Apache Software Foundation - - Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. - You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. ---> - -<!DOCTYPE article PUBLIC "-//OASIS//DTD Simplified DocBook XML V1.0//EN" -"http://www.oasis-open.org/docbook/xml/simple/1.0/sdocbook.dtd"> -<article id="bk_Overview"> - <title>ZooKeeper</title> - - <articleinfo> - <legalnotice> - <para>Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. You may - obtain a copy of the License at <ulink - url="http://www.apache.org/licenses/LICENSE-2.0">http://www.apache.org/licenses/LICENSE-2.0</ulink>.</para> - - <para>Unless required by applicable law or agreed to in writing, - software distributed under the License is distributed on an "AS IS" - BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - implied. See the License for the specific language governing permissions - and limitations under the License.</para> - </legalnotice> - - <abstract> - <para>This document contains overview information about ZooKeeper. It - discusses design goals, key concepts, implementation, and - performance.</para> - </abstract> - </articleinfo> - - <section id="ch_DesignOverview"> - <title>ZooKeeper: A Distributed Coordination Service for Distributed - Applications</title> - - <para>ZooKeeper is a distributed, open-source coordination service for - distributed applications. It exposes a simple set of primitives that - distributed applications can build upon to implement higher level services - for synchronization, configuration maintenance, and groups and naming. It - is designed to be easy to program to, and uses a data model styled after - the familiar directory tree structure of file systems. It runs in Java and - has bindings for both Java and C.</para> - - <para>Coordination services are notoriously hard to get right. They are - especially prone to errors such as race conditions and deadlock. The - motivation behind ZooKeeper is to relieve distributed applications the - responsibility of implementing coordination services from scratch.</para> - - <section id="sc_designGoals"> - <title>Design Goals</title> - - <para><emphasis role="bold">ZooKeeper is simple.</emphasis> ZooKeeper - allows distributed processes to coordinate with each other through a - shared hierarchal namespace which is organized similarly to a standard - file system. The name space consists of data registers - called znodes, - in ZooKeeper parlance - and these are similar to files and directories. - Unlike a typical file system, which is designed for storage, ZooKeeper - data is kept in-memory, which means ZooKeeper can achieve high - throughput and low latency numbers.</para> - - <para>The ZooKeeper implementation puts a premium on high performance, - highly available, strictly ordered access. The performance aspects of - ZooKeeper means it can be used in large, distributed systems. The - reliability aspects keep it from being a single point of failure. The - strict ordering means that sophisticated synchronization primitives can - be implemented at the client.</para> - - <para><emphasis role="bold">ZooKeeper is replicated.</emphasis> Like the - distributed processes it coordinates, ZooKeeper itself is intended to be - replicated over a sets of hosts called an ensemble.</para> - - <figure> - <title>ZooKeeper Service</title> - - <mediaobject> - <imageobject> - <imagedata fileref="images/zkservice.jpg" /> - </imageobject> - </mediaobject> - </figure> - - <para>The servers that make up the ZooKeeper service must all know about - each other. They maintain an in-memory image of state, along with a - transaction logs and snapshots in a persistent store. As long as a - majority of the servers are available, the ZooKeeper service will be - available.</para> - - <para>Clients connect to a single ZooKeeper server. The client maintains - a TCP connection through which it sends requests, gets responses, gets - watch events, and sends heart beats. If the TCP connection to the server - breaks, the client will connect to a different server.</para> - - <para><emphasis role="bold">ZooKeeper is ordered.</emphasis> ZooKeeper - stamps each update with a number that reflects the order of all - ZooKeeper transactions. Subsequent operations can use the order to - implement higher-level abstractions, such as synchronization - primitives.</para> - - <para><emphasis role="bold">ZooKeeper is fast.</emphasis> It is - especially fast in "read-dominant" workloads. ZooKeeper applications run - on thousands of machines, and it performs best where reads are more - common than writes, at ratios of around 10:1.</para> - </section> - - <section id="sc_dataModelNameSpace"> - <title>Data model and the hierarchical namespace</title> - - <para>The name space provided by ZooKeeper is much like that of a - standard file system. A name is a sequence of path elements separated by - a slash (/). Every node in ZooKeeper's name space is identified by a - path.</para> - - <figure> - <title>ZooKeeper's Hierarchical Namespace</title> - - <mediaobject> - <imageobject> - <imagedata fileref="images/zknamespace.jpg" /> - </imageobject> - </mediaobject> - </figure> - </section> - - <section> - <title>Nodes and ephemeral nodes</title> - - <para>Unlike standard file systems, each node in a ZooKeeper - namespace can have data associated with it as well as children. It is - like having a file-system that allows a file to also be a directory. - (ZooKeeper was designed to store coordination data: status information, - configuration, location information, etc., so the data stored at each - node is usually small, in the byte to kilobyte range.) We use the term - <emphasis>znode</emphasis> to make it clear that we are talking about - ZooKeeper data nodes.</para> - - <para>Znodes maintain a stat structure that includes version numbers for - data changes, ACL changes, and timestamps, to allow cache validations - and coordinated updates. Each time a znode's data changes, the version - number increases. For instance, whenever a client retrieves data it also - receives the version of the data.</para> - - <para>The data stored at each znode in a namespace is read and written - atomically. Reads get all the data bytes associated with a znode and a - write replaces all the data. Each node has an Access Control List (ACL) - that restricts who can do what.</para> - - <para>ZooKeeper also has the notion of ephemeral nodes. These znodes - exists as long as the session that created the znode is active. When the - session ends the znode is deleted. Ephemeral nodes are useful when you - want to implement <emphasis>[tbd]</emphasis>.</para> - </section> - - <section> - <title>Conditional updates and watches</title> - - <para>ZooKeeper supports the concept of <emphasis>watches</emphasis>. - Clients can set a watch on a znode. A watch will be triggered and - removed when the znode changes. When a watch is triggered, the client - receives a packet saying that the znode has changed. If the - connection between the client and one of the Zoo Keeper servers is - broken, the client will receive a local notification. These can be used - to <emphasis>[tbd]</emphasis>.</para> - </section> - - <section> - <title>Guarantees</title> - - <para>ZooKeeper is very fast and very simple. Since its goal, though, is - to be a basis for the construction of more complicated services, such as - synchronization, it provides a set of guarantees. These are:</para> - - <itemizedlist> - <listitem> - <para>Sequential Consistency - Updates from a client will be applied - in the order that they were sent.</para> - </listitem> - - <listitem> - <para>Atomicity - Updates either succeed or fail. No partial - results.</para> - </listitem> - - <listitem> - <para>Single System Image - A client will see the same view of the - service regardless of the server that it connects to.</para> - </listitem> - </itemizedlist> - - <itemizedlist> - <listitem> - <para>Reliability - Once an update has been applied, it will persist - from that time forward until a client overwrites the update.</para> - </listitem> - </itemizedlist> - - <itemizedlist> - <listitem> - <para>Timeliness - The clients view of the system is guaranteed to - be up-to-date within a certain time bound.</para> - </listitem> - </itemizedlist> - - <para>For more information on these, and how they can be used, see - <emphasis>[tbd]</emphasis></para> - </section> - - <section> - <title>Simple API</title> - - <para>One of the design goals of ZooKeeper is provide a very simple - programming interface. As a result, it supports only these - operations:</para> - - <variablelist> - <varlistentry> - <term>create</term> - - <listitem> - <para>creates a node at a location in the tree</para> - </listitem> - </varlistentry> - - <varlistentry> - <term>delete</term> - - <listitem> - <para>deletes a node</para> - </listitem> - </varlistentry> - - <varlistentry> - <term>exists</term> - - <listitem> - <para>tests if a node exists at a location</para> - </listitem> - </varlistentry> - - <varlistentry> - <term>get data</term> - - <listitem> - <para>reads the data from a node</para> - </listitem> - </varlistentry> - - <varlistentry> - <term>set data</term> - - <listitem> - <para>writes data to a node</para> - </listitem> - </varlistentry> - - <varlistentry> - <term>get children</term> - - <listitem> - <para>retrieves a list of children of a node</para> - </listitem> - </varlistentry> - - <varlistentry> - <term>sync</term> - - <listitem> - <para>waits for data to be propagated</para> - </listitem> - </varlistentry> - </variablelist> - - <para>For a more in-depth discussion on these, and how they can be used - to implement higher level operations, please refer to - <emphasis>[tbd]</emphasis></para> - </section> - - <section> - <title>Implementation</title> - - <para><xref linkend="fg_zkComponents" /> shows the high-level components - of the ZooKeeper service. With the exception of the request processor, - each of - the servers that make up the ZooKeeper service replicates its own copy - of each of the components.</para> - - <figure id="fg_zkComponents"> - <title>ZooKeeper Components</title> - - <mediaobject> - <imageobject> - <imagedata fileref="images/zkcomponents.jpg" /> - </imageobject> - </mediaobject> - </figure> - - <para>The replicated database is an in-memory database containing the - entire data tree. Updates are logged to disk for recoverability, and - writes are serialized to disk before they are applied to the in-memory - database.</para> - - <para>Every ZooKeeper server services clients. Clients connect to - exactly one server to submit irequests. Read requests are serviced from - the local replica of each server database. Requests that change the - state of the service, write requests, are processed by an agreement - protocol.</para> - - <para>As part of the agreement protocol all write requests from clients - are forwarded to a single server, called the - <emphasis>leader</emphasis>. The rest of the ZooKeeper servers, called - <emphasis>followers</emphasis>, receive message proposals from the - leader and agree upon message delivery. The messaging layer takes care - of replacing leaders on failures and syncing followers with - leaders.</para> - - <para>ZooKeeper uses a custom atomic messaging protocol. Since the - messaging layer is atomic, ZooKeeper can guarantee that the local - replicas never diverge. When the leader receives a write request, it - calculates what the state of the system is when the write is to be - applied and transforms this into a transaction that captures this new - state.</para> - </section> - - <section> - <title>Uses</title> - - <para>The programming interface to ZooKeeper is deliberately simple. - With it, however, you can implement higher order operations, such as - synchronizations primitives, group membership, ownership, etc. Some - distributed applications have used it to: <emphasis>[tbd: add uses from - white paper and video presentation.]</emphasis> For more information, see - <emphasis>[tbd]</emphasis></para> - </section> - - <section> - <title>Performance</title> - - <para>ZooKeeper is designed to be highly performant. But is it? The - results of the ZooKeeper's development team at Yahoo! Research indicate - that it is. (See <xref linkend="fg_zkPerfRW" />.) It is especially high - performance in applications where reads outnumber writes, since writes - involve synchronizing the state of all servers. (Reads outnumbering - writes is typically the case for a coordination service.)</para> - - <figure id="fg_zkPerfRW"> - <title>ZooKeeper Throughput as the Read-Write Ratio Varies</title> - - <mediaobject> - <imageobject> - <imagedata fileref="images/zkperfRW-3.2.jpg" /> - </imageobject> - </mediaobject> - </figure> - <para>The figure <xref linkend="fg_zkPerfRW"/> is a throughput - graph of ZooKeeper release 3.2 running on servers with dual 2Ghz - Xeon and two SATA 15K RPM drives. One drive was used as a - dedicated ZooKeeper log device. The snapshots were written to - the OS drive. Write requests were 1K writes and the reads were - 1K reads. "Servers" indicate the size of the ZooKeeper - ensemble, the number of servers that make up the - service. Approximately 30 other servers were used to simulate - the clients. The ZooKeeper ensemble was configured such that - leaders do not allow connections from clients.</para> - - <note><para>In version 3.2 r/w performance improved by ~2x - compared to the <ulink - url="http://zookeeper.apache.org/docs/r3.1.1/zookeeperOver.html#Performance">previous - 3.1 release</ulink>.</para></note> - - <para>Benchmarks also indicate that it is reliable, too. <xref - linkend="fg_zkPerfReliability" /> shows how a deployment responds to - various failures. The events marked in the figure are the - following:</para> - - <orderedlist> - <listitem> - <para>Failure and recovery of a follower</para> - </listitem> - - <listitem> - <para>Failure and recovery of a different follower</para> - </listitem> - - <listitem> - <para>Failure of the leader</para> - </listitem> - - <listitem> - <para>Failure and recovery of two followers</para> - </listitem> - - <listitem> - <para>Failure of another leader</para> - </listitem> - </orderedlist> - </section> - - <section> - <title>Reliability</title> - - <para>To show the behavior of the system over time as - failures are injected we ran a ZooKeeper service made up of - 7 machines. We ran the same saturation benchmark as before, - but this time we kept the write percentage at a constant - 30%, which is a conservative ratio of our expected - workloads. - </para> - <figure id="fg_zkPerfReliability"> - <title>Reliability in the Presence of Errors</title> - <mediaobject> - <imageobject> - <imagedata fileref="images/zkperfreliability.jpg" /> - </imageobject> - </mediaobject> - </figure> - - <para>The are a few important observations from this graph. First, if - followers fail and recover quickly, then ZooKeeper is able to sustain a - high throughput despite the failure. But maybe more importantly, the - leader election algorithm allows for the system to recover fast enough - to prevent throughput from dropping substantially. In our observations, - ZooKeeper takes less than 200ms to elect a new leader. Third, as - followers recover, ZooKeeper is able to raise throughput again once they - start processing requests.</para> - </section> - - <section> - <title>The ZooKeeper Project</title> - - <para>ZooKeeper has been - <ulink url="https://cwiki.apache.org/confluence/display/ZOOKEEPER/PoweredBy"> - successfully used - </ulink> - in many industrial applications. It is used at Yahoo! as the - coordination and failure recovery service for Yahoo! Message - Broker, which is a highly scalable publish-subscribe system - managing thousands of topics for replication and data - delivery. It is used by the Fetching Service for Yahoo! - crawler, where it also manages failure recovery. A number of - Yahoo! advertising systems also use ZooKeeper to implement - reliable services. - </para> - - <para>All users and developers are encouraged to join the - community and contribute their expertise. See the - <ulink url="http://zookeeper.apache.org/"> - Zookeeper Project on Apache - </ulink> - for more information. - </para> - </section> - </section> -</article>
