http://git-wip-us.apache.org/repos/asf/zookeeper/blob/b024a3e2/zookeeper-docs/src/documentation/content/xdocs/zookeeperAdmin.xml ---------------------------------------------------------------------- diff --git a/zookeeper-docs/src/documentation/content/xdocs/zookeeperAdmin.xml b/zookeeper-docs/src/documentation/content/xdocs/zookeeperAdmin.xml new file mode 100644 index 0000000..a41eb01 --- /dev/null +++ b/zookeeper-docs/src/documentation/content/xdocs/zookeeperAdmin.xml @@ -0,0 +1,2312 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!-- + Copyright 2002-2004 The Apache Software Foundation + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--> +<!DOCTYPE article PUBLIC "-//OASIS//DTD Simplified DocBook XML V1.0//EN" +"http://www.oasis-open.org/docbook/xml/simple/1.0/sdocbook.dtd"> +<article id="bk_Admin"> + <title>ZooKeeper Administrator's Guide</title> + + <subtitle>A Guide to Deployment and Administration</subtitle> + + <articleinfo> + <legalnotice> + <para>Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. You may + obtain a copy of the License at <ulink + url="http://www.apache.org/licenses/LICENSE-2.0">http://www.apache.org/licenses/LICENSE-2.0</ulink>.</para> + + <para>Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an "AS IS" + BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied. See the License for the specific language governing permissions + and limitations under the License.</para> + </legalnotice> + + <abstract> + <para>This document contains information about deploying, administering + and mantaining ZooKeeper. It also discusses best practices and common + problems.</para> + </abstract> + </articleinfo> + + <section id="ch_deployment"> + <title>Deployment</title> + + <para>This section contains information about deploying Zookeeper and + covers these topics:</para> + + <itemizedlist> + <listitem> + <para><xref linkend="sc_systemReq" /></para> + </listitem> + + <listitem> + <para><xref linkend="sc_zkMulitServerSetup" /></para> + </listitem> + + <listitem> + <para><xref linkend="sc_singleAndDevSetup" /></para> + </listitem> + </itemizedlist> + + <para>The first two sections assume you are interested in installing + ZooKeeper in a production environment such as a datacenter. The final + section covers situations in which you are setting up ZooKeeper on a + limited basis - for evaluation, testing, or development - but not in a + production environment.</para> + + <section id="sc_systemReq"> + <title>System Requirements</title> + + <section id="sc_supportedPlatforms"> + <title>Supported Platforms</title> + + <para>ZooKeeper consists of multiple components. 
Some components are + supported broadly, and other components are supported only on a smaller + set of platforms.</para> + + <itemizedlist> + <listitem> + <para><emphasis role="bold">Client</emphasis> is the Java client + library, used by applications to connect to a ZooKeeper ensemble. + </para> + </listitem> + <listitem> + <para><emphasis role="bold">Server</emphasis> is the Java server + that runs on the ZooKeeper ensemble nodes.</para> + </listitem> + <listitem> + <para><emphasis role="bold">Native Client</emphasis> is a client + implemented in C, similar to the Java client, used by applications + to connect to a ZooKeeper ensemble.</para> + </listitem> + <listitem> + <para><emphasis role="bold">Contrib</emphasis> refers to multiple + optional add-on components.</para> + </listitem> + </itemizedlist> + + <para>The following matrix describes the level of support committed for + running each component on different operating system platforms.</para> + + <table> + <title>Support Matrix</title> + <tgroup cols="5" align="left" colsep="1" rowsep="1"> + <thead> + <row> + <entry>Operating System</entry> + <entry>Client</entry> + <entry>Server</entry> + <entry>Native Client</entry> + <entry>Contrib</entry> + </row> + </thead> + <tbody> + <row> + <entry>GNU/Linux</entry> + <entry>Development and Production</entry> + <entry>Development and Production</entry> + <entry>Development and Production</entry> + <entry>Development and Production</entry> + </row> + <row> + <entry>Solaris</entry> + <entry>Development and Production</entry> + <entry>Development and Production</entry> + <entry>Not Supported</entry> + <entry>Not Supported</entry> + </row> + <row> + <entry>FreeBSD</entry> + <entry>Development and Production</entry> + <entry>Development and Production</entry> + <entry>Not Supported</entry> + <entry>Not Supported</entry> + </row> + <row> + <entry>Windows</entry> + <entry>Development and Production</entry> + <entry>Development and Production</entry> + <entry>Not Supported</entry> + <entry>Not Supported</entry> + </row> + <row> + <entry>Mac OS X</entry> + <entry>Development Only</entry> + <entry>Development Only</entry> + <entry>Not Supported</entry> + <entry>Not Supported</entry> + </row> + </tbody> + </tgroup> + </table> + + <para>For any operating system not explicitly mentioned as supported in + the matrix, components may or may not work. The ZooKeeper community + will fix obvious bugs that are reported for other platforms, but there + is no full support.</para> + </section> + + <section id="sc_requiredSoftware"> + <title>Required Software </title> + + <para>ZooKeeper runs in Java, release 1.8 or greater (JDK 8 or + greater, FreeBSD support requires openjdk8). It runs as an + <emphasis>ensemble</emphasis> of ZooKeeper servers. Three + ZooKeeper servers is the minimum recommended size for an + ensemble, and we also recommend that they run on separate + machines. At Yahoo!, ZooKeeper is usually deployed on + dedicated RHEL boxes, with dual-core processors, 2GB of RAM, + and 80GB IDE hard drives.</para> + </section> + + </section> + + <section id="sc_zkMulitServerSetup"> + <title>Clustered (Multi-Server) Setup</title> + + <para>For reliable ZooKeeper service, you should deploy ZooKeeper in a + cluster known as an <emphasis>ensemble</emphasis>. As long as a majority + of the ensemble are up, the service will be available. Because Zookeeper + requires a majority, it is best to use an + odd number of machines. 
For example, with four machines ZooKeeper can + only handle the failure of a single machine; if two machines fail, the + remaining two machines do not constitute a majority. However, with five + machines ZooKeeper can handle the failure of two machines. </para> + <note> + <para> + As mentioned in the + <ulink url="zookeeperStarted.html">ZooKeeper Getting Started Guide</ulink> + , a minimum of three servers are required for a fault tolerant + clustered setup, and it is strongly recommended that you have an + odd number of servers. + </para> + <para>Usually three servers is more than enough for a production + install, but for maximum reliability during maintenance, you may + wish to install five servers. With three servers, if you perform + maintenance on one of them, you are vulnerable to a failure on one + of the other two servers during that maintenance. If you have five + of them running, you can take one down for maintenance, and know + that you're still OK if one of the other four suddenly fails. + </para> + <para>Your redundancy considerations should include all aspects of + your environment. If you have three ZooKeeper servers, but their + network cables are all plugged into the same network switch, then + the failure of that switch will take down your entire ensemble. + </para> + </note> + <para>Here are the steps to setting a server that will be part of an + ensemble. These steps should be performed on every host in the + ensemble:</para> + + <orderedlist> + <listitem> + <para>Install the Java JDK. You can use the native packaging system + for your system, or download the JDK from:</para> + + <para><ulink + url="http://java.sun.com/javase/downloads/index.jsp">http://java.sun.com/javase/downloads/index.jsp</ulink></para> + </listitem> + + <listitem> + <para>Set the Java heap size. This is very important to avoid + swapping, which will seriously degrade ZooKeeper performance. To + determine the correct value, use load tests, and make sure you are + well below the usage limit that would cause you to swap. Be + conservative - use a maximum heap size of 3GB for a 4GB + machine.</para> + </listitem> + + <listitem> + <para>Install the ZooKeeper Server Package. It can be downloaded + from: + </para> + <para> + <ulink url="http://zookeeper.apache.org/releases.html"> + http://zookeeper.apache.org/releases.html + </ulink> + </para> + </listitem> + + <listitem> + <para>Create a configuration file. This file can be called anything. + Use the following settings as a starting point:</para> + + <programlisting> +tickTime=2000 +dataDir=/var/lib/zookeeper/ +clientPort=2181 +initLimit=5 +syncLimit=2 +server.1=zoo1:2888:3888 +server.2=zoo2:2888:3888 +server.3=zoo3:2888:3888</programlisting> + + <para>You can find the meanings of these and other configuration + settings in the section <xref linkend="sc_configuration" />. A word + though about a few here:</para> + + <para>Every machine that is part of the ZooKeeper ensemble should know + about every other machine in the ensemble. You accomplish this with + the series of lines of the form <emphasis + role="bold">server.id=host:port:port</emphasis>. The parameters <emphasis + role="bold">host</emphasis> and <emphasis + role="bold">port</emphasis> are straightforward. 
You attribute the + server id to each machine by creating a file named + <filename>myid</filename>, one for each server, which resides in + that server's data directory, as specified by the configuration file + parameter <emphasis role="bold">dataDir</emphasis>.</para></listitem> + + <listitem><para>The myid file + consists of a single line containing only the text of that machine's + id. So <filename>myid</filename> of server 1 would contain the text + &quot;1&quot; and nothing else. The id must be unique within the + ensemble and should have a value between 1 and 255. <emphasis role="bold">IMPORTANT:</emphasis> if you + enable extended features such as TTL Nodes (see below) the id must be + between 1 and 254 due to internal limitations.</para> + </listitem> + + <listitem> + <para>Once your configuration file is set up, you can start a + ZooKeeper server:</para> + + <para><computeroutput>$ java -cp zookeeper.jar:lib/slf4j-api-1.7.5.jar:lib/slf4j-log4j12-1.7.5.jar:lib/log4j-1.2.17.jar:conf \ + org.apache.zookeeper.server.quorum.QuorumPeerMain zoo.cfg + </computeroutput></para> + + <para>QuorumPeerMain starts a ZooKeeper server; + <ulink url="http://java.sun.com/javase/technologies/core/mntr-mgmt/javamanagement/">JMX</ulink> + management beans are also registered, which allows + management through a JMX management console. + The <ulink url="zookeeperJMX.html">ZooKeeper JMX + document</ulink> contains details on managing ZooKeeper with JMX. + </para> + + <para>See the script <emphasis>bin/zkServer.sh</emphasis>, + which is included in the release, for an example + of starting server instances.</para> + + </listitem> + + <listitem> + <para>Test your deployment by connecting to the hosts:</para> + + <para>In Java, you can run the following command to execute + simple operations:</para> + + <para><computeroutput>$ bin/zkCli.sh -server 127.0.0.1:2181</computeroutput></para> + </listitem> + </orderedlist> + </section> + + <section id="sc_singleAndDevSetup"> + <title>Single Server and Developer Setup</title> + + <para>If you want to set up ZooKeeper for development purposes, you will + probably want to set up a single server instance of ZooKeeper, and then + install either the Java or C client-side libraries and bindings on your + development machine.</para> + + <para>The steps to setting up a single server instance are similar + to the above, except the configuration file is simpler. 
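+ </para>
+
+ <para>For illustration only, a standalone configuration of this kind
+ can simply reuse the basic settings from the clustered example above
+ and drop the <emphasis role="bold">initLimit</emphasis>,
+ <emphasis role="bold">syncLimit</emphasis> and
+ <emphasis role="bold">server.N</emphasis> lines:</para>
+
+ <programlisting>
+tickTime=2000
+dataDir=/var/lib/zookeeper/
+clientPort=2181</programlisting>
+
+ <para>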
You can find the + complete instructions in the <ulink + url="zookeeperStarted.html#sc_InstallingSingleMode">Installing and + Running ZooKeeper in Single Server Mode</ulink> section of the <ulink + url="zookeeperStarted.html">ZooKeeper Getting Started + Guide</ulink>.</para> + + <para>For information on installing the client side libraries, refer to + the <ulink url="zookeeperProgrammers.html#Bindings">Bindings</ulink> + section of the <ulink url="zookeeperProgrammers.html">ZooKeeper + Programmer's Guide</ulink>.</para> + </section> + </section> + + <section id="ch_administration"> + <title>Administration</title> + + <para>This section contains information about running and maintaining + ZooKeeper and covers these topics: </para> + <itemizedlist> + <listitem> + <para><xref linkend="sc_designing" /></para> + </listitem> + + <listitem> + <para><xref linkend="sc_provisioning" /></para> + </listitem> + + <listitem> + <para><xref linkend="sc_strengthsAndLimitations" /></para> + </listitem> + + <listitem> + <para><xref linkend="sc_administering" /></para> + </listitem> + + <listitem> + <para><xref linkend="sc_maintenance" /></para> + </listitem> + + <listitem> + <para><xref linkend="sc_supervision" /></para> + </listitem> + + <listitem> + <para><xref linkend="sc_monitoring" /></para> + </listitem> + + <listitem> + <para><xref linkend="sc_logging" /></para> + </listitem> + + <listitem> + <para><xref linkend="sc_troubleshooting" /></para> + </listitem> + + <listitem> + <para><xref linkend="sc_configuration" /></para> + </listitem> + + <listitem> + <para><xref linkend="sc_zkCommands" /></para> + </listitem> + + <listitem> + <para><xref linkend="sc_dataFileManagement" /></para> + </listitem> + + <listitem> + <para><xref linkend="sc_commonProblems" /></para> + </listitem> + + <listitem> + <para><xref linkend="sc_bestPractices" /></para> + </listitem> + </itemizedlist> + + <section id="sc_designing"> + <title>Designing a ZooKeeper Deployment</title> + + <para>The reliablity of ZooKeeper rests on two basic assumptions.</para> + <orderedlist> + <listitem><para> Only a minority of servers in a deployment + will fail. <emphasis>Failure</emphasis> in this context + means a machine crash, or some error in the network that + partitions a server off from the majority.</para> + </listitem> + <listitem><para> Deployed machines operate correctly. To + operate correctly means to execute code correctly, to have + clocks that work properly, and to have storage and network + components that perform consistently.</para> + </listitem> + </orderedlist> + + <para>The sections below contain considerations for ZooKeeper + administrators to maximize the probability for these assumptions + to hold true. Some of these are cross-machines considerations, + and others are things you should consider for each and every + machine in your deployment.</para> + + <section id="sc_CrossMachineRequirements"> + <title>Cross Machine Requirements</title> + + <para>For the ZooKeeper service to be active, there must be a + majority of non-failing machines that can communicate with + each other. To create a deployment that can tolerate the + failure of F machines, you should count on deploying 2xF+1 + machines. Thus, a deployment that consists of three machines + can handle one failure, and a deployment of five machines can + handle two failures. Note that a deployment of six machines + can only handle two failures since three machines is not a + majority. 
For this reason, ZooKeeper deployments are usually + made up of an odd number of machines.</para> + + <para>To achieve the highest probability of tolerating a failure + you should try to make machine failures independent. For + example, if most of the machines share the same switch, + failure of that switch could cause a correlated failure and + bring down the service. The same holds true of shared power + circuits, cooling systems, etc.</para> + </section> + + <section> + <title>Single Machine Requirements</title> + + <para>If ZooKeeper has to contend with other applications for + access to resources like storage media, CPU, network, or + memory, its performance will suffer markedly. ZooKeeper has + strong durability guarantees, which means it uses storage + media to log changes before the operation responsible for the + change is allowed to complete. You should be aware of this + dependency, then, and take great care if you want to ensure + that ZooKeeper operations aren't held up by your media. Here + are some things you can do to minimize that sort of + degradation: + </para> + + <itemizedlist> + <listitem> + <para>ZooKeeper's transaction log must be on a dedicated + device. (A dedicated partition is not enough.) ZooKeeper + writes the log sequentially, without seeking. Sharing your + log device with other processes can cause seeks and + contention, which in turn can cause multi-second + delays.</para> + </listitem> + + <listitem> + <para>Do not put ZooKeeper in a situation that can cause a + swap. In order for ZooKeeper to function with any sort of + timeliness, it simply cannot be allowed to swap. + Therefore, make certain that the maximum heap size given + to ZooKeeper is not bigger than the amount of real memory + available to ZooKeeper. For more on this, see + <xref linkend="sc_commonProblems"/> + below. </para> + </listitem> + </itemizedlist> + </section> + </section> + + <section id="sc_provisioning"> + <title>Provisioning</title> + + <para></para> + </section> + + <section id="sc_strengthsAndLimitations"> + <title>Things to Consider: ZooKeeper Strengths and Limitations</title> + + <para></para> + </section> + + <section id="sc_administering"> + <title>Administering</title> + + <para></para> + </section> + + <section id="sc_maintenance"> + <title>Maintenance</title> + + <para>Little long-term maintenance is required for a ZooKeeper + cluster; however, you must be aware of the following:</para> + + <section> + <title>Ongoing Data Directory Cleanup</title> + + <para>The ZooKeeper <ulink url="#var_datadir">Data + Directory</ulink> contains files which are a persistent copy + of the znodes stored by a particular serving ensemble. These + are the snapshot and transactional log files. As changes are + made to the znodes these changes are appended to a + transaction log. Occasionally, when a log grows large, a + snapshot of the current state of all znodes will be written + to the filesystem and a new transaction log file is created + for future transactions. During snapshotting, ZooKeeper may + continue appending incoming transactions to the old log file. + Therefore, some transactions which are newer than a snapshot + may be found in the last transaction log preceding the + snapshot. + </para> + + <para>A ZooKeeper server <emphasis role="bold">will not remove + old snapshots and log files</emphasis> when using the default + configuration (see autopurge below); this is the + responsibility of the operator. 
Every serving environment is + different and therefore the requirements of managing these + files may differ from install to install (backup for example). + </para> + + <para>The PurgeTxnLog utility implements a simple retention + policy that administrators can use. The <ulink + url="ext:api/index">API docs</ulink> contains details on + calling conventions (arguments, etc...). + </para> + + <para>In the following example the last count snapshots and + their corresponding logs are retained and the others are + deleted. The value of <count> should typically be + greater than 3 (although not required, this provides 3 backups + in the unlikely event a recent log has become corrupted). This + can be run as a cron job on the ZooKeeper server machines to + clean up the logs daily.</para> + + <programlisting> java -cp zookeeper.jar:lib/slf4j-api-1.7.5.jar:lib/slf4j-log4j12-1.7.5.jar:lib/log4j-1.2.17.jar:conf org.apache.zookeeper.server.PurgeTxnLog <dataDir> <snapDir> -n <count></programlisting> + + <para>Automatic purging of the snapshots and corresponding + transaction logs was introduced in version 3.4.0 and can be + enabled via the following configuration parameters <emphasis + role="bold">autopurge.snapRetainCount</emphasis> and <emphasis + role="bold">autopurge.purgeInterval</emphasis>. For more on + this, see <xref linkend="sc_advancedConfiguration"/> + below.</para> + </section> + + <section> + <title>Debug Log Cleanup (log4j)</title> + + <para>See the section on <ulink + url="#sc_logging">logging</ulink> in this document. It is + expected that you will setup a rolling file appender using the + in-built log4j feature. The sample configuration file in the + release tar's conf/log4j.properties provides an example of + this. + </para> + </section> + + </section> + + <section id="sc_supervision"> + <title>Supervision</title> + + <para>You will want to have a supervisory process that manages + each of your ZooKeeper server processes (JVM). The ZK server is + designed to be "fail fast" meaning that it will shutdown + (process exit) if an error occurs that it cannot recover + from. As a ZooKeeper serving cluster is highly reliable, this + means that while the server may go down the cluster as a whole + is still active and serving requests. Additionally, as the + cluster is "self healing" the failed server once restarted will + automatically rejoin the ensemble w/o any manual + interaction.</para> + + <para>Having a supervisory process such as <ulink + url="http://cr.yp.to/daemontools.html">daemontools</ulink> or + <ulink + url="http://en.wikipedia.org/wiki/Service_Management_Facility">SMF</ulink> + (other options for supervisory process are also available, it's + up to you which one you would like to use, these are just two + examples) managing your ZooKeeper server ensures that if the + process does exit abnormally it will automatically be restarted + and will quickly rejoin the cluster.</para> + + <para>It is also recommended to configure the ZooKeeper server process to + terminate and dump its heap if an + <computeroutput>OutOfMemoryError</computeroutput> occurs. This is achieved + by launching the JVM with the following arguments on Linux and Windows + respectively. The <filename>zkServer.sh</filename> and + <filename>zkServer.cmd</filename> scripts that ship with ZooKeeper set + these options. 
+ </para> + + <programlisting>-XX:+HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError='kill -9 %p'</programlisting> + <programlisting>"-XX:+HeapDumpOnOutOfMemoryError" "-XX:OnOutOfMemoryError=cmd /c taskkill /pid %%%%p /t /f"</programlisting> + </section> + + <section id="sc_monitoring"> + <title>Monitoring</title> + + <para>The ZooKeeper service can be monitored in one of two + primary ways; 1) the command port through the use of <ulink + url="#sc_zkCommands">4 letter words</ulink> and 2) <ulink + url="zookeeperJMX.html">JMX</ulink>. See the appropriate section for + your environment/requirements.</para> + </section> + + <section id="sc_logging"> + <title>Logging</title> + + <para> + ZooKeeper uses <emphasis role="bold"><ulink url="http://www.slf4j.org">SLF4J</ulink></emphasis> + version 1.7.5 as its logging infrastructure. For backward compatibility it is bound to + <emphasis role="bold">LOG4J</emphasis> but you can use + <emphasis role="bold"><ulink url="http://logback.qos.ch/">LOGBack</ulink></emphasis> + or any other supported logging framework of your choice. + </para> + <para> + The ZooKeeper default <filename>log4j.properties</filename> + file resides in the <filename>conf</filename> directory. Log4j requires that + <filename>log4j.properties</filename> either be in the working directory + (the directory from which ZooKeeper is run) or be accessible from the classpath. + </para> + + <para>For more information about SLF4J, see + <ulink url="http://www.slf4j.org/manual.html">its manual</ulink>.</para> + + <para>For more information about LOG4J, see + <ulink url="http://logging.apache.org/log4j/1.2/manual.html#defaultInit">Log4j Default Initialization Procedure</ulink> + of the log4j manual.</para> + + </section> + + <section id="sc_troubleshooting"> + <title>Troubleshooting</title> + <variablelist> + <varlistentry> + <term> Server not coming up because of file corruption</term> + <listitem> + <para>A server might not be able to read its database and fail to come up because of + some file corruption in the transaction logs of the ZooKeeper server. You will + see some IOException on loading ZooKeeper database. In such a case, + make sure all the other servers in your ensemble are up and working. Use "stat" + command on the command port to see if they are in good health. After you have verified that + all the other servers of the ensemble are up, you can go ahead and clean the database + of the corrupt server. Delete all the files in datadir/version-2 and datalogdir/version-2/. + Restart the server. + </para> + </listitem> + </varlistentry> + </variablelist> + </section> + + <section id="sc_configuration"> + <title>Configuration Parameters</title> + + <para>ZooKeeper's behavior is governed by the ZooKeeper configuration + file. This file is designed so that the exact same file can be used by + all the servers that make up a ZooKeeper server assuming the disk + layouts are the same. If servers use different configuration files, care + must be taken to ensure that the list of servers in all of the different + configuration files match.</para> + + <note> + <para>In 3.5.0 and later, some of these parameters should be placed in + a dynamic configuration file. If they are placed in the static + configuration file, ZooKeeper will automatically move them over to the + dynamic configuration file. 
See <ulink url="zookeeperReconfig.html"> + Dynamic Reconfiguration</ulink> for more information.</para> + </note> + + <section id="sc_minimumConfiguration"> + <title>Minimum Configuration</title> + + <para>Here are the minimum configuration keywords that must be defined + in the configuration file:</para> + + <variablelist> + <varlistentry> + <term>clientPort</term> + + <listitem> + <para>the port to listen for client connections; that is, the + port that clients attempt to connect to.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>secureClientPort</term> + + <listitem> + <para>the port to listen on for secure client connections using SSL. + + <emphasis role="bold">clientPort</emphasis> specifies + the port for plaintext connections while <emphasis role="bold"> + secureClientPort</emphasis> specifies the port for SSL + connections. Specifying both enables mixed-mode while omitting + either will disable that mode.</para> + <para>Note that SSL feature will be enabled when user plugs-in + zookeeper.serverCnxnFactory, zookeeper.clientCnxnSocket as Netty.</para> + </listitem> + </varlistentry> + + <varlistentry id="var_datadir"> + <term>dataDir</term> + + <listitem> + <para>the location where ZooKeeper will store the in-memory + database snapshots and, unless specified otherwise, the + transaction log of updates to the database.</para> + + <note> + <para>Be careful where you put the transaction log. A + dedicated transaction log device is key to consistent good + performance. Putting the log on a busy device will adversely + effect performance.</para> + </note> + </listitem> + </varlistentry> + + <varlistentry id="id_tickTime"> + <term>tickTime</term> + + <listitem> + <para>the length of a single tick, which is the basic time unit + used by ZooKeeper, as measured in milliseconds. It is used to + regulate heartbeats, and timeouts. For example, the minimum + session timeout will be two ticks.</para> + </listitem> + </varlistentry> + </variablelist> + </section> + + <section id="sc_advancedConfiguration"> + <title>Advanced Configuration</title> + + <para>The configuration settings in the section are optional. You can + use them to further fine tune the behaviour of your ZooKeeper servers. + Some can also be set using Java system properties, generally of the + form <emphasis>zookeeper.keyword</emphasis>. The exact system + property, when available, is noted below.</para> + + <variablelist> + <varlistentry> + <term>dataLogDir</term> + + <listitem> + <para>(No Java system property)</para> + + <para>This option will direct the machine to write the + transaction log to the <emphasis + role="bold">dataLogDir</emphasis> rather than the <emphasis + role="bold">dataDir</emphasis>. This allows a dedicated log + device to be used, and helps avoid competition between logging + and snaphots.</para> + + <note> + <para>Having a dedicated log device has a large impact on + throughput and stable latencies. 
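+ </para>
+
+ <para>As a sketch only (the device mount point below is hypothetical),
+ a configuration that keeps the transaction log on its own device would
+ pair the two settings like this:</para>
+
+ <programlisting>
+dataDir=/var/lib/zookeeper
+dataLogDir=/zk-txn-device/zookeeper</programlisting>
+
+ <para>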
It is highly recommended to + dedicate a log device and set <emphasis + role="bold">dataLogDir</emphasis> to point to a directory on + that device, and then make sure to point <emphasis + role="bold">dataDir</emphasis> to a directory + <emphasis>not</emphasis> residing on that device.</para> + </note> + </listitem> + </varlistentry> + + <varlistentry> + <term>globalOutstandingLimit</term> + + <listitem> + <para>(Java system property: <emphasis + role="bold">zookeeper.globalOutstandingLimit</emphasis>)</para> + + <para>Clients can submit requests faster than ZooKeeper can + process them, especially if there are a lot of clients. To + prevent ZooKeeper from running out of memory due to queued + requests, ZooKeeper will throttle clients so that there are no + more than globalOutstandingLimit outstanding requests in the + system. The default limit is 1,000.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>preAllocSize</term> + + <listitem> + <para>(Java system property: <emphasis + role="bold">zookeeper.preAllocSize</emphasis>)</para> + + <para>To avoid seeks, ZooKeeper allocates space in the + transaction log file in blocks of preAllocSize kilobytes. The + default block size is 64M. One reason for changing the size of + the blocks is to reduce the block size if snapshots are taken + more often. (Also, see <emphasis + role="bold">snapCount</emphasis>).</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>snapCount</term> + + <listitem> + <para>(Java system property: <emphasis + role="bold">zookeeper.snapCount</emphasis>)</para> + + <para>ZooKeeper records its transactions using snapshots and + a transaction log (think write-ahead log). The number of + transactions recorded in the transaction log before a snapshot + can be taken (and the transaction log rolled) is determined + by snapCount. In order to prevent all of the machines in the quorum + from taking a snapshot at the same time, each ZooKeeper server + will take a snapshot when the number of transactions in the transaction log + reaches a runtime-generated random value in the [snapCount/2+1, snapCount] + range. The default snapCount is 100,000.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>maxClientCnxns</term> + <listitem> + <para>(No Java system property)</para> + + <para>Limits the number of concurrent connections (at the socket + level) that a single client, identified by IP address, may make + to a single member of the ZooKeeper ensemble. This is used to + prevent certain classes of DoS attacks, including file + descriptor exhaustion. The default is 60. Setting this to 0 + entirely removes the limit on concurrent connections.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>clientPortAddress</term> + + <listitem> + <para><emphasis role="bold">New in 3.3.0:</emphasis> the + address (ipv4, ipv6 or hostname) to listen for client + connections; that is, the address that clients attempt + to connect to. This is optional; by default we bind in + such a way that any connection to the <emphasis + role="bold">clientPort</emphasis> for any + address/interface/nic on the server will be + accepted.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>minSessionTimeout</term> + <listitem> + <para>(No Java system property)</para> + + <para><emphasis role="bold">New in 3.3.0:</emphasis> the + minimum session timeout in milliseconds that the server + will allow the client to negotiate. 
Defaults to 2 times + the <emphasis role="bold">tickTime</emphasis>.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>maxSessionTimeout</term> + <listitem> + <para>(No Java system property)</para> + + <para><emphasis role="bold">New in 3.3.0:</emphasis> the + maximum session timeout in milliseconds that the server + will allow the client to negotiate. Defaults to 20 times + the <emphasis role="bold">tickTime</emphasis>.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>fsync.warningthresholdms</term> + <listitem> + <para>(Java system property: <emphasis + role="bold">zookeeper.fsync.warningthresholdms</emphasis>)</para> + + <para><emphasis role="bold">New in 3.3.4:</emphasis> A + warning message will be output to the log whenever an + fsync in the Transactional Log (WAL) takes longer than + this value. The values is specified in milliseconds and + defaults to 1000. This value can only be set as a + system property.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>autopurge.snapRetainCount</term> + + <listitem> + <para>(No Java system property)</para> + + <para><emphasis role="bold">New in 3.4.0:</emphasis> + When enabled, ZooKeeper auto purge feature retains + the <emphasis role="bold">autopurge.snapRetainCount</emphasis> most + recent snapshots and the corresponding transaction logs in the + <emphasis role="bold">dataDir</emphasis> and <emphasis + role="bold">dataLogDir</emphasis> respectively and deletes the rest. + Defaults to 3. Minimum value is 3.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>autopurge.purgeInterval</term> + + <listitem> + <para>(No Java system property)</para> + + <para><emphasis role="bold">New in 3.4.0:</emphasis> The + time interval in hours for which the purge task has to + be triggered. Set to a positive integer (1 and above) + to enable the auto purging. Defaults to 0.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>syncEnabled</term> + + <listitem> + <para>(Java system property: <emphasis + role="bold">zookeeper.observer.syncEnabled</emphasis>)</para> + + <para><emphasis role="bold">New in 3.4.6, 3.5.0:</emphasis> + The observers now log transaction and write snapshot to disk + by default like the participants. This reduces the recovery time + of the observers on restart. Set to "false" to disable this + feature. Default is "true"</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>zookeeper.extendedTypesEnabled</term> + + <listitem> + <para>(Java system property only: <emphasis + role="bold">zookeeper.extendedTypesEnabled</emphasis>)</para> + + <para><emphasis role="bold">New in 3.5.4, 3.6.0:</emphasis> Define to "true" to enable + extended features such as the creation of <ulink url="zookeeperProgrammers.html#TTL+Nodes">TTL Nodes</ulink>. + They are disabled by default. IMPORTANT: when enabled server IDs must + be less than 255 due to internal limitations. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term>zookeeper.emulate353TTLNodes</term> + + <listitem> + <para>(Java system property only: <emphasis + role="bold">zookeeper.emulate353TTLNodes</emphasis>)</para> + + <para><emphasis role="bold">New in 3.5.4, 3.6.0:</emphasis> Due to + <ulink url="https://issues.apache.org/jira/browse/ZOOKEEPER-2901">ZOOKEEPER-2901</ulink> TTL nodes + created in version 3.5.3 are not supported in 3.5.4/3.6.0. However, a workaround is provided via the + zookeeper.emulate353TTLNodes system property. 
If you used TTL nodes in ZooKeeper 3.5.3 and need to maintain + compatibility, set <emphasis role="bold">zookeeper.emulate353TTLNodes</emphasis> to "true" in addition to + <emphasis role="bold">zookeeper.extendedTypesEnabled</emphasis>. NOTE: due to the bug, server IDs + must be 127 or less. Additionally, the maximum supported TTL value is 1099511627775, which is smaller + than what was allowed in 3.5.3 (1152921504606846975).</para> + </listitem> + </varlistentry> + + </variablelist> + </section> + + <section id="sc_clusterOptions"> + <title>Cluster Options</title> + + <para>The options in this section are designed for use with an ensemble + of servers -- that is, when deploying clusters of servers.</para> + + <variablelist> + <varlistentry> + <term>electionAlg</term> + + <listitem> + <para>(No Java system property)</para> + + <para>Election implementation to use. A value of "0" corresponds + to the original UDP-based version, "1" corresponds to the + non-authenticated UDP-based version of fast leader election, "2" + corresponds to the authenticated UDP-based version of fast + leader election, and "3" corresponds to the TCP-based version of + fast leader election. Currently, algorithm 3 is the default.</para> + + <note> + <para> The implementations of leader election 0, 1, and 2 are now + <emphasis role="bold"> deprecated </emphasis>. We intend to remove + them in the next release, at which point only + FastLeaderElection will be available. + </para> + </note> + </listitem> + </varlistentry> + + <varlistentry> + <term>initLimit</term> + + <listitem> + <para>(No Java system property)</para> + + <para>Amount of time, in ticks (see <ulink + url="#id_tickTime">tickTime</ulink>), to allow followers to + connect and sync to a leader. Increase this value as needed if + the amount of data managed by ZooKeeper is large.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>leaderServes</term> + + <listitem> + <para>(Java system property: zookeeper.<emphasis + role="bold">leaderServes</emphasis>)</para> + + <para>Leader accepts client connections. Default value is "yes". + The leader machine coordinates updates. For higher update + throughput, at the slight expense of read throughput, the leader + can be configured to not accept clients and focus on + coordination. The default for this option is yes, which means + that a leader will accept client connections.</para> + + <note> + <para>Turning on leader selection is highly recommended when + you have more than three ZooKeeper servers in an ensemble.</para> + </note> + </listitem> + </varlistentry> + + <varlistentry> + <term>server.x=[hostname]:nnnnn[:nnnnn], etc</term> + + <listitem> + <para>(No Java system property)</para> + + <para>Servers making up the ZooKeeper ensemble. When the server + starts up, it determines which server it is by looking for the + file <filename>myid</filename> in the data directory. That file + contains the server number, in ASCII, and it should match + <emphasis role="bold">x</emphasis> in <emphasis + role="bold">server.x</emphasis> on the left-hand side of this + setting.</para> + + <para>The list of ZooKeeper servers that is + used by the clients must match the list of ZooKeeper servers + that each ZooKeeper server has.</para> + + <para>There are two port numbers <emphasis role="bold">nnnnn</emphasis>. + The first is used by followers to connect to the leader, and the second is for + leader election. The leader election port is only necessary if electionAlg + is 1, 2, or 3 (default). 
If electionAlg is 0, then the second port is not + necessary. If you want to test multiple servers on a single machine, then + different ports can be used for each server.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>syncLimit</term> + + <listitem> + <para>(No Java system property)</para> + + <para>Amount of time, in ticks (see <ulink + url="#id_tickTime">tickTime</ulink>), to allow followers to sync + with ZooKeeper. If followers fall too far behind a leader, they + will be dropped.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>group.x=nnnnn[:nnnnn]</term> + + <listitem> + <para>(No Java system property)</para> + + <para>Enables a hierarchical quorum construction."x" is a group identifier + and the numbers following the "=" sign correspond to server identifiers. + The left-hand side of the assignment is a colon-separated list of server + identifiers. Note that groups must be disjoint and the union of all groups + must be the ZooKeeper ensemble. </para> + + <para> You will find an example <ulink url="zookeeperHierarchicalQuorums.html">here</ulink> + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term>weight.x=nnnnn</term> + + <listitem> + <para>(No Java system property)</para> + + <para>Used along with "group", it assigns a weight to a server when + forming quorums. Such a value corresponds to the weight of a server + when voting. There are a few parts of ZooKeeper that require voting + such as leader election and the atomic broadcast protocol. By default + the weight of server is 1. If the configuration defines groups, but not + weights, then a value of 1 will be assigned to all servers. + </para> + + <para> You will find an example <ulink url="zookeeperHierarchicalQuorums.html">here</ulink> + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term>cnxTimeout</term> + + <listitem> + <para>(Java system property: zookeeper.<emphasis + role="bold">cnxTimeout</emphasis>)</para> + + <para>Sets the timeout value for opening connections for leader election notifications. + Only applicable if you are using electionAlg 3. + </para> + + <note> + <para>Default value is 5 seconds.</para> + </note> + </listitem> + </varlistentry> + + <varlistentry> + <term>standaloneEnabled</term> + + <listitem> + <para>(No Java system property)</para> + + <para><emphasis role="bold">New in 3.5.0:</emphasis> + When set to false, a single server can be started in replicated + mode, a lone participant can run with observers, and a cluster + can reconfigure down to one node, and up from one node. The + default is true for backwards compatibility. It can be set + using QuorumPeerConfig's setStandaloneEnabled method or by + adding "standaloneEnabled=false" or "standaloneEnabled=true" + to a server's config file. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term>reconfigEnabled</term> + + <listitem> + <para>(No Java system property)</para> + + <para><emphasis role="bold">New in 3.5.3:</emphasis> + This controls the enabling or disabling of + <ulink url="zookeeperReconfig.html"> + Dynamic Reconfiguration</ulink> feature. When the feature + is enabled, users can perform reconfigure operations through + the ZooKeeper client API or through ZooKeeper command line tools + assuming users are authorized to perform such operations. + When the feature is disabled, no user, including the super user, + can perform a reconfiguration. Any attempt to reconfigure will return an error. 
+ <emphasis role="bold">"reconfigEnabled"</emphasis> option can be set as + <emphasis role="bold">"reconfigEnabled=false"</emphasis> or + <emphasis role="bold">"reconfigEnabled=true"</emphasis> + to a server's config file, or using QuorumPeerConfig's + setReconfigEnabled method. The default value is false. + + If present, the value should be consistent across every server in + the entire ensemble. Setting the value as true on some servers and false + on other servers will cause inconsistent behavior depending on which server + is elected as leader. If the leader has a setting of + <emphasis role="bold">"reconfigEnabled=true"</emphasis>, then the ensemble + will have reconfig feature enabled. If the leader has a setting of + <emphasis role="bold">"reconfigEnabled=false"</emphasis>, then the ensemble + will have reconfig feature disabled. It is thus recommended to have a consistent + value for <emphasis role="bold">"reconfigEnabled"</emphasis> across servers + in the ensemble. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term>4lw.commands.whitelist</term> + + <listitem> + <para>(Java system property: <emphasis + role="bold">zookeeper.4lw.commands.whitelist</emphasis>)</para> + + <para><emphasis role="bold">New in 3.5.3:</emphasis> + A list of comma separated <ulink url="#sc_4lw">Four Letter Words</ulink> + commands that user wants to use. A valid Four Letter Words + command must be put in this list else ZooKeeper server will + not enable the command. + By default the whitelist only contains "srvr" command + which zkServer.sh uses. The rest of four letter word commands are disabled + by default. + </para> + + <para>Here's an example of the configuration that enables stat, ruok, conf, and isro + command while disabling the rest of Four Letter Words command:</para> + <programlisting> + 4lw.commands.whitelist=stat, ruok, conf, isro + </programlisting> + + <para>If you really need enable all four letter word commands by default, you can use + the asterisk option so you don't have to include every command one by one in the list. + As an example, this will enable all four letter word commands: + </para> + <programlisting> + 4lw.commands.whitelist=* + </programlisting> + + </listitem> + </varlistentry> + + <varlistentry> + <term>tcpKeepAlive</term> + + <listitem> + <para>(Java system property: <emphasis + role="bold">zookeeper.tcpKeepAlive</emphasis>)</para> + + <para><emphasis role="bold">New in 3.5.4:</emphasis> + Setting this to true sets the TCP keepAlive flag on the + sockets used by quorum members to perform elections. + This will allow for connections between quorum members to + remain up when there is network infrastructure that may + otherwise break them. Some NATs and firewalls may terminate + or lose state for long running or idle connections.</para> + + <para> Enabling this option relies on OS level settings to work + properly, check your operating system's options regarding TCP + keepalive for more information. Defaults to + <emphasis role="bold">false</emphasis>. 
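+ </para>
+
+ <para>For example, this can be turned on by passing the system
+ property on the server JVM command line:</para>
+
+ <programlisting>-Dzookeeper.tcpKeepAlive=true</programlisting>
+
+ <para>As noted above, the actual keepalive probe timing comes from the
+ operating system's TCP keepalive settings, not from ZooKeeper itself.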
+ </para> + </listitem> + </varlistentry> + + </variablelist> + <para></para> + </section> + + <section id="sc_authOptions"> + <title>Encryption, Authentication, Authorization Options</title> + + <para>The options in this section allow control over + encryption/authentication/authorization performed by the service.</para> + + <variablelist> + <varlistentry> + <term>DigestAuthenticationProvider.superDigest</term> + + <listitem> + <para>(Java system property: <emphasis + role="bold">zookeeper.DigestAuthenticationProvider.superDigest</emphasis>)</para> + + <para>By default this feature is <emphasis + role="bold">disabled</emphasis></para> + + <para><emphasis role="bold">New in 3.2:</emphasis> + Enables a ZooKeeper ensemble administrator to access the + znode hierarchy as a "super" user. In particular no ACL + checking occurs for a user authenticated as + super.</para> + + <para>org.apache.zookeeper.server.auth.DigestAuthenticationProvider + can be used to generate the superDigest, call it with + one parameter of "super:<password>". Provide the + generated "super:<data>" as the system property value + when starting each server of the ensemble.</para> + + <para>When authenticating to a ZooKeeper server (from a + ZooKeeper client) pass a scheme of "digest" and authdata + of "super:<password>". Note that digest auth passes + the authdata in plaintext to the server, it would be + prudent to use this authentication method only on + localhost (not over the network) or over an encrypted + connection.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>X509AuthenticationProvider.superUser</term> + <listitem> + <para>(Java system property: <emphasis + role="bold">zookeeper.X509AuthenticationProvider.superUser</emphasis>)</para> + + <para>The SSL-backed way to enable a ZooKeeper ensemble + administrator to access the znode hierarchy as a "super" user. + When this parameter is set to an X500 principal name, only an + authenticated client with that principal will be able to bypass + ACL checking and have full privileges to all znodes.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>zookeeper.superUser</term> + <listitem> + <para>(Java system property: <emphasis + role="bold">zookeeper.superUser</emphasis>)</para> + + <para>Similar to <emphasis role="bold">zookeeper.X509AuthenticationProvider.superUser</emphasis> + but is generic for SASL based logins. It stores the name of + a user that can access the znode hierarchy as a "super" user. 
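+ </para>
+
+ <para>For example (the user name below is purely illustrative), the
+ property can be supplied on the server JVM command line:</para>
+
+ <programlisting>-Dzookeeper.superUser=zkadmin</programlisting>
+
+ <para>A client that then authenticates via SASL as this user bypasses
+ ACL checks, as described above.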
+ </para> + </listitem> + </varlistentry> + + <varlistentry> + <term>ssl.keyStore.location and ssl.keyStore.password</term> + <listitem> + <para>(Java system properties: <emphasis role="bold"> + zookeeper.ssl.keyStore.location</emphasis> and <emphasis + role="bold">zookeeper.ssl.keyStore.password</emphasis>)</para> + + <para>Specifies the file path to a JKS containing the local + credentials to be used for SSL connections, and the + password to unlock the file.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>ssl.trustStore.location and ssl.trustStore.password</term> + <listitem> + <para>(Java system properties: <emphasis role="bold"> + zookeeper.ssl.trustStore.location</emphasis> and <emphasis + role="bold">zookeeper.ssl.trustStore.password</emphasis>)</para> + + <para>Specifies the file path to a JKS containing the remote + credentials to be used for SSL connections, and the + password to unlock the file.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>ssl.authProvider</term> + <listitem> + <para>(Java system property: <emphasis + role="bold">zookeeper.ssl.authProvider</emphasis>)</para> + + <para>Specifies a subclass of <emphasis role="bold"> + org.apache.zookeeper.auth.X509AuthenticationProvider</emphasis> + to use for secure client authentication. This is useful in + certificate key infrastructures that do not use JKS. It may be + necessary to extend <emphasis role="bold">javax.net.ssl.X509KeyManager + </emphasis> and <emphasis role="bold">javax.net.ssl.X509TrustManager</emphasis> + to get the desired behavior from the SSL stack. To configure the + ZooKeeper server to use the custom provider for authentication, + choose a scheme name for the custom AuthenticationProvider and + set the property <emphasis role="bold">zookeeper.authProvider.[scheme] + </emphasis> to the fully-qualified class name of the custom + implementation. This will load the provider into the ProviderRegistry. + Then set this property <emphasis role="bold"> + zookeeper.ssl.authProvider=[scheme]</emphasis> and that provider + will be used for secure authentication.</para> + </listitem> + </varlistentry> + </variablelist> + </section> + + <section> + <title>Experimental Options/Features</title> + + <para>New features that are currently considered experimental.</para> + + <variablelist> + <varlistentry> + <term>Read Only Mode Server</term> + + <listitem> + <para>(Java system property: <emphasis + role="bold">readonlymode.enabled</emphasis>)</para> + + <para><emphasis role="bold">New in 3.4.0:</emphasis> + Setting this value to true enables Read Only Mode server + support (disabled by default). ROM allows clients + sessions which requested ROM support to connect to the + server even when the server might be partitioned from + the quorum. In this mode ROM clients can still read + values from the ZK service, but will be unable to write + values and see changes from other clients. See + ZOOKEEPER-784 for more details. + </para> + </listitem> + </varlistentry> + + </variablelist> + </section> + + <section> + <title>Unsafe Options</title> + + <para>The following options can be useful, but be careful when you use + them. The risk of each is explained along with the explanation of what + the variable does.</para> + + <variablelist> + <varlistentry> + <term>forceSync</term> + + <listitem> + <para>(Java system property: <emphasis + role="bold">zookeeper.forceSync</emphasis>)</para> + + <para>Requires updates to be synced to media of the transaction + log before finishing processing the update. 
If this option is + set to no, ZooKeeper will not require updates to be synced to + the media.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>jute.maxbuffer</term> + + <listitem> + <para>(Java system property: <emphasis role="bold"> + jute.maxbuffer</emphasis>)</para> + + <para>This option can only be set as a Java system property. + There is no zookeeper prefix on it. It specifies the maximum + size of the data that can be stored in a znode. The default is + 0xfffff, or just under 1M. If this option is changed, the system + property must be set on all servers and clients, otherwise + problems will arise. This is really a sanity check. ZooKeeper is + designed to store data on the order of kilobytes in size.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>skipACL</term> + + <listitem> + <para>(Java system property: <emphasis + role="bold">zookeeper.skipACL</emphasis>)</para> + + <para>Skips ACL checks. This results in a boost in throughput, + but opens up full access to the data tree to everyone.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>quorumListenOnAllIPs</term> + + <listitem> + <para>When set to true, the ZooKeeper server will listen + for connections from its peers on all available IP addresses, + and not only the address configured in the server list of the + configuration file. It affects the connections handling the + ZAB protocol and the Fast Leader Election protocol. Default + value is <emphasis role="bold">false</emphasis>.</para> + </listitem> + </varlistentry> + + </variablelist> + </section> + + <section> + <title>Disabling data directory autocreation</title> + + <para><emphasis role="bold">New in 3.5:</emphasis> The default + behavior of a ZooKeeper server is to automatically create the + data directory (specified in the configuration file) when + started if that directory does not already exist. This can be + inconvenient and even dangerous in some cases. Take the case + where a configuration change is made to a running server, + wherein the <emphasis role="bold">dataDir</emphasis> parameter + is accidentally changed. When the ZooKeeper server is + restarted, it will create this non-existent directory and begin + serving - with an empty znode namespace. This scenario can + result in an effective "split brain" situation (i.e. data in + both the new invalid directory and the original valid data + store). As such, it would be good to have an option to turn off + this autocreate behavior. In general, this should be done for + production environments; unfortunately, the default legacy + behavior cannot be changed at this point, and therefore this + must be done on a case by case basis. This is + left to users and to packagers of ZooKeeper distributions. + </para> + + <para>When running <emphasis + role="bold">zkServer.sh</emphasis>, autocreate can be disabled + by setting the environment variable <emphasis + role="bold">ZOO_DATADIR_AUTOCREATE_DISABLE</emphasis> to 1. + When running ZooKeeper servers directly from class files, this + can be accomplished by setting <emphasis + role="bold">zookeeper.datadir.autocreate=false</emphasis> on + the java command line, i.e. <emphasis + role="bold">-Dzookeeper.datadir.autocreate=false</emphasis>. + </para> + + <para>When this feature is disabled, and the ZooKeeper server + determines that the required directories do not exist, it will + generate an error and refuse to start. 
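+ </para>
+
+ <para>For example, when starting the server through the bundled
+ script (the invocation below is only a sketch; adjust the path to
+ <filename>zkServer.sh</filename> for your installation):</para>
+
+ <programlisting>$ ZOO_DATADIR_AUTOCREATE_DISABLE=1 bin/zkServer.sh start</programlisting>
+
+ <para>With autocreate disabled, the required directories must be
+ created up front, for example with the
+ <emphasis role="bold">zkServer-initialize.sh</emphasis> script
+ described below.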
+ </para> + + <para>A new script <emphasis + role="bold">zkServer-initialize.sh</emphasis> is provided to + support this new feature. If autocreate is disabled it is + necessary for the user to first install ZooKeeper, then create + the data directory (and potentially txnlog directory), and + then start the server. Otherwise as mentioned in the previous + paragraph the server will not start. Running <emphasis + role="bold">zkServer-initialize.sh</emphasis> will create the + required directories, and optionally setup the myid file + (optional command line parameter). This script can be used + even if the autocreate feature itself is not used, and will + likely be of use to users as this (setup, including creation + of the myid file) has been an issue for users in the past. + Note that this script ensures the data directories exist only, + it does not create a config file, but rather requires a config + file to be available in order to execute. + </para> + </section> + + <section id="sc_performance_options"> + <title>Performance Tuning Options</title> + + <para><emphasis role="bold">New in 3.5.0:</emphasis> Several subsystems have been reworked + to improve read throughput. This includes multi-threading of the NIO communication subsystem and + request processing pipeline (Commit Processor). NIO is the default client/server communication + subsystem. Its threading model comprises 1 acceptor thread, 1-N selector threads and 0-M + socket I/O worker threads. In the request processing pipeline the system can be configured + to process multiple read request at once while maintaining the same consistency guarantee + (same-session read-after-write). The Commit Processor threading model comprises 1 main + thread and 0-N worker threads. + </para> + + <para> + The default values are aimed at maximizing read throughput on a dedicated ZooKeeper machine. + Both subsystems need to have sufficient amount of threads to achieve peak read throughput. + </para> + + <variablelist> + + <varlistentry> + <term>zookeeper.nio.numSelectorThreads</term> + <listitem> + <para>(Java system property only: <emphasis + role="bold">zookeeper.nio.numSelectorThreads</emphasis>) + </para> + <para><emphasis role="bold">New in 3.5.0:</emphasis> + Number of NIO selector threads. At least 1 selector thread required. + It is recommended to use more than one selector for large numbers + of client connections. The default value is sqrt( number of cpu cores / 2 ). + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term>zookeeper.nio.numWorkerThreads</term> + <listitem> + <para>(Java system property only: <emphasis + role="bold">zookeeper.nio.numWorkerThreads</emphasis>) + </para> + <para><emphasis role="bold">New in 3.5.0:</emphasis> + Number of NIO worker threads. If configured with 0 worker threads, the selector threads + do the socket I/O directly. The default value is 2 times the number of cpu cores. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term>zookeeper.commitProcessor.numWorkerThreads</term> + <listitem> + <para>(Java system property only: <emphasis + role="bold">zookeeper.commitProcessor.numWorkerThreads</emphasis>) + </para> + <para><emphasis role="bold">New in 3.5.0:</emphasis> + Number of Commit Processor worker threads. If configured with 0 worker threads, the main thread + will process the request directly. The default value is the number of cpu cores. 
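+ </para>
+
+ <para>As an illustration only (the thread counts here are arbitrary
+ and should come from load testing on your own hardware), the three
+ thread-count properties above can be passed on the server JVM command
+ line:</para>
+
+ <programlisting>-Dzookeeper.nio.numSelectorThreads=4 -Dzookeeper.nio.numWorkerThreads=16 -Dzookeeper.commitProcessor.numWorkerThreads=8</programlisting>
+
+ <para>Leaving them unset keeps the CPU-derived defaults described
+ above.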
+ </section>
+
+ <section id="sc_performance_options">
+ <title>Performance Tuning Options</title>
+
+ <para><emphasis role="bold">New in 3.5.0:</emphasis> Several subsystems have been reworked
+ to improve read throughput. This includes multi-threading of the NIO communication subsystem and
+ the request processing pipeline (Commit Processor). NIO is the default client/server communication
+ subsystem. Its threading model comprises 1 acceptor thread, 1-N selector threads and 0-M
+ socket I/O worker threads. In the request processing pipeline the system can be configured
+ to process multiple read requests at once while maintaining the same consistency guarantee
+ (same-session read-after-write). The Commit Processor threading model comprises 1 main
+ thread and 0-N worker threads.
+ </para>
+
+ <para>
+ The default values are aimed at maximizing read throughput on a dedicated ZooKeeper machine.
+ Both subsystems need a sufficient number of threads to achieve peak read throughput; an
+ illustrative command line is shown after the list below.
+ </para>
+
+ <variablelist>
+
+ <varlistentry>
+ <term>zookeeper.nio.numSelectorThreads</term>
+ <listitem>
+ <para>(Java system property only: <emphasis
+ role="bold">zookeeper.nio.numSelectorThreads</emphasis>)
+ </para>
+ <para><emphasis role="bold">New in 3.5.0:</emphasis>
+ Number of NIO selector threads. At least 1 selector thread is required.
+ It is recommended to use more than one selector for large numbers
+ of client connections. The default value is sqrt(number of cpu cores / 2).
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>zookeeper.nio.numWorkerThreads</term>
+ <listitem>
+ <para>(Java system property only: <emphasis
+ role="bold">zookeeper.nio.numWorkerThreads</emphasis>)
+ </para>
+ <para><emphasis role="bold">New in 3.5.0:</emphasis>
+ Number of NIO worker threads. If configured with 0 worker threads, the selector threads
+ do the socket I/O directly. The default value is 2 times the number of cpu cores.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>zookeeper.commitProcessor.numWorkerThreads</term>
+ <listitem>
+ <para>(Java system property only: <emphasis
+ role="bold">zookeeper.commitProcessor.numWorkerThreads</emphasis>)
+ </para>
+ <para><emphasis role="bold">New in 3.5.0:</emphasis>
+ Number of Commit Processor worker threads. If configured with 0 worker threads, the main thread
+ will process requests directly. The default value is the number of cpu cores.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>znode.container.checkIntervalMs</term>
+
+ <listitem>
+ <para>(Java system property only)</para>
+
+ <para><emphasis role="bold">New in 3.5.1:</emphasis> The
+ time interval in milliseconds for each check of candidate container
+ and TTL nodes. Default is "60000".</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>znode.container.maxPerMinute</term>
+
+ <listitem>
+ <para>(Java system property only)</para>
+
+ <para><emphasis role="bold">New in 3.5.1:</emphasis> The
+ maximum number of container nodes that can be deleted per
+ minute. This prevents herding during container deletion.
+ Default is "10000".</para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
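+
+ <para>As a sketch only (the thread counts below are illustrative
+ values for a machine with four cpu cores, not recommendations),
+ these system properties can be passed to the server JVM, for
+ example through the <emphasis role="bold">SERVER_JVMFLAGS</emphasis>
+ environment variable used by the bundled start scripts:</para>
+
+ <programlisting>
+ # illustrative values only - the defaults are usually reasonable
+ export SERVER_JVMFLAGS="-Dzookeeper.nio.numSelectorThreads=2 \
+   -Dzookeeper.nio.numWorkerThreads=8 \
+   -Dzookeeper.commitProcessor.numWorkerThreads=4"
+ bin/zkServer.sh restart
+ </programlisting>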
+ </section>
+
+ <section>
+ <title>Communication using the Netty framework</title>
+
+ <para><ulink url="http://netty.io">Netty</ulink>
+ is an NIO-based client/server communication framework; it
+ simplifies (compared to using NIO directly) many of the
+ complexities of network-level communication for Java
+ applications. Additionally, the Netty framework has built-in
+ support for encryption (SSL) and authentication
+ (certificates). These are optional features and can be
+ turned on or off individually.
+ </para>
+ <para>In versions 3.5+, a ZooKeeper server can use Netty
+ instead of NIO (the default option) by setting the Java system
+ property <emphasis role="bold">zookeeper.serverCnxnFactory</emphasis>
+ to <emphasis role="bold">org.apache.zookeeper.server.NettyServerCnxnFactory</emphasis>;
+ for the client, set <emphasis role="bold">zookeeper.clientCnxnSocket</emphasis>
+ to <emphasis role="bold">org.apache.zookeeper.ClientCnxnSocketNetty</emphasis>.
+ </para>
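+
+ <para>For example (a sketch only; it assumes the <emphasis
+ role="bold">SERVER_JVMFLAGS</emphasis> and <emphasis
+ role="bold">CLIENT_JVMFLAGS</emphasis> environment variables are
+ picked up by the bundled zkServer.sh and zkCli.sh scripts, and
+ that the server listens on localhost:2181):</para>
+
+ <programlisting>
+ # server side: replace the default NIO connection factory with Netty
+ export SERVER_JVMFLAGS="-Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory"
+ bin/zkServer.sh restart
+
+ # client side: use the Netty client socket implementation
+ export CLIENT_JVMFLAGS="-Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty"
+ bin/zkCli.sh -server localhost:2181
+ </programlisting>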
+
+ <para>
+ TBD - tuning options for netty - currently there are none that are netty specific but we should add some. Especially around a max bound on the number of reader worker threads netty creates.
+ </para>
+ <para>
+ TBD - how to manage encryption
+ </para>
+ <para>
+ TBD - how to manage certificates
+ </para>
+
+ </section>
+
+ <section id="sc_adminserver_config">
+ <title>AdminServer configuration</title>
+ <para><emphasis role="bold">New in 3.5.0:</emphasis> The following
+ options are used to configure the <ulink
+ url="#sc_adminserver">AdminServer</ulink>.</para>
+
+ <variablelist>
+ <varlistentry>
+ <term>admin.enableServer</term>
+
+ <listitem>
+ <para>(Java system property: <emphasis
+ role="bold">zookeeper.admin.enableServer</emphasis>)</para>
+
+ <para>Set to "false" to disable the AdminServer. By default the
+ AdminServer is enabled.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>admin.serverAddress</term>
+
+ <listitem>
+ <para>(Java system property: <emphasis
+ role="bold">zookeeper.admin.serverAddress</emphasis>)</para>
+
+ <para>The address the embedded Jetty server listens on. Defaults to 0.0.0.0.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>admin.serverPort</term>
+
+ <listitem>
+ <para>(Java system property: <emphasis
+ role="bold">zookeeper.admin.serverPort</emphasis>)</para>
+
+ <para>The port the embedded Jetty server listens on. Defaults to 8080.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>admin.idleTimeout</term>
+
+ <listitem>
+ <para>(Java system property: <emphasis
+ role="bold">zookeeper.admin.idleTimeout</emphasis>)</para>
+
+ <para>Sets the maximum idle time in milliseconds that a connection can wait
+ before sending or receiving data. Defaults to 30000 ms.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>admin.commandURL</term>
+
+ <listitem>
+ <para>(Java system property: <emphasis
+ role="bold">zookeeper.admin.commandURL</emphasis>)</para>
+
+ <para>The URL for listing and issuing commands relative to the
+ root URL. Defaults to "/commands".</para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </section>
+
+ </section>
+
+ <section id="sc_zkCommands">
+ <title>ZooKeeper Commands</title>
+
+ <section id="sc_4lw">
+ <title>The Four Letter Words</title>
+ <para>ZooKeeper responds to a small set of commands. Each command is
+ composed of four letters. You issue the commands to ZooKeeper via telnet
+ or nc, at the client port.</para>
+
+ <para>Three of the more interesting commands: "stat" gives some
+ general information about the server and connected clients,
+ while "srvr" and "cons" give extended details on the server and
+ connections respectively.</para>
+
+ <para><emphasis role="bold">New in 3.5.3:</emphasis>
+ Four Letter Words need to be explicitly whitelisted before use.
+ Please refer to <emphasis role="bold">4lw.commands.whitelist</emphasis>,
+ described in the <ulink url="#sc_clusterOptions">
+ cluster configuration section</ulink>, for details.
+ Moving forward, Four Letter Words will be deprecated; please use the
+ <ulink url="#sc_adminserver">AdminServer</ulink> instead.
+ </para>
+
+ <variablelist>
+ <varlistentry>
+ <term>conf</term>
+
+ <listitem>
+ <para><emphasis role="bold">New in 3.3.0:</emphasis> Print
+ details about serving configuration.</para>
+ </listitem>
+
+ </varlistentry>
+
+ <varlistentry>
+ <term>cons</term>
+
+ <listitem>
+ <para><emphasis role="bold">New in 3.3.0:</emphasis> List
+ full connection/session details for all clients connected
+ to this server. Includes information on numbers of packets
+ received/sent, session id, operation latencies, last
+ operation performed, etc.</para>
+ </listitem>
+
+ </varlistentry>
+
+ <varlistentry>
+ <term>crst</term>
+
+ <listitem>
+ <para><emphasis role="bold">New in 3.3.0:</emphasis> Reset
+ connection/session statistics for all connections.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>dump</term>
+
+ <listitem>
+ <para>Lists the outstanding sessions and ephemeral nodes. This
+ only works on the leader.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>envi</term>
+
+ <listitem>
+ <para>Print details about the serving environment.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>ruok</term>
+
+ <listitem>
+ <para>Tests if the server is running in a non-error state. The server
+ will respond with imok if it is running. Otherwise it will not
+ respond at all.</para>
+
+ <para>A response of "imok" does not necessarily indicate that the
+ server has joined the quorum, just that the server process is active
+ and bound to the specified client port. Use "stat" for details on
+ state with respect to the quorum and client connection information.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>srst</term>
+
+ <listitem>
+ <para>Reset server statistics.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>srvr</term>
+
+ <listitem>
+ <para><emphasis role="bold">New in 3.3.0:</emphasis> Lists
+ full details for the server.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>stat</term>
+
+ <listitem>
+ <para>Lists brief details for the server and connected
+ clients.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>wchs</term>
+
+ <listitem>
+ <para><emphasis role="bold">New in 3.3.0:</emphasis> Lists
+ brief information on watches for the server.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>wchc</term>
+
+ <listitem>
+ <para><emphasis role="bold">New in 3.3.0:</emphasis> Lists
+ detailed information on watches for the server, by
+ session. This outputs a list of sessions (connections)
+ with associated watches (paths). Note, depending on the
+ number of watches this operation may be expensive (i.e.
+ impact server performance); use it carefully.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>dirs</term>
+
+ <listitem>
+ <para><emphasis role="bold">New in 3.5.1:</emphasis>
+ Shows the total size of snapshot and log files in bytes.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>wchp</term>
+
+ <listitem>
+ <para><emphasis role="bold">New in 3.3.0:</emphasis> Lists
+ detailed information on watches for the server, by path.
+ This outputs a list of paths (znodes) with associated
+ sessions. Note, depending on the number of watches this
+ operation may be expensive (i.e. impact server performance);
+ use it carefully.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>mntr</term>
+
+ <listitem>
+ <para><emphasis role="bold">New in 3.4.0:</emphasis> Outputs a list
+ of variables that could be used for monitoring the health of the cluster.</para>
+
+ <programlisting>$ echo mntr | nc localhost 2185
+
+ zk_version 3.4.0
+ zk_avg_latency 0
+ zk_max_latency 0
+ zk_min_latency 0
+ zk_packets_received 70
+ zk_packets_sent 69
+ zk_num_alive_connections 1
+ zk_outstanding_requests 0
+ zk_server_state leader
+ zk_znode_count 4
+ zk_watch_count 0
+ zk_ephemerals_count 0
+ zk_approximate_data_size 27
+ zk_followers 4 - only exposed by the Leader
+ zk_synced_followers 4 - only exposed by the Leader
+ zk_pending_syncs 0 - only exposed by the Leader
+ zk_open_file_descriptor_count 23 - only available on Unix platforms
+ zk_max_file_descriptor_count 1024 - only available on Unix platforms
+ </programlisting>
+
+ <para>The output is compatible with the Java properties format and the content
+ may change over time (new keys added). Your scripts should expect changes.</para>
+
+ <para>ATTENTION: Some of the keys are platform specific and some of the keys are only exported by the Leader.</para>
+
+ <para>The output contains multiple lines with the following format:</para>
+ <programlisting>key \t value</programlisting>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>isro</term>
+
+ <listitem>
+ <para><emphasis role="bold">New in 3.4.0:</emphasis> Tests if the
+ server is running in read-only mode.
The server will respond with + "ro" if in read-only mode or "rw" if not in read-only mode.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>gtmk</term> + + <listitem> + <para>Gets the current trace mask as a 64-bit signed long value in + decimal format. See <command>stmk</command> for an explanation of + the possible values.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>stmk</term> + + <listitem> + <para>Sets the current trace mask. The trace mask is 64 bits, + where each bit enables or disables a specific category of trace + logging on the server. Log4J must be configured to enable + <command>TRACE</command> level first in order to see trace logging + messages. The bits of the trace mask correspond to the following + trace logging categories.</para> + + <table> + <title>Trace Mask Bit Values</title> + <tgroup cols="2" align="left" colsep="1" rowsep="1"> + <tbody> + <row> + <entry>0b0000000000</entry> + <entry>Unused, reserved for future use.</entry> + </row> + <row> + <entry>0b0000000010</entry> + <entry>Logs client requests, excluding ping + requests.</entry> + </row> + <row> + <entry>0b0000000100</entry> + <entry>Unused, reserved for future use.</entry> + </row> + <row> + <entry>0b0000001000</entry> + <entry>Logs client ping requests.</entry> + </row> + <row> + <entry>0b0000010000</entry> + <entry>Logs packets received from the quorum peer that is + the current leader, excluding ping requests.</entry> + </row> + <row> + <entry>0b0000100000</entry> + <entry>Logs addition, removal and validation of client + sessions.</entry> + </row> + <row> + <entry>0b0001000000</entry> + <entry>Logs delivery of watch events to client + sessions.</entry> + </row> + <row> + <entry>0b0010000000</entry> + <entry>Logs ping packets received from the quorum peer + that is the current leader.</entry> + </row> + <row> + <entry>0b0100000000</entry> + <entry>Unused, reserved for future use.</entry> + </row> + <row> + <entry>0b1000000000</entry> + <entry>Unused, reserved for future use.</entry> + </row> + </tbody> + </tgroup> + </table> + + <para>All remaining bits in the 64-bit value are unused and + reserved for future use. Multiple trace logging categories are + specified by calculating the bitwise OR of the documented values. + The default trace mask is 0b0100110010. Thus, by default, trace + logging includes client requests, packets received from the + leader and sessions.</para> + + <para>To set a different trace mask, send a request containing the + <command>stmk</command> four-letter word followed by the trace + mask represented as a 64-bit signed long value. This example uses + the Perl <command>pack</command> function to construct a trace + mask that enables all trace logging categories described above and + convert it to a 64-bit signed long value with big-endian byte + order. The result is appended to <command>stmk</command> and sent + to the server using netcat. 
The server responds with the new + trace mask in decimal format.</para> + + <programlisting>$ perl -e "print 'stmk', pack('q>', 0b0011111010)" | nc localhost 2181 +250 + </programlisting> + </listitem> + </varlistentry> + </variablelist> + + <para>Here's an example of the <emphasis role="bold">ruok</emphasis> + command:</para> + + <programlisting>$ echo ruok | nc 127.0.0.1 5111 + imok + </programlisting> + + </section> + <section id="sc_adminserver"> + <title>The AdminServer</title> + <para><emphasis role="bold">New in 3.5.0: </emphasis>The AdminServer is + an embedded Jetty server that provides an HTTP interface to the four +
<TRUNCATED>
