Updates to installation instructions

* Added text from INSTALL.md of the tarball distribution to a new "Quick Install" page under the getting-started dropdown.
* Renamed the installation page in the "administration" dropdown to "In-depth Installation".
* Made the two pages reference each other.
Project: http://git-wip-us.apache.org/repos/asf/accumulo-website/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo-website/commit/3c554918
Tree: http://git-wip-us.apache.org/repos/asf/accumulo-website/tree/3c554918
Diff: http://git-wip-us.apache.org/repos/asf/accumulo-website/diff/3c554918

Branch: refs/heads/master
Commit: 3c554918dd354a8342a68cecc55e8918986b5611
Parents: a33b3ed
Author: Mike Walch <[email protected]>
Authored: Fri Jun 2 10:26:37 2017 -0400
Committer: Mike Walch <[email protected]>
Committed: Fri Jun 2 10:33:10 2017 -0400

----------------------------------------------------------------------
 .../administration/in-depth-install.md           | 723 +++++++++++++++++++
 _docs-unreleased/administration/installation.md  | 719 ------------------
 _docs-unreleased/getting-started/clients.md      |   2 +-
 .../getting-started/quick-install.md             | 186 +++++
 _docs-unreleased/getting-started/shell.md        |   2 +-
 .../getting-started/table_configuration.md       |   2 +-
 _docs-unreleased/getting-started/table_design.md |   2 +-
 7 files changed, 913 insertions(+), 723 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/3c554918/_docs-unreleased/administration/in-depth-install.md
----------------------------------------------------------------------
diff --git a/_docs-unreleased/administration/in-depth-install.md b/_docs-unreleased/administration/in-depth-install.md
new file mode 100644
index 0000000..8719036

---
title: In-depth Installation
category: administration
order: 1
---

This document provides detailed instructions for installing Accumulo. For basic
instructions, see the [quick installation guide][quick].

## Hardware

Because an Accumulo cluster essentially runs two or three systems layered on the
same machines -- HDFS, Accumulo, and MapReduce -- typical hardware consists of
4 to 8 cores and 8 to 32 GB of RAM, so that each running process can have at
least one core and 2 to 4 GB of memory.

One core running HDFS can typically keep 2 to 4 disks busy, so each machine may
have as little as 2 x 300GB disks or as much as 4 x 1TB or 2TB disks.

It is possible to get by with less, such as 1U servers with 2 cores and 4GB
each, but in that case it is recommended to run at most two processes per
machine -- i.e. DataNode and TabletServer, or DataNode and MapReduce worker, but
not all three. The constraint is having enough available heap space for all the
processes on a machine.

## Network

Accumulo communicates via remote procedure calls over TCP/IP for both passing
data and control messages. In addition, Accumulo uses HDFS clients to
communicate with HDFS. To achieve good ingest and query performance, sufficient
network bandwidth must be available between any two machines.

In addition to needing access to ports associated with HDFS and ZooKeeper, Accumulo
uses the following default ports. Please make sure that they are open, or change
their values in accumulo-site.xml.

|Port | Description | Property Name
|-----|-------------|--------------
|4445 | Shutdown Port (Accumulo MiniCluster) | n/a
|4560 | Accumulo monitor (for centralized log display) | monitor.port.log4j
|9995 | Accumulo HTTP monitor | monitor.port.client
|9997 | Tablet Server | tserver.port.client
|9998 | Accumulo GC | gc.port.client
|9999 | Master Server | master.port.client
|12234 | Accumulo Tracer | trace.port.client
|42424 | Accumulo Proxy Server | n/a
|10001 | Master Replication service | master.replication.coordinator.port
|10002 | TabletServer Replication service | replication.receipt.service.port

In addition, the user can provide `0` and an ephemeral port will be chosen instead. This
ephemeral port is likely to be unique and not already bound. Thus, configuring ports to
use `0` instead of an explicit value should, in most cases, work around any issues of
running multiple distinct Accumulo instances (or any other process which tries to use the
same default ports) on the same hardware. Finally, the *.port.client properties accept
the port range syntax (M-N), allowing the user to specify a range of ports for the
service to attempt to bind. The ports in the range are tried in ascending order, from
the low end of the range up to and including the high end.
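For example, a sketch of the range syntax in accumulo-site.xml -- the ports shown here are arbitrary and only illustrate the M-N form for a *.port.client property:

```xml
<property>
  <name>tserver.port.client</name>
  <value>9997-10007</value>
  <description>try ports 9997 through 10007, in ascending order, until one binds</description>
</property>
```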
## Download Tarball

Download a binary distribution of Accumulo and install it to a directory on a disk with
sufficient space:

    cd <install directory>
    tar xzf accumulo-X.Y.Z-bin.tar.gz  # Replace 'X.Y.Z' with your Accumulo version
    cd accumulo-X.Y.Z

Repeat this step on each machine in your cluster. Typically, the same `<install directory>`
is chosen for all machines in the cluster.

There are four scripts in the `bin/` directory that are used to manage Accumulo:

1. `accumulo` - Runs Accumulo command-line tools and starts Accumulo processes
2. `accumulo-service` - Runs Accumulo processes as services
3. `accumulo-cluster` - Manages an Accumulo cluster on one or more nodes
4. `accumulo-util` - Accumulo utilities for creating configuration, native libraries, etc.

These scripts are used in the remaining instructions to configure and run Accumulo.

## Dependencies

Accumulo requires HDFS and ZooKeeper to be configured and running
before starting. Password-less SSH should be configured between at least the
Accumulo master and TabletServer machines. It is also a good idea to run Network
Time Protocol (NTP) within the cluster to ensure nodes' clocks don't get too far out of
sync, which can cause problems with automatically timestamped data.

## Configuration

The Accumulo tarball contains a `conf/` directory where Accumulo looks for configuration. If you
installed Accumulo using downstream packaging, the `conf/` directory could be somewhere else,
such as `/etc/accumulo/`.

Before starting Accumulo, the configuration files `accumulo-env.sh` and `accumulo-site.xml` must
exist in `conf/` and be properly configured. If you are using `accumulo-cluster` to launch
a cluster, the `conf/` directory must also contain host files for the Accumulo services (i.e. `gc`,
`masters`, `monitor`, `tservers`, `tracers`). You can either create these files manually or run
`accumulo-cluster create-config`.

Logging is configured in `accumulo-env.sh` to use three log4j configuration files in `conf/`. The
file used depends on the Accumulo command or service being run. Logging for most Accumulo services
(e.g. Master, TabletServer, Garbage Collector) is configured by `log4j-service.properties`, except for
the Monitor, which is configured by `log4j-monitor.properties`. All Accumulo commands (e.g. `init`,
`shell`, etc.) are configured by `log4j.properties`.

### Configure accumulo-env.sh

Accumulo needs to know where to find the software it depends on. Edit accumulo-env.sh
and specify the following (see the sketch after this list):

1. Enter the location of Hadoop for `$HADOOP_PREFIX`
2. Enter the location of ZooKeeper for `$ZOOKEEPER_HOME`
3. Optionally, choose a different location for Accumulo logs using `$ACCUMULO_LOG_DIR`
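A minimal sketch of the relevant lines in `accumulo-env.sh`; the paths are placeholders for your own installations, not defaults:

    export HADOOP_PREFIX=/path/to/hadoop          # location of your Hadoop installation
    export ZOOKEEPER_HOME=/path/to/zookeeper      # location of your ZooKeeper installation
    export ACCUMULO_LOG_DIR=/path/to/accumulo/logs  # optional: override the log directory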
Accumulo uses `HADOOP_PREFIX` and `ZOOKEEPER_HOME` to locate the Hadoop and ZooKeeper jars
and add them to the `CLASSPATH` variable. If you are running a vendor-specific release of Hadoop
or ZooKeeper, you may need to change how your `CLASSPATH` is built in `accumulo-env.sh`. If
Accumulo later has problems finding jars, run `accumulo classpath -d` to debug and print
Accumulo's classpath.

You may want to change the default memory settings for Accumulo's TabletServer, which are
set in the `JAVA_OPTS` settings for 'tservers' in `accumulo-env.sh`. Note that the
syntax is that of the Java JVM command-line options. This value should be less than the
physical memory of the machines running TabletServers.

There are similar options for the master's memory usage and the garbage collector
process. Reduce these if they exceed the physical RAM of your hardware, and
increase them, within the bounds of the physical RAM, if a process fails because of
insufficient memory.

Note that you will be specifying the Java heap space in accumulo-env.sh. You should
make sure that the total heap space used for the Accumulo tserver and the Hadoop
DataNode and TaskTracker is less than the available memory on each worker node in
the cluster. On large clusters, it is recommended that the Accumulo master, Hadoop
NameNode, secondary NameNode, and Hadoop JobTracker all be run on separate
machines to allow them to use more heap space. If you are running these on the
same machine on a small cluster, likewise make sure their heap space settings fit
within the available memory.

### Native Map

The tablet server uses a data structure called a MemTable to store sorted key/value
pairs in memory when they are first received from the client. When a minor compaction
occurs, this data structure is written to HDFS. The MemTable defaults to using
memory in the JVM, but a JNI version, called the native map, can be used to significantly
improve performance by utilizing the memory space of the native operating system. Because
it keeps this data off the JVM heap, the native map also causes garbage-collection pauses
to occur much less frequently.

#### Building

32-bit and 64-bit Linux and Mac OS X versions of the native map can be built by executing
`accumulo-util build-native`. If your system's default compiler options are insufficient,
you can add additional compiler options to the command line, such as options for the
architecture. These will be passed to the Makefile in the environment variable `USERFLAGS`.

Examples:

    accumulo-util build-native
    accumulo-util build-native -m32

After building the native map from source, you will find the artifact in
`lib/native`. Upon starting up, the tablet server will look in this directory for the
map library. If the file is renamed or moved from its target directory, the tablet
server may not be able to find it. The system can also locate the native maps shared
library by setting `LD_LIBRARY_PATH` (or `DYLD_LIBRARY_PATH` on Mac OS X) in
`accumulo-env.sh`.
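A sketch of such an override in `accumulo-env.sh`; the install path is a placeholder:

    export LD_LIBRARY_PATH=/path/to/accumulo/lib/native:$LD_LIBRARY_PATH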
#### Native Maps Configuration

As mentioned, Accumulo will use the native libraries if they are found in the expected
location and `tserver.memory.maps.native.enabled` is set to `true` (which is the default).
Using the native maps instead of JVM maps nets a noticeable improvement in ingest rates; however,
certain configuration variables are important to modify when increasing the size of the
native map.

To adjust the size of the native map, increase the value of `tserver.memory.maps.max`.
By default, the maximum size of the native map is 1GB. When increasing this value, it is
also important to adjust the values of `table.compaction.minor.logs.threshold` and
`tserver.walog.max.size`. `table.compaction.minor.logs.threshold` is the maximum
number of write-ahead log files that a tablet can reference before they will be automatically
minor compacted. `tserver.walog.max.size` is the maximum size of a write-ahead log.

The maximum size of the native maps for a server should be less than the product
of the write-ahead log maximum size and the minor compaction threshold for log files:

`$table.compaction.minor.logs.threshold * $tserver.walog.max.size >= $tserver.memory.maps.max`

This formula ensures that minor compactions won't be automatically triggered before the native
maps can be completely saturated.

Subsequently, when increasing the size of the write-ahead logs, it can also be important
to increase the HDFS block size that Accumulo uses when creating the files for the write-ahead log.
This is controlled via `tserver.wal.blocksize`. A basic recommendation is that when
`tserver.walog.max.size` is larger than 2GB, set `tserver.wal.blocksize` to 2GB.
Increasing the block size to a value larger than 2GB can result in decreased write
performance to the write-ahead log file, which will slow ingest.
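As a worked example, the following sketch (the values are illustrative, not recommendations) satisfies the inequality above, since 3 * 1G >= 2G:

```xml
<property>
  <name>tserver.memory.maps.max</name>
  <value>2G</value>
</property>
<property>
  <name>tserver.walog.max.size</name>
  <value>1G</value>
</property>
<property>
  <name>table.compaction.minor.logs.threshold</name>
  <value>3</value>
</property>
```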
### Cluster Specification

If you are using `accumulo-cluster` to start a cluster, configure the following on the
machine that will serve as the Accumulo master:

1. Write the IP address or domain name of the Accumulo Master to the `conf/masters` file.
2. Write the IP addresses or domain names of the machines that will be TabletServers in `conf/tservers`, one per line.

Note that if using domain names rather than IP addresses, DNS must be configured
properly for all machines participating in the cluster. DNS can be a confusing source
of errors.

### Configure accumulo-site.xml

Specify appropriate values for the following settings in `accumulo-site.xml`:

```xml
<property>
  <name>instance.zookeeper.host</name>
  <value>zooserver-one:2181,zooserver-two:2181</value>
  <description>list of zookeeper servers</description>
</property>
```

This enables Accumulo to find ZooKeeper. Accumulo uses ZooKeeper to coordinate
settings between processes and to help finalize TabletServer failure.

```xml
<property>
  <name>instance.secret</name>
  <value>DEFAULT</value>
</property>
```

The instance needs a secret to enable secure communication between servers. Configure your
secret and make sure that the `accumulo-site.xml` file is not readable by other users.
For alternatives to storing the `instance.secret` in plaintext, please read the
"Sensitive Configuration Values" section.

Some settings can be modified via the Accumulo shell and take effect immediately, but
some settings require a process restart to take effect. See the [configuration management][config-mgmt]
documentation for details.

### Hostnames in configuration files

Accumulo has a number of configuration files which can contain references to other hosts in your
network. All of the "host" configuration files for Accumulo (`gc`, `masters`, `tservers`, `monitor`,
`tracers`), as well as `instance.volumes` in accumulo-site.xml, must contain some host reference.

While IP addresses, short hostnames, and fully qualified domain names (FQDNs) are all technically valid, it
is good practice to always use FQDNs for both Accumulo and the other processes in your Hadoop cluster.
Failing to use FQDNs consistently can have unexpected consequences in how Accumulo uses the FileSystem.

A common way this problem is observed is via applications that use bulk ingest. The Accumulo
Master coordinates moving the input files for bulk ingest to an Accumulo-managed directory. However,
Accumulo cannot safely move files across different Hadoop FileSystems, and it cannot reliably
determine that two differently named FileSystems are actually the same one. For example, while
`127.0.0.1:8020` might be a valid identifier for an HDFS instance, Accumulo treats
`localhost:8020` as a different HDFS instance than `127.0.0.1:8020`.

### Deploy Configuration

Copy accumulo-env.sh and accumulo-site.xml from the `conf/` directory on the master to all Accumulo
tablet servers. The "host" configuration files used by `accumulo-cluster` only need to be on servers
where that command is run.
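A sketch of one way to push these two files out, assuming password-less SSH (see the Dependencies section) and the same install directory on every node:

    for host in $(cat conf/tservers); do
      scp conf/accumulo-env.sh conf/accumulo-site.xml "$host":/path/to/accumulo/conf/
    done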
### Sensitive Configuration Values

Accumulo has a number of properties that can be specified via the accumulo-site.xml
file which are sensitive in nature; `instance.secret` and `trace.token.property.password`
are two common examples. If compromised, either of these properties could result in data
being leaked to users who should not have access to it.

In Hadoop-2.6.0, a new CredentialProvider class was introduced which serves as a common
implementation to abstract away the storage and retrieval of passwords from plaintext
storage in configuration files. Any property marked with the `Sensitive` annotation
is a candidate for use with these CredentialProviders. For versions of Hadoop which lack
these classes, the feature will simply be unavailable.

A comma-separated list of CredentialProviders can be configured using the Accumulo property
`general.security.credential.provider.paths`. Each configured URL will be consulted
when the Configuration object for accumulo-site.xml is accessed.

### Using a JavaKeyStoreCredentialProvider for storage

One of the implementations provided in Hadoop-2.6.0 is a Java KeyStore CredentialProvider.
Each entry in the KeyStore is the Accumulo property key name. For example, to store the
`instance.secret`, the following command can be used:

    hadoop credential create instance.secret --provider jceks://file/path/to/accumulo/conf/accumulo.jceks

The command will prompt you to enter the secret to use, and will create a keystore in:

    /path/to/accumulo/conf/accumulo.jceks

Then, accumulo-site.xml must be configured to use this KeyStore as a CredentialProvider:

```xml
<property>
  <name>general.security.credential.provider.paths</name>
  <value>jceks://file/path/to/accumulo/conf/accumulo.jceks</value>
</property>
```

This configuration will then transparently extract the `instance.secret` from
the configured KeyStore and avoids storing the sensitive property in a
human-readable form.

A KeyStore can also be stored in HDFS, which will make the KeyStore readily available to
all Accumulo servers. If the local filesystem is used, be aware that each Accumulo server
will expect the KeyStore in the same location.

### Client Configuration

In version 1.6.0, Accumulo introduced a new type of configuration file known as a client
configuration file. One problem with the traditional "site.xml" file that is prevalent
through Hadoop is that it is a single file used by both clients and servers. This makes
it very difficult to protect secrets that are only meant for the server processes while
still allowing the clients to connect to the servers.

The client configuration file is a subset of the information stored in accumulo-site.xml,
meant only for consumption by clients of Accumulo. By default, Accumulo checks a number
of locations for a client configuration:

* `/path/to/accumulo/conf/client.conf`
* `/etc/accumulo/client.conf`
* `/etc/accumulo/conf/client.conf`
* `~/.accumulo/config`

These files are [Java Properties files](https://en.wikipedia.org/wiki/.properties). They
can currently contain information about ZooKeeper servers, RPC properties (such as SSL or SASL
connectors), and distributed tracing properties. Valid properties are defined by the [ClientProperty](https://github.com/apache/accumulo/blob/f1d0ec93d9f13ff84844b5ac81e4a7b383ced467/core/src/main/java/org/apache/accumulo/core/client/ClientConfiguration.java#L54)
enum contained in the client API.

#### Custom Table Tags

Accumulo allows users to add custom tags to tables. This lets
applications set application-level metadata about a table. These tags can be
anything from a table description to administrator notes to a creation date,
and are set by naming and setting a property with the prefix `table.custom.*`.

Currently, table properties are stored in ZooKeeper. This means that the number
and size of custom properties should be restricted -- on the order of tens of properties
at most, with no property exceeding 1MB in size. ZooKeeper's performance can be
very sensitive to an excessive number of nodes and to the sizes of those nodes. Applications
which make heavy use of custom properties should take these warnings into
consideration; there is no enforcement of these warnings via the API.
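For example, a short description tag could be set with the `config` command in the Accumulo shell; the table name and tag value here are hypothetical:

    config -t mytable -s table.custom.description=web-crawl-data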
#### Configuring the ClassLoader

Accumulo builds its Java classpath in `accumulo-env.sh`. After an Accumulo application has started, it will load classes from the locations
specified in the deprecated `general.classpaths` property. Additionally, Accumulo will load classes from the locations specified in the
`general.dynamic.classpaths` property and will monitor and reload them if they change. The reloading feature is useful during the development
and testing of iterators, as new or modified iterator classes can be deployed to Accumulo without having to restart the database.

Accumulo also has an alternate configuration for the classloader which allows it to load classes from remote locations. This mechanism
uses Apache Commons VFS, which enables locations such as http and hdfs to be used. This alternate configuration also uses the
`general.classpaths` property in the same manner described above. It differs in that you configure the
`general.vfs.classpaths` property instead of the `general.dynamic.classpaths` property. As in the default configuration, this alternate
configuration will also monitor the VFS locations for changes and reload if necessary.

The Accumulo classpath can be viewed in human-readable format by running `accumulo classpath -d`.

##### ClassLoader Contexts

With the addition of the VFS-based classloader, we introduced the notion of classloader contexts. A context is identified
by a name, references a set of locations from which to load classes, and can be specified in the accumulo-site.xml file or added
using the `config` command in the shell. Below is an example specifying the app1 context in the accumulo-site.xml file:

```xml
<property>
  <name>general.vfs.context.classpath.app1</name>
  <value>hdfs://localhost:8020/applicationA/classpath/.*.jar,file:///opt/applicationA/lib/.*.jar</value>
  <description>Application A classpath, loads jars from HDFS and local file system</description>
</property>
```

The default behavior follows the Java ClassLoader contract in that classes, if they exist, are loaded from the parent classloader first.
You can override this behavior by delegating to the parent classloader only after looking in this classloader first. An example of this
configuration is:

```xml
<property>
  <name>general.vfs.context.classpath.app1.delegation=post</name>
  <value>hdfs://localhost:8020/applicationA/classpath/.*.jar,file:///opt/applicationA/lib/.*.jar</value>
  <description>Application A classpath, loads jars from HDFS and local file system</description>
</property>
```

To use contexts in your application, you can set `table.classpath.context` on your tables or use the `setClassLoaderContext()` method on Scanner
and BatchScanner, passing in the name of the context -- app1 in the example above. Setting the property on the table allows your minc, majc, and scan
iterators to load classes from the locations defined by the context. Passing the context name to the scanners allows you to override the table setting
to load only scan-time iterators from a different location.
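For instance, the table property could be set with the `config` command in the shell; the table name is hypothetical, and app1 is the context defined above:

    config -t mytable -s table.classpath.context=app1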
## Initialization

Accumulo must be initialized to create the structures it uses internally to locate
data across the cluster. HDFS must be configured and running before
Accumulo can be initialized.

Once HDFS is started, initialization is performed by executing
`accumulo init`. This script will prompt for a name
for this instance of Accumulo. The instance name is used to identify a set of tables
and instance-specific settings. The script will then write some information into
HDFS so Accumulo can start properly.

The initialization script will prompt you to set a root password. Once Accumulo is
initialized, it can be started.

## Running

### Starting Accumulo

Make sure Hadoop is configured on all of the machines in the cluster, including
access to a shared HDFS instance. Make sure HDFS is running, and make sure
ZooKeeper is configured and running on at least one machine in the cluster.
Start Accumulo using `accumulo-cluster start`.

To verify that Accumulo is running, check the [Accumulo monitor][monitor].
In addition, the Shell can provide some information about the status of tables by reading the metadata tables.

### Stopping Accumulo

To shut down cleanly, run `accumulo-cluster stop` and the master will orchestrate the
shutdown of all the tablet servers. Shutdown waits for all minor compactions to finish, so it may
take some time for certain configurations.

### Adding a Tablet Server

Update your `conf/tservers` file to account for the addition.

Next, ssh to each of the hosts you want to add and run:

    accumulo-service tserver start

Make sure the host in question has the new configuration, or else the tablet
server won't start; at a minimum this needs to be on the host(s) being added,
but in practice it's good to ensure consistent configuration across all nodes.

### Decommissioning a Tablet Server

If you need to take a node out of operation, you can trigger a graceful shutdown of a tablet
server. Accumulo will automatically rebalance the tablets across the available tablet servers.

    accumulo admin stop <host(s)> {<host> ...}

Alternatively, you can ssh to each of the hosts you want to remove and run:

    accumulo-service tserver stop

Be sure to update your `conf/tservers` file to
account for the removal of these hosts. Bear in mind that the monitor will not re-read the
tservers file automatically, so it will report the decommissioned servers as down; it's
recommended that you restart the monitor so that the node list is up to date.

The steps described to decommission a node can also be used (without removing the host
from the `conf/tservers` file) to gracefully stop a node. This
ensures that the tabletserver is cleanly stopped and recovery will not need to be performed
when the tablets are re-hosted.

### Restarting processes on a node

Occasionally, it might be necessary to restart the processes on a specific node. In addition
to the `accumulo-cluster` script, Accumulo has an `accumulo-service` script that
can be used to start and stop processes on a node.

#### A note on rolling restarts

For sufficiently large Accumulo clusters, restarting multiple TabletServers within a short window can place significant
load on the Master server. If slightly lower availability is acceptable, this load can be reduced by globally setting
`table.suspend.duration` to a positive value, as in the shell example below.

With `table.suspend.duration` set to, say, `5m`, Accumulo will wait
for 5 minutes for any dead TabletServer to return before reassigning that TabletServer's responsibilities to other TabletServers.
If the TabletServer returns to the cluster before the specified timeout has elapsed, Accumulo will assign the TabletServer
its original responsibilities.

It is important not to choose too large a value for `table.suspend.duration`, as during this time all scans against the
data that TabletServer had hosted will block (or time out).
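A sketch of setting the suspension window system-wide from the shell, using the 5-minute value discussed above:

    config -s table.suspend.duration=5m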
### Running multiple TabletServers on a single node

With very powerful nodes, it may be beneficial to run more than one TabletServer on a given
node. This decision should be made carefully and with much deliberation, as Accumulo is designed
to scale to using tens of GB of RAM and tens of CPU cores.

Accumulo TabletServers bind certain ports on the host to accommodate remote procedure calls to/from
other nodes. Running more than one TabletServer on a host requires that you set the environment variable
`ACCUMULO_SERVICE_INSTANCE` to a distinct instance number (e.g. 1, 2) for each instance that is started. Also, set
these properties in `accumulo-site.xml`:

```xml
<property>
  <name>tserver.port.search</name>
  <value>true</value>
</property>
<property>
  <name>replication.receipt.service.port</name>
  <value>0</value>
</property>
```
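For example, two TabletServer instances might be started on one host as sketched below, combining the environment variable with the `accumulo-service` command shown earlier:

    ACCUMULO_SERVICE_INSTANCE=1 accumulo-service tserver start
    ACCUMULO_SERVICE_INSTANCE=2 accumulo-service tserver start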
## Logging

Each Accumulo process writes to a set of log files. By default, these logs are found in the
directory set by `ACCUMULO_LOG_DIR` in `accumulo-env.sh`.

## Recovery

In the event of TabletServer failure, or an error while shutting Accumulo down, some
mutations may not have been minor compacted to HDFS properly. In this case,
Accumulo will automatically reapply such mutations from the write-ahead log,
either when the tablets from the failed server are reassigned by the Master (in the
case of a single TabletServer failure) or the next time Accumulo starts (in the event of
failure during shutdown).

Recovery is performed by asking a tablet server to sort the logs so that tablets can easily find their missing
updates. The sort status of each file is displayed on the
Accumulo monitor status page. Once the recovery is complete, any
tablets involved should return to an `online` state. Until then, those tablets will be
unavailable to clients.

The Accumulo client library is configured to retry failed mutations, and in many
cases clients will be able to continue processing after the recovery process without
throwing an exception.

## Migrating Accumulo from non-HA Namenode to HA Namenode

The following steps will allow a non-HA instance to be migrated to an HA instance. Consider an HDFS URL
`hdfs://namenode.example.com:8020` which is going to be moved to `hdfs://nameservice1`.

Before moving HDFS over to the HA namenode, use `accumulo admin volumes` to confirm
that the only volume displayed is the volume from the current namenode's HDFS URL.

    Listing volumes referenced in zookeeper
            Volume : hdfs://namenode.example.com:8020/accumulo

    Listing volumes referenced in accumulo.root tablets section
            Volume : hdfs://namenode.example.com:8020/accumulo
    Listing volumes referenced in accumulo.root deletes section (volume replacement occurrs at deletion time)

    Listing volumes referenced in accumulo.metadata tablets section
            Volume : hdfs://namenode.example.com:8020/accumulo

    Listing volumes referenced in accumulo.metadata deletes section (volume replacement occurrs at deletion time)

After verifying the current volume is correct, shut down the cluster and transition HDFS to the HA nameservice.

Edit `accumulo-site.xml` to notify Accumulo that a volume is being replaced. First,
add the new nameservice volume to the `instance.volumes` property. Next, add the
`instance.volumes.replacements` property in the form of `old new`. It's important not to include
the volume that's being replaced in `instance.volumes`; otherwise it's possible Accumulo could continue
to write to that volume.

```xml
<!-- instance.dfs.uri and instance.dfs.dir should not be set -->
<property>
  <name>instance.volumes</name>
  <value>hdfs://nameservice1/accumulo</value>
</property>
<property>
  <name>instance.volumes.replacements</name>
  <value>hdfs://namenode.example.com:8020/accumulo hdfs://nameservice1/accumulo</value>
</property>
```

Run `accumulo init --add-volumes` and start up the Accumulo cluster. Verify that the
new nameservice volume shows up with `accumulo admin volumes`.

    Listing volumes referenced in zookeeper
            Volume : hdfs://namenode.example.com:8020/accumulo
            Volume : hdfs://nameservice1/accumulo

    Listing volumes referenced in accumulo.root tablets section
            Volume : hdfs://namenode.example.com:8020/accumulo
            Volume : hdfs://nameservice1/accumulo
    Listing volumes referenced in accumulo.root deletes section (volume replacement occurrs at deletion time)

    Listing volumes referenced in accumulo.metadata tablets section
            Volume : hdfs://namenode.example.com:8020/accumulo
            Volume : hdfs://nameservice1/accumulo
    Listing volumes referenced in accumulo.metadata deletes section (volume replacement occurrs at deletion time)

Some erroneous GarbageCollector messages may still be seen for a short period while data transitions to
the new volumes. This is expected and can usually be ignored.

## Achieving Stability in a VM Environment

For testing, demonstration, and even operational uses, Accumulo is often
installed and run in a virtual machine (VM) environment. The majority of
long-term operational uses of Accumulo are on bare-metal clusters. However, the
core design of Accumulo and its dependencies do not preclude running stably for
long periods within a VM. Many of Accumulo's operational robustness features for
handling failures, like periodic network partitioning in a large cluster, carry
over well to VM environments. This guide covers general recommendations for
maximizing stability in a VM environment, including some of the failure
modes that are more common when running in VMs.

### Known failure modes: Setup and Troubleshooting

In addition to the general failure modes of running Accumulo, VMs can introduce a
couple of environmental challenges that can affect process stability. Clock
drift is more common in VMs, especially when VMs are
suspended and resumed. Clock drift can cause Accumulo servers to assume that
they have lost connectivity to the other Accumulo processes and/or to lose their
locks in ZooKeeper. VM environments also frequently have constrained resources,
such as CPU, RAM, network, and disk throughput and capacity. Accumulo generally
deals well with constrained resources from a stability perspective (optimizing
performance will require additional tuning, which is not covered in this
section); however, there are some limits.
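Clock drift can be spot-checked on each node. For example, assuming the cluster runs ntpd as suggested in the Dependencies section, this sketch lists each NTP peer and its offset:

    ntpq -p    # persistent large offsets suggest clock drift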
#### Physical Memory

One of those limits has to do with the Linux out-of-memory killer. A common
failure mode in VM environments (and in some bare-metal installations) is when
the Linux out-of-memory killer decides to kill processes in order to avoid a
kernel panic when provisioning a memory page. This often happens in VMs due to
the large number of processes that must run in a small memory footprint. In
addition to the Linux core processes, a single-node Accumulo setup requires a
Hadoop NameNode, a Hadoop Secondary NameNode, a Hadoop DataNode, a ZooKeeper
server, an Accumulo Master, an Accumulo GC, and an Accumulo TabletServer.
Typical setups also include an Accumulo Monitor, an Accumulo Tracer, a Hadoop
ResourceManager, a Hadoop NodeManager, provisioning software, and client
applications. Between all of these processes, it is not uncommon to
over-subscribe the available RAM in a VM. We recommend setting up VMs without
swap enabled, so rather than performance grinding to a halt when physical
memory is exhausted, the kernel will select processes to kill, more or less at
random, in order to free up memory.

Calculating the maximum possible memory usage is essential in creating a stable
Accumulo VM setup. Safely engineering memory allocation for stability is then a
matter of bringing the calculated maximum memory usage under the physical
memory by a healthy margin. The margin is to account for operating-system-level
operations, such as managing processes, maintaining virtual memory pages, and
file system caching. When the Linux out-of-memory killer finds your process, you
will probably only see evidence of that in /var/log/messages. Out-of-memory
process kills do not show up in Accumulo or Hadoop logs.

To calculate the maximum memory usage of all Java virtual machine (JVM) processes,
add the maximum heap size (often limited by a -Xmx... argument, such as in
accumulo-site.xml) and the off-heap memory usage. Off-heap memory usage
includes the following:

* "Permanent Space", where the JVM stores Classes, Methods, and other code elements. This can be limited by a JVM flag such as `-XX:MaxPermSize=100m`, and is typically tens of megabytes.
* Code generation space, where the JVM stores just-in-time compiled code. This is typically small enough to ignore.
* Socket buffers, where the JVM stores send and receive buffers for each socket.
* Thread stacks, where the JVM allocates memory to manage each thread.
* Direct memory space and JNI code, where applications can allocate memory outside of the JVM-managed space. For Accumulo, this includes the native in-memory maps that are allocated with the memory.maps.max parameter in accumulo-site.xml.
* Garbage collection space, where the JVM stores information used for garbage collection.

You can assume that each Hadoop and Accumulo process will use ~100-150MB of
off-heap memory, plus the in-memory map of the Accumulo TServer process. A
simple calculation for physical memory requirements follows:

```
  Physical memory needed
    = (per-process off-heap memory) + (heap memory) + (other processes) + (margin)
    = (number of java processes * 150M + native map) + (sum of -Xmx settings for java processes) + (total applications memory, provisioning memory, etc.) + (1G)
    = (11*150M + 500M) + (1G + 1G + 1G + 256M + 1G + 256M + 512M + 512M + 512M + 512M + 512M) + (2G) + (1G)
    = (2150M) + (7G) + (2G) + (1G)
    = ~12GB
```

These calculations can add up quickly with the large number of processes,
especially in constrained VM environments. To reduce the physical memory
requirements, it is a good idea to reduce maximum heap limits and turn off
unnecessary processes. If you're not using YARN in your application, you can
turn off the ResourceManager and NodeManager. If you're not expecting to
re-provision the cluster frequently, you can turn off or reduce provisioning
processes such as Salt Stack minions and masters.
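As noted above, out-of-memory kills leave evidence only in the system log, so a quick spot-check for past kills might look like this sketch:

    grep -i "out of memory" /var/log/messages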
#### Disk Space

Disk space is primarily used for two purposes: storing data and storing logs.
While Accumulo generally stores all of its key/value data in HDFS, Accumulo,
Hadoop, and ZooKeeper all store a significant amount of logs in a directory on
a local file system. Care should be taken to make sure that (a) limits on
the amount of logs generated are in place, and (b) enough space is available to
host the generated logs on the partitions to which they are assigned. When space is
not available for logging, processes will hang. This can cause interruptions in the
availability of Accumulo, as well as cascade into failures of various
processes.

Hadoop, Accumulo, and ZooKeeper use log4j as a logging mechanism, and each of
them has a way of limiting the logs and directing them to a particular
directory. Logs are generated independently for each process, so when
considering the total space, you need to add up the maximum logs generated by
each process. Typically, a rolling log setup is instituted in which each process
can generate something like 10 100MB files, resulting in a maximum file system
usage of 1GB per process. Default setups for Hadoop and ZooKeeper are often
unbounded, so it is important to set these limits in the logging configuration
files for each subsystem. Consult the user manual for each system for
instructions on how to limit generated logs.
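As an illustration only, a rolling-file appender with roughly those bounds could be sketched in a log4j 1.2 properties file; the appender name and log path here are hypothetical:

    log4j.appender.rolling=org.apache.log4j.RollingFileAppender
    log4j.appender.rolling.File=/var/log/accumulo/tserver.log
    log4j.appender.rolling.MaxFileSize=100MB
    log4j.appender.rolling.MaxBackupIndex=10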
#### Zookeeper Interaction

Accumulo is designed to scale up to thousands of nodes. At that scale,
intermittent interruptions in network service and other rare failures of
compute nodes become more common. To limit the impact of node failures on
overall service availability, Accumulo uses a heartbeat monitoring system that
leverages ZooKeeper's ephemeral locks. There are several conditions that can
cause Accumulo processes to lose their ZooKeeper locks, some of which are true
interruptions to availability and some of which are false positives. Several of
these conditions become more common in VM environments, where they can be
exacerbated by resource constraints and clock drift.

#### Tested Versions

Each release of Accumulo is built with a specific version of Apache
Hadoop, Apache ZooKeeper, and Apache Thrift. We expect Accumulo to
work with versions that are API compatible with those versions.
However, this compatibility is not guaranteed because Hadoop, ZooKeeper,
and Thrift may not provide compatibility guarantees between their own versions. We
have also found that certain versions of Accumulo and Hadoop included
bugs that greatly affected overall stability. Thrift is particularly
prone to compatibility changes between versions, and you must use the
same Thrift version that your Accumulo release was built with.

Please check the release notes for your Accumulo version, or use the
mailing lists at https://accumulo.apache.org for more info.

[quick]: {{ page.docs_baseurl }}/getting-started/quick-install
[monitor]: {{page.docs_baseurl}}/administration/monitoring-metrics#monitor
[config-mgmt]: {{page.docs_baseurl}}/administration/configuration-management
- -## Download Tarball - -Download a binary distribution of Accumulo and install it to a directory on a disk with -sufficient space: - - cd <install directory> - tar xzf accumulo-X.Y.Z-bin.tar.gz # Replace 'X.Y.Z' with your Accumulo version - cd accumulo-X.Y.Z - -Repeat this step on each machine in your cluster. Typically, the same `<install directory>` -is chosen for all machines in the cluster. - -There are four scripts in the `bin/` directory that are used to manage Accumulo: - -1. `accumulo` - Runs Accumulo command-line tools and starts Accumulo processes -2. `accumulo-service` - Runs Accumulo processes as services -3. `accumulo-cluster` - Manages Accumulo cluster on a single node or several nodes -4. `accumulo-util` - Accumulo utilities for creating configuration, native libraries, etc. - -These scripts will be used in the remaining instructions to configure and run Accumulo. - -## Dependencies - -Accumulo requires HDFS and ZooKeeper to be configured and running -before starting. Password-less SSH should be configured between at least the -Accumulo master and TabletServer machines. It is also a good idea to run Network -Time Protocol (NTP) within the cluster to ensure nodes' clocks don't get too out of -sync, which can cause problems with automatically timestamped data. - -## Configuration - -The Accumulo tarball contains a `conf/` directory where Accumulo looks for configuration. If you -installed Accumulo using downstream packaging, the `conf/` could be something else like -`/etc/accumulo/`. - -Before starting Accumulo, the configuration files `accumulo-env.sh` and `accumulo-site.xml` must -exist in `conf/` and be properly configured. If you are using `accumulo-cluster` to launch -a cluster, the `conf/` directory must also contain hosts file for Accumulo services (i.e `gc`, -`masters`, `monitor`, `tservers`, `tracers`). You can either create these files manually or run -`accumulo-cluster create-config`. - -Logging is configured in `accumulo-env.sh` to use three log4j configuration files in `conf/`. The -file used depends on the Accumulo command or service being run. Logging for most Accumulo services -(i.e Master, TabletServer, Garbage Collector) is configured by `log4j-service.properties` except for -the Monitor which is configured by `log4j-monitor.properties`. All Accumulo commands (i.e `init`, -`shell`, etc) are configured by `log4j.properties`. - -### Configure accumulo-env.sh - -Accumulo needs to know where to find the software it depends on. Edit accumulo-env.sh -and specify the following: - -1. Enter the location of Hadoop for `$HADOOP_PREFIX` -2. Enter the location of ZooKeeper for `$ZOOKEEPER_HOME` -3. Optionally, choose a different location for Accumulo logs using `$ACCUMULO_LOG_DIR` - -Accumulo uses `HADOOP_PREFIX` and `ZOOKEEPER_HOME` to locate Hadoop and Zookeeper jars -and add them the `CLASSPATH` variable. If you are running a vendor-specific release of Hadoop -or Zookeeper, you may need to change how your `CLASSPATH` is built in `accumulo-env.sh`. If -Accumulo has problems later on finding jars, run `accumulo classpath -d` to debug and print -Accumulo's classpath. - -You may want to change the default memory settings for Accumulo's TabletServer which are -by set in the `JAVA_OPTS` settings for 'tservers' in `accumulo-env.sh`. Note the -syntax is that of the Java JVM command line options. This value should be less than the -physical memory of the machines running TabletServers. 
- -There are similar options for the master's memory usage and the garbage collector -process. Reduce these if they exceed the physical RAM of your hardware and -increase them, within the bounds of the physical RAM, if a process fails because of -insufficient memory. - -Note that you will be specifying the Java heap space in accumulo-env.sh. You should -make sure that the total heap space used for the Accumulo tserver and the Hadoop -DataNode and TaskTracker is less than the available memory on each worker node in -the cluster. On large clusters, it is recommended that the Accumulo master, Hadoop -NameNode, secondary NameNode, and Hadoop JobTracker all be run on separate -machines to allow them to use more heap space. If you are running these on the -same machine on a small cluster, likewise make sure their heap space settings fit -within the available memory. - -### Native Map - -The tablet server uses a data structure called a MemTable to store sorted key/value -pairs in memory when they are first received from the client. When a minor compaction -occurs, this data structure is written to HDFS. The MemTable will default to using -memory in the JVM but a JNI version, called the native map, can be used to significantly -speed up performance by utilizing the memory space of the native operating system. The -native map also avoids the performance implications brought on by garbage collection -in the JVM by causing it to pause much less frequently. - -#### Building - -32-bit and 64-bit Linux and Mac OS X versions of the native map can be built by executing -`accumulo-util build-native`. If your system's default compiler options are insufficient, -you can add additional compiler options to the command line, such as options for the -architecture. These will be passed to the Makefile in the environment variable `USERFLAGS`. - -Examples: - - accumulo-util build-native - accumulo-util build-native -m32 - -After building the native map from the source, you will find the artifact in -`lib/native`. Upon starting up, the tablet server will look -in this directory for the map library. If the file is renamed or moved from its -target directory, the tablet server may not be able to find it. The system can -also locate the native maps shared library by setting `LD_LIBRARY_PATH` -(or `DYLD_LIBRARY_PATH` on Mac OS X) in `accumulo-env.sh`. - -#### Native Maps Configuration - -As mentioned, Accumulo will use the native libraries if they are found in the expected -location and `tserver.memory.maps.native.enabled` is set to `true` (which is the default). -Using the native maps over JVM Maps nets a noticeable improvement in ingest rates; however, -certain configuration variables are important to modify when increasing the size of the -native map. - -To adjust the size of the native map, increase the value of `tserver.memory.maps.max`. -By default, the maximum size of the native map is 1GB. When increasing this value, it is -also important to adjust the values of `table.compaction.minor.logs.threshold` and -`tserver.walog.max.size`. `table.compaction.minor.logs.threshold` is the maximum -number of write-ahead log files that a tablet can reference before they will be automatically -minor compacted. `tserver.walog.max.size` is the maximum size of a write-ahead log. 
- -The maximum size of the native maps for a server should be less than the product -of the write-ahead log maximum size and minor compaction threshold for log files: - -`$table.compaction.minor.logs.threshold * $tserver.walog.max.size >= $tserver.memory.maps.max` - -This formula ensures that minor compactions won't be automatically triggered before the native -maps can be completely saturated. - -Subsequently, when increasing the size of the write-ahead logs, it can also be important -to increase the HDFS block size that Accumulo uses when creating the files for the write-ahead log. -This is controlled via `tserver.wal.blocksize`. A basic recommendation is that when -`tserver.walog.max.size` is larger than 2GB in size, set `tserver.wal.blocksize` to 2GB. -Increasing the block size to a value larger than 2GB can result in decreased write -performance to the write-ahead log file which will slow ingest. - -### Cluster Specification - -If you are using `accumulo-cluster` to start a cluster, configure the following on the -machine that will serve as the Accumulo master: - -1. Write the IP address or domain name of the Accumulo Master to the `conf/masters` file. -2. Write the IP addresses or domain name of the machines that will be TabletServers in `conf/tservers`, one per line. - -Note that if using domain names rather than IP addresses, DNS must be configured -properly for all machines participating in the cluster. DNS can be a confusing source -of errors. - -### Configure accumulo-site.xml - -Specify appropriate values for the following settings in `accumulo-site.xml`: - -```xml -<property> - <name>instance.zookeeper.host</name> - <value>zooserver-one:2181,zooserver-two:2181</value> - <description>list of zookeeper servers</description> -</property> -``` - -This enables Accumulo to find ZooKeeper. Accumulo uses ZooKeeper to coordinate -settings between processes and helps finalize TabletServer failure. - -```xml -<property> - <name>instance.secret</name> - <value>DEFAULT</value> -</property> -``` - -The instance needs a secret to enable secure communication between servers. Configure your -secret and make sure that the `accumulo-site.xml` file is not readable to other users. -For alternatives to storing the `instance.secret` in plaintext, please read the -`Sensitive Configuration Values` section. - -Some settings can be modified via the Accumulo shell and take effect immediately, but -some settings require a process restart to take effect. See the [configuration management][config-mgmt] -documentation for details. - -### Hostnames in configuration files - -Accumulo has a number of configuration files which can contain references to other hosts in your -network. All of the "host" configuration files for Accumulo (`gc`, `masters`, `tservers`, `monitor`, -`tracers`) as well as `instance.volumes` in accumulo-site.xml must contain some host reference. - -While IP address, short hostnames, or fully qualified domain names (FQDN) are all technically valid, it -is good practice to always use FQDNs for both Accumulo and other processes in your Hadoop cluster. -Failing to consistently use FQDNs can have unexpected consequences in how Accumulo uses the FileSystem. - -A common way for this problem can be observed is via applications that use Bulk Ingest. The Accumulo -Master coordinates moving the input files to Bulk Ingest to an Accumulo-managed directory. However, -Accumulo cannot safely move files across different Hadoop FileSystems. 
This is problematic because -Accumulo also cannot make reliable assertions across what is the same FileSystem which is specified -with different names. Naively, while 127.0.0.1:8020 might be a valid identifier for an HDFS instance, -Accumulo identifies `localhost:8020` as a different HDFS instance than `127.0.0.1:8020`. - -### Deploy Configuration - -Copy accumulo-env.sh and accumulo-site.xml from the `conf/` directory on the master to all Accumulo -tablet servers. The "host" configuration files files `accumulo-cluster` only need to be on servers -where that command is run. - -### Sensitive Configuration Values - -Accumulo has a number of properties that can be specified via the accumulo-site.xml -file which are sensitive in nature, instance.secret and trace.token.property.password -are two common examples. Both of these properties, if compromised, have the ability -to result in data being leaked to users who should not have access to that data. - -In Hadoop-2.6.0, a new CredentialProvider class was introduced which serves as a common -implementation to abstract away the storage and retrieval of passwords from plaintext -storage in configuration files. Any Property marked with the `Sensitive` annotation -is a candidate for use with these CredentialProviders. For version of Hadoop which lack -these classes, the feature will just be unavailable for use. - -A comma separated list of CredentialProviders can be configured using the Accumulo Property -`general.security.credential.provider.paths`. Each configured URL will be consulted -when the Configuration object for accumulo-site.xml is accessed. - -### Using a JavaKeyStoreCredentialProvider for storage - -One of the implementations provided in Hadoop-2.6.0 is a Java KeyStore CredentialProvider. -Each entry in the KeyStore is the Accumulo Property key name. For example, to store the -`instance.secret`, the following command can be used: - - hadoop credential create instance.secret --provider jceks://file/etc/accumulo/conf/accumulo.jceks - -The command will then prompt you to enter the secret to use and create a keystore in: - - /path/to/accumulo/conf/accumulo.jceks - -Then, accumulo-site.xml must be configured to use this KeyStore as a CredentialProvider: - -```xml -<property> - <name>general.security.credential.provider.paths</name> - <value>jceks://file/path/to/accumulo/conf/accumulo.jceks</value> -</property> -``` - -This configuration will then transparently extract the `instance.secret` from -the configured KeyStore and alleviates a human readable storage of the sensitive -property. - -A KeyStore can also be stored in HDFS, which will make the KeyStore readily available to -all Accumulo servers. If the local filesystem is used, be aware that each Accumulo server -will expect the KeyStore in the same location. - -### Client Configuration - -In version 1.6.0, Accumulo included a new type of configuration file known as a client -configuration file. One problem with the traditional "site.xml" file that is prevalent -through Hadoop is that it is a single file used by both clients and servers. This makes -it very difficult to protect secrets that are only meant for the server processes while -allowing the clients to connect to the servers. - -The client configuration file is a subset of the information stored in accumulo-site.xml -meant only for consumption by clients of Accumulo. 
#### Custom Table Tags

Accumulo allows users to add custom tags to tables. This allows
applications to set application-level metadata about a table. These tags can be
anything from a table description, administrator notes, date created, etc.
This is done by naming and setting a property with a prefix of `table.custom.*`.

Currently, table properties are stored in ZooKeeper. This means that the number
and size of custom properties should be limited to tens of properties
at most, with no property exceeding 1MB in size. ZooKeeper's performance can be
very sensitive to an excessive number of nodes and the sizes of the nodes. Applications
which leverage the use of custom properties should take these warnings into
consideration. There is no enforcement of these warnings via the API.
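For example, a custom tag could be set and inspected from the shell like this (the table name
and property value are illustrative):

    root@myinstance> config -t mytable -s table.custom.owner=webapp-team
    root@myinstance> config -t mytable -f table.custom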
#### Configuring the ClassLoader

Accumulo builds its Java classpath in `accumulo-env.sh`. After an Accumulo application has started, it will load classes from the locations
specified in the deprecated `general.classpaths` property. Additionally, Accumulo will load classes from the locations specified in the
`general.dynamic.classpaths` property and will monitor and reload them if they change. The reloading feature is useful during the development
and testing of iterators, as new or modified iterator classes can be deployed to Accumulo without having to restart the database.

Accumulo also has an alternate configuration for the classloader which allows it to load classes from remote locations. This mechanism
uses Apache Commons VFS, which enables locations such as http and hdfs to be used. This alternate configuration also uses the
`general.classpaths` property in the same manner described above. It differs in that you need to configure the
`general.vfs.classpaths` property instead of the `general.dynamic.classpaths` property. As in the default configuration, this alternate
configuration will also monitor the VFS locations for changes and reload if necessary.

The Accumulo classpath can be viewed in human-readable format by running `accumulo classpath -d`.

##### ClassLoader Contexts

With the addition of the VFS-based classloader, we introduced the notion of classloader contexts. A context is identified
by a name, references a set of locations from which to load classes, and can be specified in the accumulo-site.xml file or added
using the `config` command in the shell. Below is an example of specifying the app1 context in the accumulo-site.xml file:

```xml
<property>
  <name>general.vfs.context.classpath.app1</name>
  <value>hdfs://localhost:8020/applicationA/classpath/.*.jar,file:///opt/applicationA/lib/.*.jar</value>
  <description>Application A classpath, loads jars from HDFS and local file system</description>
</property>
```

The default behavior follows the Java ClassLoader contract in that classes, if they exist, are loaded from the parent classloader first.
You can override this behavior by delegating to the parent classloader after looking in this classloader first. An example of this
configuration is:

```xml
<property>
  <name>general.vfs.context.classpath.app1.delegation=post</name>
  <value>hdfs://localhost:8020/applicationA/classpath/.*.jar,file:///opt/applicationA/lib/.*.jar</value>
  <description>Application A classpath, loads jars from HDFS and local file system</description>
</property>
```

To use contexts in your application, you can set the `table.classpath.context` property on your tables or use the `setClassLoaderContext()` method on Scanner
and BatchScanner, passing in the name of the context (app1 in the example above). Setting the property on the table allows your minc, majc, and scan
iterators to load classes from the locations defined by the context. Passing the context name to the scanners allows you to override the table setting
to load only scan-time iterators from a different location.
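A minimal client-side sketch of the second approach, assuming an existing `Connector` named
`conn`, a hypothetical table `mytable`, and the `app1` context configured above:

```java
import java.util.Map.Entry;

import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

// Override the table's classpath context for this scan only; scan-time
// iterators will be loaded from the locations that "app1" references.
Scanner scanner = conn.createScanner("mytable", Authorizations.EMPTY);
scanner.setClassLoaderContext("app1");
for (Entry<Key,Value> entry : scanner) {
  System.out.println(entry.getKey() + " -> " + entry.getValue());
}
scanner.close();
```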
## Initialization

Accumulo must be initialized to create the structures it uses internally to locate
data across the cluster. HDFS must be configured and running before
Accumulo can be initialized.

Once HDFS is started, initialization can be performed by executing
`accumulo init`. This script will prompt for a name
for this instance of Accumulo. The instance name is used to identify a set of tables
and instance-specific settings. The script will then write some information into
HDFS so Accumulo can start properly.

The initialization script will prompt you to set a root password. Once Accumulo is
initialized, it can be started.

## Running

### Starting Accumulo

Make sure Hadoop is configured on all of the machines in the cluster, including
access to a shared HDFS instance. Make sure HDFS is running, and that ZooKeeper is
configured and running on at least one machine in the cluster.
Start Accumulo using `accumulo-cluster start`.

To verify that Accumulo is running, check the [Accumulo monitor][monitor].
In addition, the shell can provide some information about the status of tables by reading the metadata tables.

### Stopping Accumulo

To shut down cleanly, run `accumulo-cluster stop` and the master will orchestrate the
shutdown of all the tablet servers. Shutdown waits for all minor compactions to finish, so it may
take some time for particular configurations.

### Adding a Tablet Server

Update your `conf/tservers` file to account for the addition.

Next, ssh to each of the hosts you want to add and run:

    accumulo-service tserver start

Make sure the host in question has the new configuration, or else the tablet
server won't start; at a minimum this needs to be on the host(s) being added,
but in practice it's good to ensure consistent configuration across all nodes.

### Decommissioning a Tablet Server

If you need to take a node out of operation, you can trigger a graceful shutdown of a tablet
server. Accumulo will automatically rebalance the tablets across the available tablet servers.

    accumulo admin stop <host(s)> {<host> ...}

Alternatively, you can ssh to each of the hosts you want to remove and run:

    accumulo-service tserver stop

Be sure to update your `conf/tservers` file to
account for the removal of these hosts. Bear in mind that the monitor will not re-read the
tservers file automatically, so it will report the decommissioned servers as down; it's
recommended that you restart the monitor so that the node list is up to date.

The steps described to decommission a node can also be used (without removing the host
from the `conf/tservers` file) to gracefully stop a node. This will
ensure that the tablet server is cleanly stopped and recovery will not need to be performed
when the tablets are re-hosted.

### Restarting process on a node

Occasionally, it might be necessary to restart the processes on a specific node. In addition
to the `accumulo-cluster` script, Accumulo has an `accumulo-service` script that
can be used to start/stop processes on a node.

#### A note on rolling restarts

For sufficiently large Accumulo clusters, restarting multiple TabletServers within a short window can place significant
load on the Master server. If slightly lower availability is acceptable, this load can be reduced by globally setting
`table.suspend.duration` to a positive value.

With `table.suspend.duration` set to, say, `5m`, Accumulo will wait
for 5 minutes for any dead TabletServer to return before reassigning that TabletServer's responsibilities to other TabletServers.
If the TabletServer returns to the cluster before the specified timeout has elapsed, Accumulo will assign the TabletServer
its original responsibilities.

It is important not to choose too large a value for `table.suspend.duration`, as during this time, all scans against the
data that TabletServer had hosted will block (or time out).

### Running multiple TabletServers on a single node

With very powerful nodes, it may be beneficial to run more than one TabletServer on a given
node. This decision should be made carefully and with much deliberation, as Accumulo is designed
to scale to using tens of gigabytes of RAM and tens of CPU cores.

Accumulo TabletServers bind certain ports on the host to accommodate remote procedure calls to/from
other nodes. Running more than one TabletServer on a host requires that you set the environment variable
`ACCUMULO_SERVICE_INSTANCE` to an instance number (e.g. 1, 2) for each instance that is started. Also, set
these properties in `accumulo-site.xml`:

```xml
<property>
  <name>tserver.port.search</name>
  <value>true</value>
</property>
<property>
  <name>replication.receipt.service.port</name>
  <value>0</value>
</property>
```
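For example, two tablet server instances could then be started on the same host as follows
(a sketch, assuming the port settings above are in place):

    ACCUMULO_SERVICE_INSTANCE=1 accumulo-service tserver start
    ACCUMULO_SERVICE_INSTANCE=2 accumulo-service tserver start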
## Logging

Each Accumulo process writes to a set of log files. By default, these logs are found in the
directory set by `ACCUMULO_LOG_DIR` in `accumulo-env.sh`.

## Recovery

In the event of TabletServer failure, or an error while shutting Accumulo down, some
mutations may not have been minor compacted to HDFS properly. In this case,
Accumulo will automatically reapply such mutations from the write-ahead log,
either when the tablets from the failed server are reassigned by the Master (in the
case of a single TabletServer failure) or the next time Accumulo starts (in the event of
failure during shutdown).

Recovery is performed by asking a tablet server to sort the logs so that tablets can easily find their missing
updates. The sort status of each file is displayed on the
Accumulo monitor status page. Once the recovery is complete, any
tablets involved should return to an `online` state. Until then, those tablets will be
unavailable to clients.

The Accumulo client library is configured to retry failed mutations, and in many
cases clients will be able to continue processing after the recovery process without
throwing an exception.

## Migrating Accumulo from non-HA Namenode to HA Namenode

The following steps will allow a non-HA instance to be migrated to an HA instance. Consider an HDFS URL
`hdfs://namenode.example.com:8020` which is going to be moved to `hdfs://nameservice1`.

Before moving HDFS over to the HA namenode, use `accumulo admin volumes` to confirm
that the only volume displayed is the volume from the current namenode's HDFS URL.

    Listing volumes referenced in zookeeper
            Volume : hdfs://namenode.example.com:8020/accumulo

    Listing volumes referenced in accumulo.root tablets section
            Volume : hdfs://namenode.example.com:8020/accumulo
    Listing volumes referenced in accumulo.root deletes section (volume replacement occurrs at deletion time)

    Listing volumes referenced in accumulo.metadata tablets section
            Volume : hdfs://namenode.example.com:8020/accumulo

    Listing volumes referenced in accumulo.metadata deletes section (volume replacement occurrs at deletion time)

After verifying the current volume is correct, shut down the cluster and transition HDFS to the HA nameservice.

Edit `accumulo-site.xml` to notify Accumulo that a volume is being replaced. First,
add the new nameservice volume to the `instance.volumes` property. Next, add the
`instance.volumes.replacements` property in the form of `old new`. It's important not to include
the volume that's being replaced in `instance.volumes`, otherwise it's possible Accumulo could continue
to write to the volume.

```xml
<!-- instance.dfs.uri and instance.dfs.dir should not be set -->
<property>
  <name>instance.volumes</name>
  <value>hdfs://nameservice1/accumulo</value>
</property>
<property>
  <name>instance.volumes.replacements</name>
  <value>hdfs://namenode.example.com:8020/accumulo hdfs://nameservice1/accumulo</value>
</property>
```

Run `accumulo init --add-volumes` and start up the Accumulo cluster. Verify that the
new nameservice volume shows up with `accumulo admin volumes`.

    Listing volumes referenced in zookeeper
            Volume : hdfs://namenode.example.com:8020/accumulo
            Volume : hdfs://nameservice1/accumulo

    Listing volumes referenced in accumulo.root tablets section
            Volume : hdfs://namenode.example.com:8020/accumulo
            Volume : hdfs://nameservice1/accumulo
    Listing volumes referenced in accumulo.root deletes section (volume replacement occurrs at deletion time)

    Listing volumes referenced in accumulo.metadata tablets section
            Volume : hdfs://namenode.example.com:8020/accumulo
            Volume : hdfs://nameservice1/accumulo
    Listing volumes referenced in accumulo.metadata deletes section (volume replacement occurrs at deletion time)

Some erroneous GarbageCollector messages may still be seen for a short period while data is transitioning to
the new volumes. This is expected and can usually be ignored.
## Achieving Stability in a VM Environment

For testing, demonstration, and even operational uses, Accumulo is often
installed and run in a virtual machine (VM) environment. The majority of
long-term operational uses of Accumulo are on bare-metal clusters. However, the
core design of Accumulo and its dependencies do not preclude running stably for
long periods within a VM. Many of Accumulo's operational robustness features that
handle failures, like periodic network partitioning in a large cluster, carry
over well to VM environments. This guide covers general recommendations for
maximizing stability in a VM environment, including some of the failure
modes that are more common when running in VMs.

### Known failure modes: Setup and Troubleshooting

In addition to the general failure modes of running Accumulo, VMs can introduce a
couple of environmental challenges that can affect process stability. Clock
drift is more common in VMs, especially when VMs are
suspended and resumed. Clock drift can cause Accumulo servers to assume that
they have lost connectivity to the other Accumulo processes and/or lose their
locks in Zookeeper. VM environments also frequently have constrained resources,
such as CPU, RAM, network, and disk throughput and capacity. Accumulo generally
deals well with constrained resources from a stability perspective (optimizing
performance will require additional tuning, which is not covered in this
section); however, there are some limits.

#### Physical Memory

One of those limits has to do with the Linux out-of-memory killer. A common
failure mode in VM environments (and in some bare-metal installations) is when
the Linux out-of-memory killer decides to kill processes in order to avoid a
kernel panic when provisioning a memory page. This often happens in VMs due to
the large number of processes that must run in a small memory footprint. In
addition to the Linux core processes, a single-node Accumulo setup requires a
Hadoop Namenode, a Hadoop Secondary Namenode, a Hadoop Datanode, a Zookeeper
server, an Accumulo Master, an Accumulo GC, and an Accumulo TabletServer.
Typical setups also include an Accumulo Monitor, an Accumulo Tracer, a Hadoop
ResourceManager, a Hadoop NodeManager, provisioning software, and client
applications. Between all of these processes, it is not uncommon to
over-subscribe the available RAM in a VM. We recommend setting up VMs without
swap enabled, so rather than performance grinding to a halt when physical
memory is exhausted, the kernel will select processes to kill in order
to free up memory.

Calculating the maximum possible memory usage is essential in creating a stable
Accumulo VM setup. Safely engineering memory allocation for stability is then a
matter of bringing the calculated maximum memory usage under the physical
memory by a healthy margin. The margin is to account for operating system-level
operations, such as managing processes, maintaining virtual memory pages, and
file system caching. When the Linux out-of-memory killer finds your process, you
will probably only see evidence of that in /var/log/messages. Out-of-memory
process kills do not show up in Accumulo or Hadoop logs.
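As a sketch, when a process disappears without explanation, the kernel logs are the place to
look; something like the following usually finds the evidence (log file locations vary by
distribution):

    # search the syslog and kernel ring buffer for out-of-memory kills
    grep -i 'out of memory' /var/log/messages
    dmesg | grep -i 'killed process'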
To calculate the maximum memory usage of all Java virtual machine (JVM) processes,
add the maximum heap size (often limited by a -Xmx... argument, such as in
accumulo-env.sh) and the off-heap memory usage. Off-heap memory usage
includes the following:

* "Permanent Space", where the JVM stores Classes, Methods, and other code elements. This can be limited by a JVM flag such as `-XX:MaxPermSize=100m`, and is typically tens of megabytes.
* Code generation space, where the JVM stores just-in-time compiled code. This is typically small enough to ignore.
* Socket buffers, where the JVM stores send and receive buffers for each socket.
* Thread stacks, where the JVM allocates memory to manage each thread.
* Direct memory space and JNI code, where applications can allocate memory outside of the JVM-managed space. For Accumulo, this includes the native in-memory maps sized by the `tserver.memory.maps.max` property in accumulo-site.xml.
* Garbage collection space, where the JVM stores information used for garbage collection.

You can assume that each Hadoop and Accumulo process will use ~100-150MB of
off-heap memory, plus the in-memory map of the Accumulo TServer process. A
simple calculation for physical memory requirements follows:

```
Physical memory needed
  = (per-process off-heap memory) + (heap memory) + (other processes) + (margin)
  = (number of java processes * 150M + native map) + (sum of -Xmx settings for java processes) + (total application memory, provisioning memory, etc.) + (1G)
  = (11 * 150M + 500M) + (1G + 1G + 1G + 256M + 1G + 256M + 512M + 512M + 512M + 512M + 512M) + (2G) + (1G)
  = (2150M) + (7G) + (2G) + (1G)
  = ~12GB
```

These calculations can add up quickly with the large number of processes,
especially in constrained VM environments. To reduce the physical memory
requirements, it is a good idea to reduce maximum heap limits and turn off
unnecessary processes. If you're not using YARN in your application, you can
turn off the ResourceManager and NodeManager. If you're not expecting to
re-provision the cluster frequently, you can turn off or reduce provisioning
processes such as Salt Stack minions and masters.

#### Disk Space

Disk space is primarily used for two operations: storing data and storing logs.
While Accumulo generally stores all of its key/value data in HDFS, Accumulo,
Hadoop, and Zookeeper all store a significant amount of logs in a directory on
a local file system. Care should be taken to make sure that (a) limits on
the amount of logs generated are in place, and (b) enough space is available to
host the generated logs on the partitions to which they are assigned. When space is
not available for logging, processes will hang. This can cause interruptions in the
availability of Accumulo, as well as cascade into failures of various
processes.

Hadoop, Accumulo, and Zookeeper use log4j as a logging mechanism, and each of
them has a way of limiting the logs and directing them to a particular
directory. Logs are generated independently for each process, so when
considering the total space, you need to add up the maximum logs generated by
each process. Typically, a rolling log setup in which each process can generate
something like ten 100MB files is instituted, resulting in a maximum file system
usage of 1GB per process. Default setups for Hadoop and Zookeeper are often
unbounded, so it is important to set these limits in the logging configuration
files for each subsystem. Consult the user manual for each system for
instructions on how to limit generated logs.
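As a sketch of such a bound, a log4j 1.x properties fragment like the following caps one
process at roughly 1GB of logs (ten 100MB files); the appender name and file path are
illustrative, and each subsystem keeps its own logging configuration:

    # cap log output: at most 10 rolled files of 100MB each
    log4j.rootLogger=INFO, rolling
    log4j.appender.rolling=org.apache.log4j.RollingFileAppender
    log4j.appender.rolling.File=/var/log/accumulo/process.log
    log4j.appender.rolling.MaxFileSize=100MB
    log4j.appender.rolling.MaxBackupIndex=10
    log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
    log4j.appender.rolling.layout.ConversionPattern=%d{ISO8601} [%c] %-5p: %m%n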
#### Zookeeper Interaction

Accumulo is designed to scale up to thousands of nodes. At that scale,
intermittent interruptions in network service and other rare failures of
compute nodes become more common. To limit the impact of node failures on
overall service availability, Accumulo uses a heartbeat monitoring system that
leverages Zookeeper's ephemeral locks. There are several conditions that can
cause Accumulo processes to lose their Zookeeper locks, some of which
are true interruptions to availability and some of which are false positives.
Several of these conditions become more common in VM environments, where they
can be exacerbated by resource constraints and clock drift.

#### Tested Versions

Each release of Accumulo is built with a specific version of Apache
Hadoop, Apache ZooKeeper, and Apache Thrift. We expect Accumulo to
work with versions that are API compatible with those versions.
However, this compatibility is not guaranteed because Hadoop, ZooKeeper,
and Thrift may not provide guarantees between their own versions. We
have also found that certain versions of Accumulo and Hadoop included
bugs that greatly affected overall stability. Thrift is particularly
prone to compatibility changes between versions, and you must use the
same version that your Accumulo release was built with.

Please check the release notes for your Accumulo version or use the
mailing lists at https://accumulo.apache.org for more info.

[monitor]: {{page.docs_baseurl}}/administration/monitoring-metrics#monitor
[config-mgmt]: {{page.docs_baseurl}}/administration/configuration-management


http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/3c554918/_docs-unreleased/getting-started/clients.md
----------------------------------------------------------------------
diff --git a/_docs-unreleased/getting-started/clients.md b/_docs-unreleased/getting-started/clients.md
index 5dc52d3..ff4fdbd 100644
--- a/_docs-unreleased/getting-started/clients.md
+++ b/_docs-unreleased/getting-started/clients.md
@@ -1,7 +1,7 @@
 ---
 title: Accumulo Clients
 category: getting-started
-order: 2
+order: 3
 ---
 
 ## Running Client Code

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/3c554918/_docs-unreleased/getting-started/quick-install.md
----------------------------------------------------------------------
diff --git a/_docs-unreleased/getting-started/quick-install.md b/_docs-unreleased/getting-started/quick-install.md
new file mode 100644
index 0000000..e004b9d
--- /dev/null
+++ b/_docs-unreleased/getting-started/quick-install.md
@@ -0,0 +1,186 @@
---
title: Quick Installation
category: getting-started
order: 2
---

<!-- IMPORTANT: This file should mirror (with minor differences) INSTALL.md in the Accumulo repo -->

This document provides basic instructions for installing Accumulo. For detailed instructions,
see the [in-depth installation guide][in-depth].

Either [download] a binary distribution of Accumulo or [build] one from source code, and
unpack it as follows:

    tar xzf /path/to/accumulo-X.Y.Z-bin.tar.gz
    cd accumulo-X.Y.Z

There are four scripts in the `bin` directory of the tarball distribution that are used
to manage Accumulo:

1. `accumulo` - Runs Accumulo command-line tools and starts Accumulo processes
2. `accumulo-service` - Runs Accumulo processes as services
3. `accumulo-cluster` - Manages Accumulo cluster on a single node or several nodes
4. `accumulo-util` - Accumulo utilities for building native libraries, running jars, etc.

These scripts will be used in the remaining instructions to configure and run Accumulo.
For convenience, consider adding `accumulo-X.Y.Z/bin/` to your shell's path.
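For example (adjust the install path to match your system):

    export PATH=/path/to/accumulo-X.Y.Z/bin:$PATH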
## Configuring Accumulo

Accumulo requires running [Zookeeper] and [HDFS] instances, which should be set up
before configuring Accumulo.

The primary configuration files for Accumulo are `accumulo-env.sh` and `accumulo-site.xml`,
which are located in the `conf/` directory.

Follow the steps below to configure `accumulo-site.xml` (an example fragment follows the list):

1. Run `accumulo-util build-native` to build native code. If this command fails, disable
   native maps by setting `tserver.memory.maps.native.enabled` to `false`.

2. Set `instance.volumes` to the HDFS location where Accumulo will store data. If your namenode
   is running at 192.168.1.9:8020 and you want to store data in `/accumulo` in HDFS, then set
   `instance.volumes` to `hdfs://192.168.1.9:8020/accumulo`.

3. Set `instance.zookeeper.host` to the location of your Zookeepers.

4. (Optional) Change `instance.secret` (which is used by Accumulo processes to communicate)
   from the default. This value should match on all servers.
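Taken together, steps 2 through 4 might produce a fragment like this in `accumulo-site.xml`
(the values shown are illustrative):

```xml
<property>
  <name>instance.volumes</name>
  <value>hdfs://192.168.1.9:8020/accumulo</value>
</property>
<property>
  <name>instance.zookeeper.host</name>
  <value>localhost:2181</value>
</property>
```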
Follow the steps below to configure `accumulo-env.sh`:

1. Set `HADOOP_PREFIX` and `ZOOKEEPER_HOME` to the location of your Hadoop and Zookeeper
   installations. Accumulo will use these locations to find Hadoop and Zookeeper jars and add
   them to your `CLASSPATH` variable. If you are running a vendor-specific release of
   Hadoop or Zookeeper, you may need to modify how the `CLASSPATH` variable is built in
   `accumulo-env.sh`. If Accumulo has problems loading classes when you start it, run
   `accumulo classpath -d` to debug and print Accumulo's classpath.

2. Accumulo tablet servers are configured by default to use 1GB of memory (768MB is allocated to
   the JVM and 256MB is allocated for native maps). Native maps are allocated memory equal to 33% of
   the tserver JVM heap. The table below can be used if you would like to change tserver memory
   usage in the `JAVA_OPTS` section of `accumulo-env.sh`:

   | Native? | 512MB             | 1GB               | 2GB                 | 3GB           |
   |---------|-------------------|-------------------|---------------------|---------------|
   | Yes     | -Xmx384m -Xms384m | -Xmx768m -Xms768m | -Xmx1536m -Xms1536m | -Xmx2g -Xms2g |
   | No      | -Xmx512m -Xms512m | -Xmx1g -Xms1g     | -Xmx2g -Xms2g       | -Xmx3g -Xms3g |

3. (Optional) Review the memory settings for the Accumulo master, garbage collector, and monitor
   in the `JAVA_OPTS` section of `accumulo-env.sh`.

## Initialization

Accumulo needs to initialize the locations where it stores data in Zookeeper
and HDFS. The following command will do this:

    accumulo init

The initialization command will prompt for the following information:

 * **Instance name** : This is the name of the Accumulo instance. Accumulo clients
   need to know it in order to connect.
 * **Root password** : Initialization sets up an initial Accumulo root user and
   prompts for its password. This information will be needed later to connect
   to Accumulo.

## Run Accumulo

There are several methods for running Accumulo:

1. Run Accumulo processes using the `accumulo` command, which runs processes in the
   foreground and will not redirect stderr/stdout. Useful for creating init.d scripts
   that run Accumulo.

2. Run Accumulo processes as services using `accumulo-service`, which uses the `accumulo`
   command but backgrounds processes, redirects stderr/stdout, and manages pid files.
   Useful if you are using a cluster management tool (e.g. Ansible, Salt, etc.).

3. Run an Accumulo cluster on one or more nodes using `accumulo-cluster` (which
   uses `accumulo-service` to run services). Useful for local development and
   testing or if you are not using a cluster management tool in production.

Each method above has instructions below.

### Run Accumulo processes

Start Accumulo processes (tserver, master, monitor, etc.) using the command below:

    accumulo tserver

The process will run in the foreground. Use ctrl-c to quit.

### Run Accumulo services

Start Accumulo services (tserver, master, monitor, etc.) using the command below:

    accumulo-service tserver start

### Run an Accumulo cluster

Before using the `accumulo-cluster` script, additional configuration files need
to be created. Use the command below to create them:

    accumulo-cluster create-config

This creates five files (`masters`, `gc`, `monitor`, `tservers`, & `tracers`)
in the `conf/` directory that contain the node names where Accumulo services
are run on your cluster. By default, all files are configured to `localhost`. If
you are running a single-node Accumulo cluster, these files do not need to be
changed and the next section should be skipped.

#### Multi-node configuration

If you are running an Accumulo cluster on multiple nodes, the following files
in `conf/` should be configured with a newline-separated list of node names (see
the example after this list):

 * `masters` : Accumulo primary coordinating process. Must specify one node. Can
   specify a few for fault tolerance.
 * `gc` : Accumulo garbage collector. Must specify one node. Can specify a
   few for fault tolerance.
 * `monitor` : Node where Accumulo monitoring web server is run.
 * `tservers` : Accumulo worker processes. List all of the nodes where tablet servers
   should run in this file.
 * `tracers` : Optional capability. Can specify zero or more nodes.
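For example, a hypothetical cluster with one coordinator node and three workers might use
files like these (hostnames are illustrative):

    $ cat conf/masters
    master1.example.com

    $ cat conf/tservers
    worker1.example.com
    worker2.example.com
    worker3.example.com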
The Accumulo, Hadoop, and Zookeeper software should be present at the same
location on every node. Also, the files in the `conf` directory must be copied
to every node. There are many ways to replicate the software and configuration;
two possible tools that can help replicate software and/or config are [pdcp]
and [prsync].

The `accumulo-cluster` script uses ssh to start processes on remote nodes. Before
attempting to start Accumulo, [passwordless ssh][pwl] must be set up on the cluster.

#### Start cluster

After configuring and initializing Accumulo, use the following command to start
the cluster:

    accumulo-cluster start

## First steps

Once you have started Accumulo, use the following command to run the Accumulo shell:

    accumulo shell -u root

Use your web browser to connect to the Accumulo monitor page on port 9995:

    http://<hostname in conf/monitor>:9995/

## Stopping Accumulo

When finished, use the following commands to stop Accumulo:

* Stop Accumulo service: `accumulo-service tserver stop`
* Stop Accumulo cluster: `accumulo-cluster stop`

[in-depth]: {{ page.docs_baseurl }}/administration/in-depth-install
[download]: https://accumulo.apache.org/downloads/
[build]: https://github.com/apache/accumulo/blob/master/README.md#building
[Zookeeper]: https://zookeeper.apache.org/
[HDFS]: https://hadoop.apache.org/
[pdcp]: https://code.google.com/p/pdsh/
[prsync]: https://code.google.com/p/parallel-ssh/
[pwl]: https://www.google.com/search?q=hadoop+passwordless+ssh&ie=utf-8&oe=utf-8

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/3c554918/_docs-unreleased/getting-started/shell.md
----------------------------------------------------------------------
diff --git a/_docs-unreleased/getting-started/shell.md b/_docs-unreleased/getting-started/shell.md
index 8007a28..316a060 100644
--- a/_docs-unreleased/getting-started/shell.md
+++ b/_docs-unreleased/getting-started/shell.md
@@ -1,7 +1,7 @@
 ---
 title: Accumulo Shell
 category: getting-started
-order: 3
+order: 4
 ---
 
 Accumulo provides a simple shell that can be used to examine the contents and