wohali commented on a change in pull request #268: Rewrite sharding 

 File path: src/cluster/sharding.rst
 @@ -18,43 +18,170 @@ Sharding
 .. _cluster/sharding/scaling-out:
-Scaling out
-Normally you start small and grow over time. In the beginning you might do just
-fine with one node, but as your data and number of clients grows, you need to
-scale out.
+A `shard <https://en.wikipedia.org/wiki/Shard_(database_architecture)>`__
+is a horizontal partition of data in a database. Partitioning data into
+shards and distributing copies of each shard (called "replicas") to
+different nodes in a cluster gives the data greater durability against
+node loss. CouchDB clusters automatically shard and distribute data
+among nodes, but modifying cluster membership and customizing shard
+behavior must be done manually.
-For simplicity we will start fresh and small.
+Shards and Replicas
+-------------------
-Start ``node1`` and add a database to it. To keep it simple we will have 2
-shards and no replicas.
+How many shards and replicas each database has can be set at the global
+level, or on a per-database basis. The relevant parameters are *q* and
+*n*.
-.. code-block:: bash
+*q* is the number of database shards to maintain. *n* is the number of
+copies of each document to distribute. With q=8, the database is split
+into 8 shards. With n=3, the cluster distributes three replicas of each
+shard. Altogether, that's 24 shards for a single database. In a default
+3-node cluster, each node would receive 8 shards. In a 4-node cluster,
+each node would receive 6 shards.
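The arithmetic above can be sketched directly, assuming shards distribute evenly across nodes:

```shell
# Total shard files for one database: q shards times n replicas
q=8; n=3
total=$((q * n))
echo "$total"        # 24 shard files cluster-wide
echo $((total / 3))  # per node in a 3-node cluster: 8
echo $((total / 4))  # per node in a 4-node cluster: 6
```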
-    curl -X PUT "http://xxx.xxx.xxx.xxx:5984/small?n=1&q=2"; --user daboss
+CouchDB nodes have an ``etc/default.ini`` file with a section named
+``[cluster]`` which looks like this:
-If you look in the directory ``data/shards`` you will find the 2 shards.
-.. code-block:: text
+.. code-block:: ini
+
+    [cluster]
+    q=8
+    n=3
-    data/
-    +-- shards/
-    |   +-- 00000000-7fffffff/
-    |   |    -- small.1425202577.couch
-    |   +-- 80000000-ffffffff/
-    |        -- small.1425202577.couch
+These settings can be modified to set sharding defaults for all
+databases, or they can be set on a per-database basis by specifying the
+``q`` and ``n`` query parameters when the database is created. For
+example:
-Now, check the node-local ``_dbs_`` database. Here, the metadata for each
-database is stored. As the database is called ``small``, there is a document
-called ``small`` there. Let us look in it. Yes, you can get it with curl too:
+.. code:: bash
-.. code-block:: javascript
+    $ curl -X PUT "$COUCH_URL/database-name?q=4&n=2"
-    curl -X GET "http://xxx.xxx.xxx.xxx:5986/_dbs/small";
+That creates a database split into 4 shards, with 2 replicas of each
+shard, yielding 8 shard files distributed throughout the cluster.
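To confirm how the new database's shards were distributed, the shard map can be inspected; a hedged sketch, assuming your CouchDB version exposes the ``/{db}/_shards`` endpoint:

```
$ curl "$COUCH_URL:5984/{db}/_shards"
```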
+When a CouchDB cluster serves reads and writes, it proxies the request
+to nodes with relevant shards and responds once enough nodes have
+responded to establish
+`quorum <https://en.wikipedia.org/wiki/Quorum_(distributed_computing)>`__.
+The size of the required quorum can be configured at request time by
+setting the ``r`` parameter for document and view reads, and the ``w``
+parameter for document writes. For example, here is a request that
+specifies that at least two nodes must respond in order to establish a
+quorum:
+.. code:: bash
+    $ curl "$COUCH_URL:5984/{db}/{docId}?r=2"
+Here is a similar example for writing a document:
+.. code:: bash
+    $ curl -X PUT "$COUCH_URL:5984/{db}/{docId}?w=2" -d '{}'
+Setting ``r`` or ``w`` to be equal to ``n`` (the number of replicas)
+means you will only receive a response once all nodes with relevant
+shards have responded; however, even this does not guarantee `ACIDic
+consistency <https://en.wikipedia.org/wiki/ACID#Consistency>`__. Setting
+``r`` or ``w`` to 1 means you will receive a response after only one
+relevant node has responded.
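When ``r`` and ``w`` are not specified, CouchDB defaults to a majority of ``n``; a quick sketch of that majority arithmetic:

```shell
# Majority quorum for n replicas: floor(n/2) + 1
n=3
echo $(( n / 2 + 1 ))   # 2: the default r and w when n=3
```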
+Adding a node
+-------------
+To add a node to a cluster, first you must have the additional node
+running somewhere. Make note of the address it binds to, like
+````, then ``PUT`` an empty document to the ``/_node`` endpoint:
+.. code:: bash
+    $ curl -X PUT "$COUCH_URL:5984/_node/{name}@{address}" -d '{}'
+This will add the node to the cluster. Existing shards will not be moved
+or re-balanced in response to the addition, but future operations will
+distribute shards to the new node.
+Now when you GET the ``/_membership`` endpoint, you will see the new
+node listed.
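For example, assuming the standard clustered port:

```
$ curl "$COUCH_URL:5984/_membership"
```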
+Removing a node
+---------------
+To remove a node from the cluster, you must first acquire the ``_rev``
+value for the document that signifies its existence:
+.. code:: bash
+    $ curl "$COUCH_URL:5984/_node/{name}@{address}"
+    {"_id":"{name}@{address}","_rev":"{rev}"}
+Using that ``_rev``, you can delete the node using the ``/_node``
+endpoint:
+.. code:: bash
+    $ curl -X DELETE "$COUCH_URL:5984/_node/{name}@{address}?rev={rev}"
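The two requests can be combined; a hedged sketch assuming ``jq`` is installed, with ``{name}@{address}`` as placeholders:

```
$ rev=$(curl -s "$COUCH_URL:5984/_node/{name}@{address}" | jq -r ._rev)
$ curl -X DELETE "$COUCH_URL:5984/_node/{name}@{address}?rev=$rev"
```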
+.. note::
+    Before you remove a node, make sure to `move its shards
+    <#moving-a-shard>`__ or else they will be lost.
+Moving a shard
+--------------
+Moving shards between nodes involves the following steps:
+1. Copy the shard file onto the new node.
+2. Update cluster metadata to reflect the move.
+3. Replicate from the old to the new to catch any changes.
+4. Delete the old shard file.
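Step 3 above can be performed with the ``/_replicate`` endpoint; this is a hedged sketch rather than the exact procedure — the shard database name must be URL-encoded, and every ``{...}`` is a placeholder:

```
$ curl -X POST "$COUCH_URL:5984/_replicate" \
    -H "Content-Type: application/json" \
    -d '{"source": "http://{old-node}:5986/shards%2F{range}%2F{database}.{timestamp}",
        "target": "http://{new-node}:5986/shards%2F{range}%2F{database}.{timestamp}"}'
```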
+Copying shard files
+~~~~~~~~~~~~~~~~~~~
+Shard files live in the ``data/shards`` directory of your CouchDB
+install. Since they are just files, you can use ``cp``, ``rsync``,
+``scp``, or another such command to copy them from one node to another.
+For example:
+.. code:: bash
+    # on one machine
+    mkdir -p data/shards/{range}
+    # on the other
+    scp $COUCH_PATH/data/shards/{range}/{database}.{timestamp}.couch 
+Views are also sharded, and their shards should be moved to save the new
+node the effort of rebuilding the view. View shards live in
+Updating cluster metadata
+~~~~~~~~~~~~~~~~~~~~~~~~~
+To update the cluster metadata, use the special node-specific ``/_dbs``
+database, accessible via a node's private port, usually at port 5986.
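For example, to fetch the metadata document for a database (run on the node itself if port 5986 only listens on the localhost interface):

```
$ curl "http://localhost:5986/_dbs/{database}"
```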
 Review comment:
  +"This port is only available on the localhost interface for security."
  And a note to you: this will hopefully change in 3.0, when port 5986 is
  dropped completely.
