Repository: incubator-atlas
Updated Branches:
  refs/heads/branch-0.6-incubating 5f0a5ca22 -> dc2ca4520


ATLAS-374 Doc: Create a wiki for documenting fault tolerance and HA options for Atlas data (yhemanth via sumasai)


Project: http://git-wip-us.apache.org/repos/asf/incubator-atlas/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-atlas/commit/dc2ca452
Tree: http://git-wip-us.apache.org/repos/asf/incubator-atlas/tree/dc2ca452
Diff: http://git-wip-us.apache.org/repos/asf/incubator-atlas/diff/dc2ca452

Branch: refs/heads/branch-0.6-incubating
Commit: dc2ca4520e72a76322b657d37b26394f8306a8e2
Parents: 5f0a5ca
Author: Suma Shivaprasad <[email protected]>
Authored: Mon Dec 14 17:00:57 2015 +0530
Committer: Suma Shivaprasad <[email protected]>
Committed: Mon Dec 14 17:00:57 2015 +0530

----------------------------------------------------------------------
 docs/src/site/twiki/Architecture.twiki      |  8 +++
 docs/src/site/twiki/HighAvailability.twiki  | 89 ++++++++++++++++++++++++
 docs/src/site/twiki/InstallationSteps.twiki |  9 ++-
 docs/src/site/twiki/index.twiki             |  1 +
 release-log.txt                             |  1 +
 5 files changed, 107 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-atlas/blob/dc2ca452/docs/src/site/twiki/Architecture.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/Architecture.twiki b/docs/src/site/twiki/Architecture.twiki
index cb0e208..6896e85 100755
--- a/docs/src/site/twiki/Architecture.twiki
+++ b/docs/src/site/twiki/Architecture.twiki
@@ -5,6 +5,14 @@
 ---++ Atlas High Level Architecture - Overview
 <img src="images/twiki/architecture.png" height="400" width="600" />
 
+Architecturally, Atlas has the following components:
+
+   * *A Web service*: This exposes RESTful APIs and a Web user interface to create, update and query metadata.
+   * *Metadata store*: Metadata is modeled using a graph model, implemented using the graph database Titan. Titan has options for a variety of backing stores for persisting the graph, including an embedded Berkeley DB, Apache HBase and Apache Cassandra. The choice of the backing store determines the level of service availability.
+   * *Index store*: To power full text searches on metadata, Atlas also indexes the metadata, again via Titan. The backing store for full text search is a search backend like !ElasticSearch or Apache Solr.
+   * *Bridges / Hooks*: To add metadata to Atlas, libraries called 'hooks' are enabled in various systems like Apache Hive, Apache Falcon and Apache Sqoop, which capture metadata events in the respective systems and propagate those events to Atlas. The Atlas server consumes these events and updates its stores.
+   * *Metadata notification events*: Any updates to metadata in Atlas, either via the hooks or the API, are propagated from Atlas to downstream systems via events. Systems like Apache Ranger consume these events and allow administrators to act on them, e.g. to configure policies for access control.
+   * *Notification Server*: Atlas uses Apache Kafka as a notification server for communication between hooks and downstream consumers of metadata notification events. Events are written by the hooks and Atlas to different Kafka topics. Kafka enables a loosely coupled integration between these disparate systems.
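+
+As an illustration of how the backing store choice is made, a minimal sketch of the relevant atlas.properties entries follows; the values are placeholders, and the [[Configuration][Configuration page]] is the authoritative reference for these options:
+<verbatim>
+# Storage backend for the Titan graph: berkeleyje (embedded, default), hbase or cassandra
+atlas.graph.storage.backend=hbase
+# Index backend powering full text search, e.g. Solr in cloud mode
+atlas.graph.index.search.backend=solr5
+</verbatim>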
 
 ---++ Bridges
 External components like hive/sqoop/storm/falcon should model their taxonomy using the typesystem and register the types with Atlas. For every entity created in these external components, the corresponding entity should be registered in Atlas as well.

http://git-wip-us.apache.org/repos/asf/incubator-atlas/blob/dc2ca452/docs/src/site/twiki/HighAvailability.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/HighAvailability.twiki b/docs/src/site/twiki/HighAvailability.twiki
new file mode 100644
index 0000000..dac1644
--- /dev/null
+++ b/docs/src/site/twiki/HighAvailability.twiki
@@ -0,0 +1,89 @@
+---+ Fault Tolerance and High Availability Options
+
+---++ Introduction
+
+Apache Atlas uses and interacts with a variety of systems to provide metadata management and data lineage to data
+administrators. By choosing and configuring these dependencies appropriately, it is possible to achieve a good degree
+of service availability with Atlas. This document describes the state of high availability support in Atlas,
+including its capabilities and current limitations, and also the configuration required for achieving this level of
+high availability.
+
+[[Architecture][The architecture page]] in the wiki gives an overview of the various components that make up Atlas.
+The options mentioned below for the various components derive context from that page, and it would be worthwhile to
+review it before proceeding with this page.
+
+---++ Atlas Web Service
+
+Currently, the Atlas Web service has a limitation that it can only have one active instance at a time. Therefore, in
+case of failure of the host running the service, a new Atlas Web service instance should be brought up and pointed to
+from the clients. In future versions of the system, we plan to provide full High Availability of the service, thereby
+enabling hot failover. To minimize service loss, we recommend the following:
+
+   * An extra physical host with the Atlas system software and configuration is available to be brought up on demand.
+   * It would be convenient to have the web service fronted by a proxy solution like [[https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#5.2][HAProxy]], which can be used to provide both monitoring and transparent switching of the backend instance that clients talk to.
+      * An example HAProxy configuration of this form allows a transparent failover to a backup server:
+      <verbatim>
+      listen atlas
+        bind <proxy hostname>:<proxy port>
+        balance roundrobin
+        server inst1 <atlas server hostname>:<port> check
+        server inst2 <atlas backup server hostname>:<port> check backup
+      </verbatim>
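+      * With the configuration above, HAProxy health-checks both instances (the check option), and since inst2 is marked as a backup server it receives traffic only while inst1 is unavailable.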
+   * The stores that hold Atlas data can be configured to be highly available as described below.
+
+---++ Metadata Store
+
+As described above, Atlas uses Titan to store the metadata it manages. By default, Titan uses BerkeleyDB as an embedded
+backing store. However, this option would result in loss of data if the node running the Atlas server fails. In order
+to provide HA for the metadata store, we recommend that Atlas be configured to use HBase as the backing store for Titan.
+Doing this means Atlas benefits from the HA guarantees HBase provides. In order to configure Atlas to use
+HBase in HA mode, do the following:
+
+   * Choose an existing HBase cluster that is set up in HA mode to configure in Atlas (OR) Set up a new HBase cluster in [[http://hbase.apache.org/book.html#quickstart_fully_distributed][HA mode]].
+      * If setting up HBase for Atlas, please follow the instructions listed for setting up HBase in the [[InstallationSteps][Installation Steps]].
+   * We recommend running more than one HBase master (at least 2) in the cluster on different physical hosts, using Zookeeper for coordination, to provide redundancy and high availability of HBase.
+      * Refer to the [[Configuration][Configuration page]] for the options to configure in atlas.properties to set up Atlas with HBase; a minimal sketch follows the list.
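+
+For illustration, a minimal sketch of the relevant atlas.properties entries, assuming these are the property names in use (the Configuration page is authoritative):
+<verbatim>
+# Use HBase as the storage backend for the Titan graph
+atlas.graph.storage.backend=hbase
+# Zookeeper quorum coordinating the HBase cluster
+atlas.graph.storage.hostname=<comma separated list of zookeeper server names>
+</verbatim>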
+
+---++ Index Store
+
+As described above, Atlas indexes metadata through Titan to support full text search queries. In order to provide HA
+for the index store, we recommend that Atlas be configured to use Solr as the backing index store for Titan. In order
+to configure Atlas to use Solr in HA mode, do the following:
+
+   * Choose an existing !SolrCloud cluster set up in HA mode to configure in Atlas (OR) Set up a new [[https://cwiki.apache.org/confluence/display/solr/SolrCloud][SolrCloud cluster]].
+      * Ensure Solr is brought up on at least 2 physical hosts for redundancy, and each host runs a Solr node.
+      * We recommend the number of replicas to be set to at least 2 for redundancy.
+   * Create the !SolrCloud collections required by Atlas, as described in the [[InstallationSteps][Installation Steps]].
+   * Refer to the [[Configuration][Configuration page]] for the options to configure in atlas.properties to set up Atlas with Solr; a minimal sketch follows the list.
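+
+Similarly, for illustration, a minimal sketch of the relevant atlas.properties entries, assuming these are the property names in use (the Configuration page is authoritative):
+<verbatim>
+# Use Solr in cloud mode as the index backend for Titan
+atlas.graph.index.search.backend=solr5
+atlas.graph.index.search.solr.mode=cloud
+# Zookeeper ensemble used by the SolrCloud cluster
+atlas.graph.index.search.solr.zookeeper-url=<comma separated list of zookeeper host:port entries>
+</verbatim>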
+
+---++ Notification Server
+
+Metadata notification events from Hooks are sent to Atlas by writing them to a Kafka topic called *ATLAS_HOOK*. Similarly, events from
+Atlas to other integrating components like Ranger are written to a Kafka topic called *ATLAS_ENTITIES*. Since Kafka
+persists these messages, the events will not be lost even if the consumers are down when the events are sent. In
+addition, we recommend Kafka is also set up for fault tolerance so that it has higher availability guarantees. In order
+to configure Atlas to use Kafka in HA mode, do the following:
+
+   * Choose an existing Kafka cluster set up in HA mode to configure in Atlas (OR) Set up a new Kafka cluster.
+   * We recommend running more than one Kafka broker in the cluster on different physical hosts, using Zookeeper for coordination, to provide redundancy and high availability of Kafka.
+      * Set up at least 2 physical hosts for redundancy, each hosting a Kafka broker.
+   * Set up Kafka topics for Atlas usage:
+      * The number of partitions for the ATLAS topics should be set to 1 (numPartitions).
+      * Decide the number of replicas for the Kafka topics: set this to at least 2 for redundancy.
+      * Run the following commands (a command to verify the result follows this list):
+      <verbatim>
+      $KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper <list of zookeeper host:port entries> --topic ATLAS_HOOK --replication-factor <numReplicas> --partitions 1
+      $KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper <list of zookeeper host:port entries> --topic ATLAS_ENTITIES --replication-factor <numReplicas> --partitions 1
+      Here KAFKA_HOME points to the Kafka installation directory.
+      </verbatim>
+   * In application.properties, set the following configuration:
+     <verbatim>
+     atlas.notification.embedded=false
+     atlas.kafka.zookeeper.connect=<comma separated list of servers forming Zookeeper quorum used by Kafka>
+     atlas.kafka.bootstrap.servers=<comma separated list of Kafka broker endpoints in host:port form> - Give at least 2 for redundancy.
+     </verbatim>
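+
+To verify that the topics were created with the intended replication factor, Kafka's standard describe command can be used, for example:
+<verbatim>
+$KAFKA_HOME/bin/kafka-topics.sh --describe --zookeeper <list of zookeeper host:port entries> --topic ATLAS_HOOK
+</verbatim>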
+
+---++ Known Issues
+
+   * [[https://issues.apache.org/jira/browse/ATLAS-338][ATLAS-338]]: Metadata events generated from the Hive CLI (as opposed to Beeline or any client going through HiveServer2) would be lost if the Atlas server is down.
+   * If the HBase region servers hosting the Atlas 'titan' HTable are down, Atlas would not be able to store or retrieve metadata from HBase until they are brought back online.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-atlas/blob/dc2ca452/docs/src/site/twiki/InstallationSteps.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/InstallationSteps.twiki b/docs/src/site/twiki/InstallationSteps.twiki
index 02a62f5..64aad85 100644
--- a/docs/src/site/twiki/InstallationSteps.twiki
+++ b/docs/src/site/twiki/InstallationSteps.twiki
@@ -143,6 +143,11 @@ For configuring Titan to work with Solr, please follow the instructions below
   For a small cluster, running with an existing ZooKeeper quorum should be fine. For larger clusters, you would want to run a separate ZooKeeper quorum with at least 3 servers.
   Note: Atlas currently supports Solr in "cloud" mode only. "http" mode is not supported. For more information, refer to the Solr documentation - https://cwiki.apache.org/confluence/display/solr/SolrCloud
 
+* For example, to bring up a Solr node listening on port 8983 on a machine, you can use the command:
+      <verbatim>
+      $SOLR_HOME/bin/solr start -c -z <zookeeper_host:port> -p 8983
+      </verbatim>
+
 * Run the following commands from the SOLR_HOME directory to create collections in Solr corresponding to the indexes that Atlas uses. In the case that the ATLAS and SOLR instances are on 2 different hosts,
   first copy the required configuration files from ATLAS_HOME/conf/solr on the ATLAS instance host to the Solr instance host. SOLR_CONF in the below mentioned commands refers to the directory where the Solr configuration files
   have been copied to on the Solr host:
@@ -153,7 +158,9 @@ For configuring Titan to work with Solr, please follow the instructions below
 
   Note: If numShards and replicationFactor are not specified, they default to 1 which suffices if you are trying out Solr with ATLAS on a single node instance.
   Otherwise specify numShards according to the number of hosts that are in the Solr cluster and the maxShardsPerNode configuration.
-  The number of shards cannot exceed the total number of Solr nodes in your SolrCloud cluster
+  The number of shards cannot exceed the total number of Solr nodes in your SolrCloud cluster.
+
+  The number of replicas (replicationFactor) can be set according to the redundancy required.
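+
+  For illustration, a create command of the following form can be used; this is a sketch assuming Solr 5's create options, with vertex_index as one of the Atlas collection names:
+      <verbatim>
+      $SOLR_HOME/bin/solr create -c vertex_index -d SOLR_CONF -shards <numShards> -replicationFactor <replicationFactor>
+      </verbatim>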
 
 * Change ATLAS configuration to point to the Solr instance setup. Please make sure the following configurations are set to the below values in ATLAS_HOME/conf/application.properties
  atlas.graph.index.search.backend=solr5

http://git-wip-us.apache.org/repos/asf/incubator-atlas/blob/dc2ca452/docs/src/site/twiki/index.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/index.twiki b/docs/src/site/twiki/index.twiki
index 53b7552..a921b11 100755
--- a/docs/src/site/twiki/index.twiki
+++ b/docs/src/site/twiki/index.twiki
@@ -47,6 +47,7 @@ allows integration with the whole enterprise data ecosystem.
       * [[Notification-Entity][Entity Notification]]
    * Bridges
       * [[Bridge-Hive][Hive Bridge]]
+   * [[HighAvailability][Fault Tolerance And High Availability Options]]
 
 
 ---++ API Documentation

http://git-wip-us.apache.org/repos/asf/incubator-atlas/blob/dc2ca452/release-log.txt
----------------------------------------------------------------------
diff --git a/release-log.txt b/release-log.txt
index b68d3dc..f058148 100644
--- a/release-log.txt
+++ b/release-log.txt
@@ -9,6 +9,7 @@ ATLAS-54 Rename configs in hive hook (shwethags)
 ATLAS-3 Mixed Index creation fails with Date types (sumasai via shwethags)
 
 ALL CHANGES:
+ATLAS-374 Doc: Create a wiki for documenting fault tolerance and HA options for Atlas data (yhemanth via sumasai)
 ATLAS-346 Atlas server loses messages sent from Hive hook if restarted after unclean shutdown (yhemath via sumasai)
 ATLAS-382 Fixed Hive Bridge doc for ATLAS cluster name (sumasai)
 ATLAS-244 UI: Add Tag Tab (darshankumar89 via sumasai)
