ATLAS-2365: updated README for 1.0.0-alpha release

Signed-off-by: kevalbhatt <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/atlas/repo
Commit: http://git-wip-us.apache.org/repos/asf/atlas/commit/c65586f1
Tree: http://git-wip-us.apache.org/repos/asf/atlas/tree/c65586f1
Diff: http://git-wip-us.apache.org/repos/asf/atlas/diff/c65586f1

Branch: refs/heads/master
Commit: c65586f13896a44eb400c45c084499ab121c2e59
Parents: 39be2cc
Author: Madhan Neethiraj <[email protected]>
Authored: Fri Jan 19 15:49:44 2018 +0530
Committer: kevalbhatt <[email protected]>
Committed: Fri Jan 19 15:49:44 2018 +0530

----------------------------------------------------------------------
 docs/pom.xml                                |   3 +
 docs/src/site/twiki/Architecture.twiki      |  30 +-
 docs/src/site/twiki/Bridge-Falcon.twiki     |  56 ++--
 docs/src/site/twiki/Bridge-Hive.twiki       | 117 ++++----
 docs/src/site/twiki/Bridge-Sqoop.twiki      |  45 +--
 docs/src/site/twiki/Configuration.twiki     | 226 ++++-----------
 docs/src/site/twiki/HighAvailability.twiki  |  12 +-
 docs/src/site/twiki/InstallationSteps.twiki | 341 +++++++++--------------
 docs/src/site/twiki/QuickStart.twiki        |   7 +-
 docs/src/site/twiki/Repository.twiki        |   4 -
 docs/src/site/twiki/TypeSystem.twiki        | 206 +++++++-------
 docs/src/site/twiki/index.twiki             |  46 +--
 docs/src/site/twiki/security.twiki          |   2 +-
 13 files changed, 451 insertions(+), 644 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/atlas/blob/c65586f1/docs/pom.xml
----------------------------------------------------------------------
diff --git a/docs/pom.xml b/docs/pom.xml
index 15c1c38..1e38757 100755
--- a/docs/pom.xml
+++ b/docs/pom.xml
@@ -77,6 +77,9 @@
                         <version>1.6</version>
                     </dependency>
                 </dependencies>
+                <configuration>
+                                   <port>8080</port>
+                </configuration>
                 <executions>
                     <execution>
                         <goals>

http://git-wip-us.apache.org/repos/asf/atlas/blob/c65586f1/docs/src/site/twiki/Architecture.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/Architecture.twiki b/docs/src/site/twiki/Architecture.twiki
index c832500..d0f1a05 100755
--- a/docs/src/site/twiki/Architecture.twiki
+++ b/docs/src/site/twiki/Architecture.twiki
@@ -8,8 +8,7 @@
 The components of Atlas can be grouped under the following major categories:
 
 ---+++ Core
-
-This category contains the components that implement the core of Atlas 
functionality, including:
+Atlas core includes the following components:
 
 *Type System*: Atlas allows users to define a model for the metadata objects 
they want to manage. The model is composed
 of definitions called ‘types’. Instances of ‘types’ called 
‘entities’ represent the actual metadata objects that are
@@ -21,25 +20,18 @@ One key point to note is that the generic nature of the modelling in Atlas allow
 define both technical metadata and business metadata. It is also possible to 
define rich relationships between the
 two using features of Atlas.
 
+*Graph Engine*: Internally, Atlas persists the metadata objects it manages using a graph model. This approach provides great
+flexibility and enables efficient handling of rich relationships between the metadata objects. The graph engine component is
+responsible for translating between types and entities of the Atlas type system and the underlying graph persistence model.
+In addition to managing the graph objects, the graph engine also creates the appropriate indices for the metadata
+objects so that they can be searched efficiently. Atlas uses JanusGraph to store the metadata objects.
+
 *Ingest / Export*: The Ingest component allows metadata to be added to Atlas. 
Similarly, the Export component exposes
 metadata changes detected by Atlas to be raised as events. Consumers can 
consume these change events to react to
 metadata changes in real time.
 
-*Graph Engine*: Internally, Atlas represents metadata objects it manages using 
a Graph model. It does this to
-achieve great flexibility and rich relations between the metadata objects. The 
Graph Engine is a component that is
-responsible for translating between types and entities of the Type System, and 
the underlying Graph model.
-In addition to managing the Graph objects, The Graph Engine also creates the 
appropriate indices for the metadata
-objects so that they can be searched for efficiently.
-
-*Titan*: Currently, Atlas uses the Titan Graph Database to store the metadata 
objects. Titan is used as a library
-within Atlas. Titan uses two stores: The Metadata store is configured to 
!HBase by default and the Index store
-is configured to Solr. It is also possible to use the Metadata store as 
BerkeleyDB and Index store as !ElasticSearch
-by building with corresponding profiles. The Metadata store is used for 
storing the metadata objects proper, and the
-Index store is used for storing indices of the Metadata properties, that 
allows efficient search.
-
 
 ---+++ Integration
-
 Users can manage metadata in Atlas using two methods:
 
 *API*: All functionality of Atlas is exposed to end users via a REST API that 
allows types and entities to be created,
@@ -53,7 +45,6 @@ uses Apache Kafka as a notification server for communication between hooks and d
 notification events. Events are written by the hooks and Atlas to different 
Kafka topics.
 
 ---+++ Metadata sources
-
 Atlas supports integration with many sources of metadata out of the box. More 
integrations will be added in future
 as well. Currently, Atlas supports ingesting and managing metadata from the 
following sources:
 
@@ -61,6 +52,7 @@ as well. Currently, Atlas supports ingesting and managing metadata from the foll
    * [[Bridge-Sqoop][Sqoop]]
    * [[Bridge-Falcon][Falcon]]
    * [[StormAtlasHook][Storm]]
+   * HBase - _documentation work-in-progress_
 
 The integration implies two things:
 There are metadata models that Atlas defines natively to represent objects of 
these components.
@@ -80,12 +72,6 @@ for the Hadoop ecosystem having wide integration with a variety of Hadoop compon
 Ranger allows security administrators to define metadata driven security 
policies for effective governance.
 Ranger is a consumer to the metadata change events notified by Atlas.
 
-*Business Taxonomy*: The metadata objects ingested into Atlas from the 
Metadata sources are primarily a form
-of technical metadata. To enhance the discoverability and governance 
capabilities, Atlas comes with a Business
-Taxonomy interface that allows users to first, define a hierarchical set of 
business terms that represent their
-business domain and associate them to the metadata entities Atlas manages. 
Business Taxonomy is a web application that
-is part of the Atlas Admin UI currently and integrates with Atlas using the 
REST API.
-
 
 
 

http://git-wip-us.apache.org/repos/asf/atlas/blob/c65586f1/docs/src/site/twiki/Bridge-Falcon.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/Bridge-Falcon.twiki b/docs/src/site/twiki/Bridge-Falcon.twiki
index de80035..0cf1645 100644
--- a/docs/src/site/twiki/Bridge-Falcon.twiki
+++ b/docs/src/site/twiki/Bridge-Falcon.twiki
@@ -1,44 +1,52 @@
 ---+ Falcon Atlas Bridge
 
 ---++ Falcon Model
-The default falcon modelling is available in 
org.apache.atlas.falcon.model.FalconDataModelGenerator. It defines the 
following types:
-<verbatim>
-falcon_cluster(ClassType) - super types [Infrastructure] - attributes 
[timestamp, colo, owner, tags]
-falcon_feed(ClassType) - super types [DataSet] - attributes [timestamp, 
stored-in, owner, groups, tags]
-falcon_feed_creation(ClassType) - super types [Process] - attributes 
[timestamp, stored-in, owner]
-falcon_feed_replication(ClassType) - super types [Process] - attributes 
[timestamp, owner]
-falcon_process(ClassType) - super types [Process] - attributes [timestamp, 
runs-on, owner, tags, pipelines, workflow-properties]
-</verbatim>
+The default falcon model includes the following types:
+   * Entity types:
+      * falcon_cluster
+         * super-types: Infrastructure
+         * attributes: timestamp, colo, owner, tags
+      * falcon_feed
+         * super-types: !DataSet
+         * attributes: timestamp, stored-in, owner, groups, tags
+      * falcon_feed_creation
+         * super-types: Process
+         * attributes: timestamp, stored-in, owner
+      * falcon_feed_replication
+         * super-types: Process
+         * attributes: timestamp, owner
+      * falcon_process
+         * super-types: Process
+         * attributes: timestamp, runs-on, owner, tags, pipelines, 
workflow-properties
 
 One falcon_process entity is created for every cluster that the falcon process 
is defined for.
 
 The entities are created and de-duped using unique qualifiedName attribute. 
They provide namespace and can be used for querying/lineage as well. The unique 
attributes are:
-   * falcon_process - <process name>@<cluster name>
-   * falcon_cluster - <cluster name>
-   * falcon_feed - <feed name>@<cluster name>
-   * falcon_feed_creation - <feed name>
-   * falcon_feed_replication - <feed name>
+   * falcon_process.qualifiedName          - <process name>@<cluster name>
+   * falcon_cluster.qualifiedName          - <cluster name>
+   * falcon_feed.qualifiedName             - <feed name>@<cluster name>
+   * falcon_feed_creation.qualifiedName    - <feed name>
+   * falcon_feed_replication.qualifiedName - <feed name>
 
 ---++ Falcon Hook
-Falcon supports listeners on falcon entity submission. This is used to add 
entities in Atlas using the model defined in 
org.apache.atlas.falcon.model.FalconDataModelGenerator.
-The hook submits the request to a thread pool executor to avoid blocking the 
command execution. The thread submits the entities as message to the 
notification server and atlas server reads these messages and registers the 
entities.
+Falcon supports listeners on falcon entity submission. This is used to add 
entities in Atlas using the model detailed above.
+Follow the instructions below to set up the Atlas hook in Falcon:
    * Add 'org.apache.atlas.falcon.service.AtlasService' to 
application.services in <falcon-conf>/startup.properties
-   * Link falcon hook jars in falcon classpath - 'ln -s 
<atlas-home>/hook/falcon/* <falcon-home>/server/webapp/falcon/WEB-INF/lib/'
+   * Link Atlas hook jars in Falcon classpath - 'ln -s 
<atlas-home>/hook/falcon/* <falcon-home>/server/webapp/falcon/WEB-INF/lib/'
    * In <falcon_conf>/falcon-env.sh, set an environment variable as follows:
      <verbatim>
-     export FALCON_SERVER_OPTS="<atlas_home>/hook/falcon/*:$FALCON_SERVER_OPTS"
-     </verbatim>
+     export 
FALCON_SERVER_OPTS="<atlas_home>/hook/falcon/*:$FALCON_SERVER_OPTS"</verbatim>
 
 The following properties in <atlas-conf>/atlas-application.properties control 
the thread pool and notification details:
-   * atlas.hook.falcon.synchronous - boolean, true to run the hook 
synchronously. default false
-   * atlas.hook.falcon.numRetries - number of retries for notification 
failure. default 3
-   * atlas.hook.falcon.minThreads - core number of threads. default 5
-   * atlas.hook.falcon.maxThreads - maximum number of threads. default 5
+   * atlas.hook.falcon.synchronous   - boolean, true to run the hook 
synchronously. default false
+   * atlas.hook.falcon.numRetries    - number of retries for notification 
failure. default 3
+   * atlas.hook.falcon.minThreads    - core number of threads. default 5
+   * atlas.hook.falcon.maxThreads    - maximum number of threads. default 5
    * atlas.hook.falcon.keepAliveTime - keep alive time in msecs. default 10
-   * atlas.hook.falcon.queueSize - queue size for the threadpool. default 10000
+   * atlas.hook.falcon.queueSize     - queue size for the threadpool. default 
10000
 
 Refer [[Configuration][Configuration]] for notification related configurations
 
 
----++ Limitations
+---++ NOTES
    * In falcon cluster entity, cluster name used should be uniform across 
components like hive, falcon, sqoop etc. If used with ambari, ambari cluster 
name should be used for cluster entity
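
The Falcon hook properties listed in the diff above each have a documented default. The following Python sketch shows one way to compute the effective settings from the text of an atlas-application.properties file. The helper names (parse_properties, effective_falcon_hook_settings) are hypothetical and are not part of Atlas; only the property names and default values come from the documentation.

```python
# Documented defaults for the Falcon hook, copied from the property list above.
FALCON_HOOK_DEFAULTS = {
    "atlas.hook.falcon.synchronous": "false",
    "atlas.hook.falcon.numRetries": "3",
    "atlas.hook.falcon.minThreads": "5",
    "atlas.hook.falcon.maxThreads": "5",
    "atlas.hook.falcon.keepAliveTime": "10",
    "atlas.hook.falcon.queueSize": "10000",
}


def parse_properties(text):
    """Parse simple 'key=value' lines, skipping blanks and '#' comments.

    Hypothetical helper: handles only the basic subset of the Java
    properties format (no line continuations or escapes).
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def effective_falcon_hook_settings(properties_text):
    """Return the Falcon hook settings, with file values overriding defaults."""
    settings = dict(FALCON_HOOK_DEFAULTS)
    overrides = parse_properties(properties_text)
    for key in settings:
        if key in overrides:
            settings[key] = overrides[key]
    return settings


if __name__ == "__main__":
    sample = "# example override\natlas.hook.falcon.numRetries=5\n"
    for name, value in sorted(effective_falcon_hook_settings(sample).items()):
        print(f"{name}={value}")
```

Values are kept as strings, since how the hook itself converts them is not stated in the documentation.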

http://git-wip-us.apache.org/repos/asf/atlas/blob/c65586f1/docs/src/site/twiki/Bridge-Hive.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/Bridge-Hive.twiki b/docs/src/site/twiki/Bridge-Hive.twiki
index dd22b5c..7c93ecd 100644
--- a/docs/src/site/twiki/Bridge-Hive.twiki
+++ b/docs/src/site/twiki/Bridge-Hive.twiki
@@ -1,73 +1,71 @@
 ---+ Hive Atlas Bridge
 
 ---++ Hive Model
-The default hive modelling is available in 
org.apache.atlas.hive.model.HiveDataModelGenerator. It defines the following 
types:
-<verbatim>
-hive_db(ClassType) - super types [Referenceable] - attributes [name, 
clusterName, description, locationUri, parameters, ownerName, ownerType]
-hive_storagedesc(ClassType) - super types [Referenceable] - attributes [cols, 
location, inputFormat, outputFormat, compressed, numBuckets, serdeInfo, 
bucketCols, sortCols, parameters, storedAsSubDirectories]
-hive_column(ClassType) - super types [Referenceable] - attributes [name, type, 
comment, table]
-hive_table(ClassType) - super types [DataSet] - attributes [name, db, owner, 
createTime, lastAccessTime, comment, retention, sd, partitionKeys, columns, 
aliases, parameters, viewOriginalText, viewExpandedText, tableType, temporary]
-hive_process(ClassType) - super types [Process] - attributes [name, startTime, 
endTime, userName, operationType, queryText, queryPlan, queryId]
-hive_principal_type(EnumType) - values [USER, ROLE, GROUP]
-hive_order(StructType) - attributes [col, order]
-hive_serde(StructType) - attributes [name, serializationLib, parameters]
-</verbatim>
-
-The entities are created and de-duped using unique qualified name. They 
provide namespace and can be used for querying/lineage as well. Note that  
dbName, tableName and columnName should be in lower case. clusterName is 
explained below.
-   * hive_db - attribute qualifiedName - <dbName>@<clusterName>
-   * hive_table - attribute qualifiedName - <dbName>.<tableName>@<clusterName>
-   * hive_column - attribute qualifiedName - 
<dbName>.<tableName>.<columnName>@<clusterName>
-   * hive_process - attribute name - <queryString> - trimmed query string in 
lower case
+The default hive model includes the following types:
+   * Entity types:
+      * hive_db
+         * super-types: Referenceable
+         * attributes: name, clusterName, description, locationUri, 
parameters, ownerName, ownerType
+      * hive_storagedesc
+         * super-types: Referenceable
+         * attributes: cols, location, inputFormat, outputFormat, compressed, 
numBuckets, serdeInfo, bucketCols, sortCols, parameters, storedAsSubDirectories
+      * hive_column
+         * super-types: Referenceable
+         * attributes: name, type, comment, table
+      * hive_table
+         * super-types: !DataSet
+         * attributes: name, db, owner, createTime, lastAccessTime, comment, 
retention, sd, partitionKeys, columns, aliases, parameters, viewOriginalText, 
viewExpandedText, tableType, temporary
+      * hive_process
+         * super-types: Process
+         * attributes: name, startTime, endTime, userName, operationType, 
queryText, queryPlan, queryId
+      * hive_column_lineage
+         * super-types: Process
+         * attributes: query, depenendencyType, expression
+
+   * Enum types:
+      * hive_principal_type
+         * values: USER, ROLE, GROUP
+
+   * Struct types:
+      * hive_order
+         * attributes: col, order
+      * hive_serde
+         * attributes: name, serializationLib, parameters
+
+The entities are created and de-duped using unique qualified name. They 
provide namespace and can be used for querying/lineage as well. Note that 
dbName, tableName and columnName should be in lower case. clusterName is 
explained below.
+   * hive_db.qualifiedName     - <dbName>@<clusterName>
+   * hive_table.qualifiedName  - <dbName>.<tableName>@<clusterName>
+   * hive_column.qualifiedName - 
<dbName>.<tableName>.<columnName>@<clusterName>
+   * hive_process.queryString  - trimmed query string in lower case
 
 
 ---++ Importing Hive Metadata
-org.apache.atlas.hive.bridge.HiveMetaStoreBridge imports the Hive metadata 
into Atlas using the model defined in 
org.apache.atlas.hive.model.HiveDataModelGenerator. import-hive.sh command can 
be used to facilitate this. The script needs Hadoop and Hive classpath jars.
-  * For Hadoop jars, please make sure that the environment variable 
HADOOP_CLASSPATH is set. Another way is to set HADOOP_HOME to point to root 
directory of your Hadoop installation
-  * Similarly, for Hive jars, set HIVE_HOME to the root of Hive installation
-  * Set environment variable HIVE_CONF_DIR to Hive configuration directory
-  * Copy <atlas-conf>/atlas-application.properties to the hive conf directory
-
+org.apache.atlas.hive.bridge.HiveMetaStoreBridge imports the Hive metadata 
into Atlas using the model defined above. import-hive.sh command can be used to 
facilitate this.
     <verbatim>
-    Usage: <atlas package>/hook-bin/import-hive.sh
-    </verbatim>
+    Usage: <atlas package>/hook-bin/import-hive.sh</verbatim>
 
 The logs are in <atlas package>/logs/import-hive.log
 
-If you you are importing metadata in a kerberized cluster you need to run the 
command like this:
-<verbatim>
-<atlas package>/hook-bin/import-hive.sh -Dsun.security.jgss.debug=true 
-Djavax.security.auth.useSubjectCredsOnly=false 
-Djava.security.krb5.conf=[krb5.conf location] 
-Djava.security.auth.login.config=[jaas.conf location]
-</verbatim>
-   * krb5.conf is typically found at /etc/krb5.conf
-   * for details about jaas.conf and a suggested location see the 
[[security][atlas security documentation]]
-
 
 ---++ Hive Hook
-Hive supports listeners on hive command execution using hive hooks. This is 
used to add/update/remove entities in Atlas using the model defined in 
org.apache.atlas.hive.model.HiveDataModelGenerator.
-The hook submits the request to a thread pool executor to avoid blocking the 
command execution. The thread submits the entities as message to the 
notification server and atlas server reads these messages and registers the 
entities.
-Follow these instructions in your hive set-up to add hive hook for Atlas:
-   * Set-up atlas hook in hive-site.xml of your hive configuration:
+Atlas Hive hook registers with Hive to listen for create/update/delete 
operations and updates the metadata in Atlas, via Kafka notifications, for the 
changes in Hive.
+Follow the instructions below to set up the Atlas hook in Hive:
+   * Set-up Atlas hook in hive-site.xml by adding the following:
   <verbatim>
     <property>
       <name>hive.exec.post.hooks</name>
       <value>org.apache.atlas.hive.hook.HiveHook</value>
-    </property>
-  </verbatim>
-  <verbatim>
-    <property>
-      <name>atlas.cluster.name</name>
-      <value>primary</value>
-    </property>
-  </verbatim>
+    </property></verbatim>
    * Add 'export HIVE_AUX_JARS_PATH=<atlas package>/hook/hive' in hive-env.sh 
of your hive configuration
    * Copy <atlas-conf>/atlas-application.properties to the hive conf directory.
 
 The following properties in <atlas-conf>/atlas-application.properties control 
the thread pool and notification details:
-   * atlas.hook.hive.synchronous - boolean, true to run the hook 
synchronously. default false. Recommended to be set to false to avoid delays in 
hive query completion.
-   * atlas.hook.hive.numRetries - number of retries for notification failure. 
default 3
-   * atlas.hook.hive.minThreads - core number of threads. default 5
-   * atlas.hook.hive.maxThreads - maximum number of threads. default 5
+   * atlas.hook.hive.synchronous   - boolean, true to run the hook 
synchronously. default false. Recommended to be set to false to avoid delays in 
hive query completion.
+   * atlas.hook.hive.numRetries    - number of retries for notification 
failure. default 3
+   * atlas.hook.hive.minThreads    - core number of threads. default 1
+   * atlas.hook.hive.maxThreads    - maximum number of threads. default 5
    * atlas.hook.hive.keepAliveTime - keep alive time in msecs. default 10
-   * atlas.hook.hive.queueSize - queue size for the threadpool. default 10000
+   * atlas.hook.hive.queueSize     - queue size for the threadpool. default 
10000
 
 Refer [[Configuration][Configuration]] for notification related configurations
 
@@ -76,24 +74,23 @@ Refer [[Configuration][Configuration]] for notification related configurations
 Starting from 0.8-incubating version of Atlas, Column level lineage is 
captured in Atlas. Below are the details
 
 ---+++ Model
-   * !ColumnLineageProcess type is a subclass of Process
+   * !ColumnLineageProcess type is a subtype of Process
 
    * This relates an output Column to a set of input Columns or the Input Table
 
-   * The Lineage also captures the kind of Dependency: currently the values 
are SIMPLE, EXPRESSION, SCRIPT
-      * A SIMPLE dependency means the output column has the same value as the 
input
-      * An EXPRESSION dependency means the output column is transformed by 
some expression in the runtime(for e.g. a Hive SQL expression) on the Input 
Columns.
-      * SCRIPT means that the output column is transformed by a user provided 
script.
+   * The lineage also captures the kind of dependency, as listed below:
+      * SIMPLE:     output column has the same value as the input
+      * EXPRESSION: output column is transformed by some expression at runtime 
(for e.g. a Hive SQL expression) on the Input Columns.
+      * SCRIPT:     output column is transformed by a user provided script.
 
    * In case of EXPRESSION dependency the expression attribute contains the 
expression in string form
 
-   * Since Process links input and output !DataSets, we make Column a subclass 
of !DataSet
+   * Since Process links input and output !DataSets, Column is a subtype of 
!DataSet
 
 ---+++ Examples
 For a simple CTAS below:
 <verbatim>
-create table t2 as select id, name from T1
-</verbatim>
+create table t2 as select id, name from T1</verbatim>
 
 The lineage is captured as
 
@@ -106,10 +103,8 @@ The lineage is captured as
 
   * The !LineageInfo in Hive provides column-level lineage for the final 
!FileSinkOperator, linking them to the input columns in the Hive Query
 
----+++ NOTE
-Column level lineage works with Hive version 1.2.1 after the patch for <a 
href="https://issues.apache.org/jira/browse/HIVE-13112">HIVE-13112</a> is 
applied to Hive source
-
----++ Limitations
+---++ NOTES
+   * Column level lineage works with Hive version 1.2.1 after the patch for <a 
href="https://issues.apache.org/jira/browse/HIVE-13112">HIVE-13112</a> is 
applied to Hive source
    * Since database name, table name and column names are case insensitive in 
hive, the corresponding names in entities are lowercase. So, any search APIs 
should use lowercase while querying on the entity names
    * The following hive operations are captured by hive hook currently
       * create database
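
The qualifiedName formats documented in the Hive model section above can be sketched in Python as follows. This is only an illustration of the documented string formats, with dbName, tableName and columnName lowercased as the documentation requires. The function names are hypothetical, and "primary" is used purely as an example cluster name.

```python
def hive_db_qualified_name(db_name, cluster_name):
    """<dbName>@<clusterName>, with the database name lowercased."""
    return f"{db_name.lower()}@{cluster_name}"


def hive_table_qualified_name(db_name, table_name, cluster_name):
    """<dbName>.<tableName>@<clusterName>, names lowercased."""
    return f"{db_name.lower()}.{table_name.lower()}@{cluster_name}"


def hive_column_qualified_name(db_name, table_name, column_name, cluster_name):
    """<dbName>.<tableName>.<columnName>@<clusterName>, names lowercased."""
    return (
        f"{db_name.lower()}.{table_name.lower()}."
        f"{column_name.lower()}@{cluster_name}"
    )


if __name__ == "__main__":
    print(hive_db_qualified_name("Sales", "primary"))
    print(hive_table_qualified_name("Sales", "Orders", "primary"))
    print(hive_column_qualified_name("Sales", "Orders", "ID", "primary"))
```

The cluster name is passed through unchanged, because the documentation only asks for lowercase database, table and column names.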

http://git-wip-us.apache.org/repos/asf/atlas/blob/c65586f1/docs/src/site/twiki/Bridge-Sqoop.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/Bridge-Sqoop.twiki b/docs/src/site/twiki/Bridge-Sqoop.twiki
index bf942f2..480578b 100644
--- a/docs/src/site/twiki/Bridge-Sqoop.twiki
+++ b/docs/src/site/twiki/Bridge-Sqoop.twiki
@@ -1,37 +1,42 @@
 ---+ Sqoop Atlas Bridge
 
 ---++ Sqoop Model
-The default Sqoop modelling is available in 
org.apache.atlas.sqoop.model.SqoopDataModelGenerator. It defines the following 
types:
-<verbatim>
-sqoop_operation_type(EnumType) - values [IMPORT, EXPORT, EVAL]
-sqoop_dbstore_usage(EnumType) - values [TABLE, QUERY, PROCEDURE, OTHER]
-sqoop_process(ClassType) - super types [Process] - attributes [name, 
operation, dbStore, hiveTable, commandlineOpts, startTime, endTime, userName]
-sqoop_dbdatastore(ClassType) - super types [DataSet] - attributes [name, 
dbStoreType, storeUse, storeUri, source, description, ownerName]
-</verbatim>
+The default sqoop model includes the following types:
+   * Entity types:
+      * sqoop_process
+         * super-types: Process
+         * attributes: name, operation, dbStore, hiveTable, commandlineOpts, 
startTime, endTime, userName
+      * sqoop_dbdatastore
+         * super-types: !DataSet
+         * attributes: name, dbStoreType, storeUse, storeUri, source, 
description, ownerName
+
+   * Enum types:
+      * sqoop_operation_type
+         * values: IMPORT, EXPORT, EVAL
+      * sqoop_dbstore_usage
+         * values: TABLE, QUERY, PROCEDURE, OTHER
 
 The entities are created and de-duped using unique qualified name. They 
provide namespace and can be used for querying as well:
-sqoop_process - attribute name - sqoop-dbStoreType-storeUri-endTime
-sqoop_dbdatastore - attribute name - dbStoreType-connectorUrl-source
+   * sqoop_process.qualifiedName     - dbStoreType-storeUri-endTime
+   * sqoop_dbdatastore.qualifiedName - dbStoreType-storeUri-source
 
 ---++ Sqoop Hook
-Sqoop added a !SqoopJobDataPublisher that publishes data to Atlas after 
completion of import Job. Today, only hiveImport is supported in sqoopHook.
-This is used to add entities in Atlas using the model defined in 
org.apache.atlas.sqoop.model.SqoopDataModelGenerator.
-Follow these instructions in your sqoop set-up to add sqoop hook for Atlas in 
<sqoop-conf>/sqoop-site.xml:
+Sqoop added a !SqoopJobDataPublisher that publishes data to Atlas after 
completion of import Job. Today, only hiveImport is supported in !SqoopHook.
+This is used to add entities in Atlas using the model detailed above.
+
+Follow the instructions below to set up the Atlas hook in Sqoop:
 
-   * Sqoop Job publisher class.  Currently only one publishing class is 
supported
+Add the following properties to enable Atlas hook in Sqoop:
+   * Set-up Atlas hook in <sqoop-conf>/sqoop-site.xml by adding the following:
+  <verbatim>
    <property>
      <name>sqoop.job.data.publish.class</name>
      <value>org.apache.atlas.sqoop.hook.SqoopHook</value>
-   </property>
-   * Atlas cluster name
-   <property>
-     <name>atlas.cluster.name</name>
-     <value><clustername></value>
-   </property>
+   </property></verbatim>
   * Copy <atlas-conf>/atlas-application.properties to the sqoop conf directory <sqoop-conf>/
    * Link <atlas-home>/hook/sqoop/*.jar in sqoop lib
 
 Refer [[Configuration][Configuration]] for notification related configurations
 
----++ Limitations
+---++ NOTES
    * Only the following sqoop operations are captured by sqoop hook currently 
- hiveImport
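
The sqoop-site.xml property shown in the Sqoop hook section above could be added programmatically with a sketch like the one below. It assumes sqoop-site.xml uses a Hadoop-style <configuration> root with <property> children, which the diff does not show. The add_property helper is hypothetical; only the property name and hook class come from the documentation.

```python
import xml.etree.ElementTree as ET


def add_property(configuration, name, value):
    """Add a <property> with the given name and value, or update it if present.

    Assumes a Hadoop-style layout:
    <configuration><property><name/><value/></property></configuration>
    """
    for prop in configuration.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            return prop
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return prop


if __name__ == "__main__":
    # An empty configuration stands in for an existing sqoop-site.xml.
    root = ET.fromstring("<configuration></configuration>")
    add_property(
        root,
        "sqoop.job.data.publish.class",
        "org.apache.atlas.sqoop.hook.SqoopHook",
    )
    print(ET.tostring(root, encoding="unicode"))
```

Updating an existing entry instead of appending a duplicate keeps the file valid if the step is run more than once.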

http://git-wip-us.apache.org/repos/asf/atlas/blob/c65586f1/docs/src/site/twiki/Configuration.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/Configuration.twiki b/docs/src/site/twiki/Configuration.twiki
index 19c39b0..63c3fce 100644
--- a/docs/src/site/twiki/Configuration.twiki
+++ b/docs/src/site/twiki/Configuration.twiki
@@ -5,139 +5,42 @@ All configuration in Atlas uses java properties style configuration. The main co
 
 ---++ Graph Configs
 
----+++ Graph persistence engine
-
-This section sets up the graph db - titan - to use a persistence engine. 
Please refer to
-<a 
href="http://s3.thinkaurelius.com/docs/titan/0.5.4/titan-config-ref.html">link</a>
 for more
-details. The example below uses BerkeleyDBJE.
-
-<verbatim>
-atlas.graph.storage.backend=berkeleyje
-atlas.graph.storage.directory=data/berkeley
-</verbatim>
-
----++++ Graph persistence engine - Hbase
-
-Basic configuration
+---+++ Graph Persistence engine - HBase
+Set the following properties to configure JanusGraph to use HBase as the 
persistence engine. Please refer to
+<a 
href="http://docs.janusgraph.org/0.2.0/configuration.html#_hbase_caching">link</a>
 for more details.
 
 <verbatim>
 atlas.graph.storage.backend=hbase
-#For standalone mode , specify localhost
-#for distributed mode, specify zookeeper quorum here - For more information 
refer 
http://s3.thinkaurelius.com/docs/titan/current/hbase.html#_remote_server_mode_2
 atlas.graph.storage.hostname=<ZooKeeper Quorum>
+atlas.graph.storage.hbase.table=atlas
 </verbatim>
 
-HBASE_CONF_DIR environment variable needs to be set to point to the Hbase 
client configuration directory which is added to classpath when Atlas starts up.
-hbase-site.xml needs to have the following properties set according to the 
cluster setup
-<verbatim>
-#Set below to /hbase-secure if the Hbase server is setup in secure mode
-zookeeper.znode.parent=/hbase-unsecure
-</verbatim>
+To set any further JanusGraph configuration property, prefix the property name with "atlas.graph.".
 
-Advanced configuration
+In addition to setting these properties, please ensure that the environment variable HBASE_CONF_DIR is set to point to
+the directory containing the HBase configuration file hbase-site.xml.
 
-# If you are planning to use any of the configs mentioned below, they need to 
be prefixed with "atlas.graph." to take effect in ATLAS
-Refer 
http://s3.thinkaurelius.com/docs/titan/0.5.4/titan-config-ref.html#_storage_hbase
-
-Permissions
-
-When Atlas is configured with HBase as the storage backend the graph db 
(titan) needs sufficient user permissions to be able to create and access an 
HBase table.  In a secure cluster it may be necessary to grant permissions to 
the 'atlas' user for the 'titan' table.
-
-With Ranger, a policy can be configured for 'titan'.
-
-Without Ranger, HBase shell can be used to set the permissions.
+---+++ Graph Search Index - Solr
+Solr installation in Cloud mode is a prerequisite for Apache Atlas use. Set 
the following properties to configure JanusGraph to use Solr as the index 
search engine.
 
 <verbatim>
-   su hbase
-   kinit -k -t <hbase keytab> <hbase principal>
-   echo "grant 'atlas', 'RWXCA', 'titan'" | hbase shell
-</verbatim>
+atlas.graph.index.search.backend=solr5
+atlas.graph.index.search.solr.mode=cloud
+atlas.graph.index.search.solr.wait-searcher=true
 
-Note that if the embedded-hbase-solr profile is used then HBase is included in 
the distribution so that a standalone
-instance of HBase can be started as the default storage backend for the graph 
repository.  Using the embedded-hbase-solr
-profile will configure Atlas so that HBase instance will be started and 
stopped along with the Atlas server by default.
-To use the embedded-hbase-solr profile please see "Building Atlas" in the 
[[InstallationSteps][Installation Steps]]
-section.
+# ZK quorum setup for solr as comma separated value. Example: 
10.1.6.4:2181,10.1.6.5:2181
+atlas.graph.index.search.solr.zookeeper-url=
 
----+++ Graph Search Index
-This section sets up the graph db - titan - to use an search indexing system. 
The example
-configuration below sets up to use an embedded Elastic search indexing system.
+# SolrCloud Zookeeper Connection Timeout. Default value is 60000 ms
+atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
 
-<verbatim>
-atlas.graph.index.search.backend=elasticsearch
-atlas.graph.index.search.directory=data/es
-atlas.graph.index.search.elasticsearch.client-only=false
-atlas.graph.index.search.elasticsearch.local-mode=true
-atlas.graph.index.search.elasticsearch.create.sleep=2000
-</verbatim>
-
----++++ Graph Search Index - Solr
-Please note that Solr installation in Cloud mode is a prerequisite before 
configuring Solr as the search indexing backend. Refer InstallationSteps 
section for Solr installation/configuration.
-
-<verbatim>
- atlas.graph.index.search.backend=solr5
- atlas.graph.index.search.solr.mode=cloud
- atlas.graph.index.search.solr.zookeeper-url=<the ZK quorum setup for solr as 
comma separated value> eg: 10.1.6.4:2181,10.1.6.5:2181
- atlas.graph.index.search.solr.zookeeper-connect-timeout=<SolrCloud Zookeeper 
Connection Timeout>. Default value is 60000 ms
- atlas.graph.index.search.solr.zookeeper-session-timeout=<SolrCloud Zookeeper 
Session Timeout>. Default value is 60000 ms
-</verbatim>
-
-Also note that if the embedded-hbase-solr profile is used then Solr is 
included in the distribution so that a standalone
-instance of Solr can be started as the default search indexing backend. Using 
the embedded-hbase-solr profile will
-configure Atlas so that the standalone Solr instance will be started and 
stopped along with the Atlas server by default.
-To use the embedded-hbase-solr profile please see "Building Atlas" in the 
[[InstallationSteps][Installation Steps]]
-section.
-
----+++ Choosing between Persistence and Indexing Backends
-
-Refer http://s3.thinkaurelius.com/docs/titan/0.5.4/bdb.html and 
http://s3.thinkaurelius.com/docs/titan/0.5.4/hbase.html for choosing between 
the persistence backends.
-BerkeleyDB is suitable for smaller data sets in the range of upto 10 million 
vertices with ACID gurantees.
-HBase on the other hand doesnt provide ACID guarantees but is able to scale 
for larger graphs. HBase also provides HA inherently.
-
----+++ Choosing between Persistence Backends
-
-Refer http://s3.thinkaurelius.com/docs/titan/0.5.4/bdb.html and 
http://s3.thinkaurelius.com/docs/titan/0.5.4/hbase.html for choosing between 
the persistence backends.
-BerkeleyDB is suitable for smaller data sets in the range of upto 10 million 
vertices with ACID gurantees.
-HBase on the other hand doesnt provide ACID guarantees but is able to scale 
for larger graphs. HBase also provides HA inherently.
-
----+++ Choosing between Indexing Backends
-
-Refer http://s3.thinkaurelius.com/docs/titan/0.5.4/elasticsearch.html and 
http://s3.thinkaurelius.com/docs/titan/0.5.4/solr.html for choosing between 
!ElasticSearch and Solr.
-Solr in cloud mode is the recommended setup.
-
----+++ Switching Persistence Backend
-
-For switching the storage backend from BerkeleyDB to HBase and vice versa, 
refer the documentation for "Graph Persistence Engine" described above and 
restart ATLAS.
-The data in the indexing backend needs to be cleared else there will be 
discrepancies between the storage and indexing backend which could result in 
errors during the search.
-!ElasticSearch runs by default in embedded mode and the data could easily be 
cleared by deleting the ATLAS_HOME/data/es directory.
-For Solr, the collections which were created during ATLAS Installation - 
vertex_index, edge_index, fulltext_index could be deleted which will cleanup 
the indexes
-
----+++ Switching Index Backend
-
-Switching the Index backend requires clearing the persistence backend data. 
Otherwise there will be discrepancies between the persistence and index 
backends since switching the indexing backend means index data will be lost.
-This leads to "Fulltext" queries not working on the existing data
-For clearing the data for BerkeleyDB, delete the ATLAS_HOME/data/berkeley 
directory
-For clearing the data for HBase, in Hbase shell, run 'disable titan' and 'drop 
titan'
-
-
----++ Lineage Configs
-
-The higher layer services like lineage, schema, etc. are driven by the type 
system and this section encodes the specific types for the hive data model.
-
-# This models reflects the base super types for Data and Process
-<verbatim>
-atlas.lineage.hive.table.type.name=DataSet
-atlas.lineage.hive.process.type.name=Process
-atlas.lineage.hive.process.inputs.name=inputs
-atlas.lineage.hive.process.outputs.name=outputs
-
-## Schema
-atlas.lineage.hive.table.schema.query=hive_table where name=?, columns
+# SolrCloud Zookeeper Session Timeout. Default value is 60000 ms
+atlas.graph.index.search.solr.zookeeper-session-timeout=60000
 </verbatim>
 
 
 ---++ Search Configs
-Search APIs (DSL and full text search) support pagination and have optional 
limit and offset arguments. Following configs are related to search pagination
+Search APIs (DSL, basic search, full-text search) support pagination and have optional limit and offset arguments. The following configs are related to search pagination:
 
 <verbatim>
 # Default limit used when limit is not specified in API
@@ -152,53 +55,36 @@ atlas.search.maxlimit=10000
 Refer http://kafka.apache.org/documentation.html#configuration for Kafka 
configuration. All Kafka configs should be prefixed with 'atlas.kafka.'
 
 <verbatim>
-atlas.notification.embedded=true
-atlas.kafka.data=${sys:atlas.home}/data/kafka
-atlas.kafka.zookeeper.connect=localhost:9026
-atlas.kafka.bootstrap.servers=localhost:9027
-atlas.kafka.zookeeper.session.timeout.ms=400
-atlas.kafka.zookeeper.sync.time.ms=20
-atlas.kafka.auto.commit.interval.ms=1000
-atlas.kafka.hook.group.id=atlas
-</verbatim>
+atlas.kafka.auto.commit.enable=false
 
-Note that Kafka group ids are specified for a specific topic.  The Kafka group 
id configuration for entity notifications is 'atlas.kafka.entities.group.id'
+# Kafka servers. Example: localhost:6667
+atlas.kafka.bootstrap.servers=
 
-<verbatim>
-atlas.kafka.entities.group.id=<consumer id>
-</verbatim>
+atlas.kafka.hook.group.id=atlas
 
-These configuration parameters are useful for setting up Kafka topics via 
Atlas provided scripts, described in the
-[[InstallationSteps][Installation Steps]] page.
+# Zookeeper connect URL for Kafka. Example: localhost:2181
+atlas.kafka.zookeeper.connect=
 
-<verbatim>
-# Whether to create the topics automatically, default is true.
-# Comma separated list of topics to be created, default is 
"ATLAS_HOOK,ATLAS_ENTITES"
-atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES
-# Number of replicas for the Atlas topics, default is 1. Increase for higher 
resilience to Kafka failures.
-atlas.notification.replicas=1
-# Enable the below two properties if Kafka is running in Kerberized mode.
-# Set this to the service principal representing the Kafka service
-atlas.notification.kafka.service.principal=kafka/[email protected]
-# Set this to the location of the keytab file for Kafka
-#atlas.notification.kafka.keytab.location=/etc/security/keytabs/kafka.service.keytab
-</verbatim>
+atlas.kafka.zookeeper.connection.timeout.ms=30000
+atlas.kafka.zookeeper.session.timeout.ms=60000
+atlas.kafka.zookeeper.sync.time.ms=20
 
-These configuration parameters are useful for saving messages in case there 
are issues in reaching Kafka for
-sending messages.
+# Setup the following configurations only in test deployments where Kafka is 
started within Atlas in embedded mode
+# atlas.notification.embedded=true
+# atlas.kafka.data=${sys:atlas.home}/data/kafka
 
-<verbatim>
-# Whether to save messages that failed to be sent to Kafka, default is true
-atlas.notification.log.failed.messages=true
-# If saving messages is enabled, the file name to save them to. This file will 
be created under the log directory of the hook's host component - like 
HiveServer2
-atlas.notification.failed.messages.filename=atlas_hook_failed_messages.log
+# Setup the following two properties if Kafka is running in Kerberized mode.
+# atlas.notification.kafka.service.principal=kafka/[email protected]
+# atlas.notification.kafka.keytab.location=/etc/security/keytabs/kafka.service.keytab
 </verbatim>
 
 ---++ Client Configs
 <verbatim>
 atlas.client.readTimeoutMSecs=60000
 atlas.client.connectTimeoutMSecs=60000
-atlas.rest.address=<http/https>://<atlas-fqdn>:<atlas port> - default 
http://localhost:21000
+
+# URL to access Atlas server. For example: http://localhost:21000
+atlas.rest.address=
 </verbatim>
 
 
@@ -212,26 +98,28 @@ atlas.enableTLS=false
 </verbatim>
 
 ---++ High Availability Properties
-
 The following properties describe High Availability related configuration 
options:
 
 <verbatim>
 # Set the following property to true, to enable High Availability. Default = 
false.
 atlas.server.ha.enabled=true
 
-# Define a unique set of strings to identify each instance that should run an 
Atlas Web Service instance as a comma separated list.
+# Specify the list of Atlas instances
 atlas.server.ids=id1,id2
-# For each string defined above, define the host and port on which Atlas 
server binds to.
+# For each instance defined above, define the host and port on which Atlas 
server listens.
 atlas.server.address.id1=host1.company.com:21000
 atlas.server.address.id2=host2.company.com:31000
 
 # Specify Zookeeper properties needed for HA.
 # Specify the list of services running Zookeeper servers as a comma separated list.
 atlas.server.ha.zookeeper.connect=zk1.company.com:2181,zk2.company.com:2181,zk3.company.com:2181
+
 # Specify how many times a connection to the Zookeeper cluster should be attempted, in case of any connection issues.
 atlas.server.ha.zookeeper.num.retries=3
+
 # Specify how long the server should wait before retrying a connection to Zookeeper, in case of any connection issues.
 atlas.server.ha.zookeeper.retry.sleeptime.ms=1000
+
 # Specify how long a session to Zookeeper may remain inactive before it is deemed unreachable.
 atlas.server.ha.zookeeper.session.timeout.ms=20000
 
@@ -239,6 +127,7 @@ atlas.server.ha.zookeeper.session.timeout.ms=20000
 # The format of these options is <scheme>:<identity>. For more information 
refer to 
http://zookeeper.apache.org/doc/r3.2.2/zookeeperProgrammers.html#sc_ZooKeeperAccessControl.
 # The 'acl' option allows to specify a scheme, identity pair to setup an ACL 
for.
 atlas.server.ha.zookeeper.acl=sasl:[email protected]
+
 # The 'auth' option specifies the authentication that should be used for 
connecting to Zookeeper.
 atlas.server.ha.zookeeper.auth=sasl:[email protected]
 
@@ -254,14 +143,12 @@ atlas.client.ha.sleep.interval.ms=5000
 </verbatim>
 
 ---++ Server Properties
-
 <verbatim>
 # Set the following property to true, to enable the setup steps to run on each 
server start. Default = false.
 atlas.server.run.setup.on.start=false
 </verbatim>
 
 ---++ Performance configuration items
-
 The following properties can be used to tune performance of Atlas under 
specific circumstances:
 
 <verbatim>
@@ -288,14 +175,19 @@ atlas.webserver.queuesize=100
 </verbatim>
 
 ---+++ Recording performance metrics
-
-Atlas package should be built with '-P perf' to instrument atlas code to 
collect metrics. The metrics will be recorded in
-<atlas.log.dir>/metric.log, with one log line per API call. The metrics 
contain the number of times the instrumented methods
-are called and the total time spent in the instrumented method. Logging to 
metric.log is controlled through log4j configuration
-in atlas-log4j.xml. When the atlas code is instrumented, to disable logging to 
metric.log at runtime, set log level of METRICS logger to info level:
-<verbatim>
-<logger name="METRICS" additivity="false">
-    <level value="info"/>
-    <appender-ref ref="METRICS"/>
-</logger>
-</verbatim>
+To enable performance logs for various Atlas operations (like REST API calls, notification processing), set up the following in atlas-log4j.xml:
+<verbatim>
+  <appender name="perf_appender" class="org.apache.log4j.DailyRollingFileAppender">
+    <param name="File" value="/var/log/atlas/atlas_perf.log"/>
+    <param name="datePattern" value="'.'yyyy-MM-dd"/>
+    <param name="append" value="true"/>
+    <layout class="org.apache.log4j.PatternLayout">
+      <param name="ConversionPattern" value="%d|%t|%m%n"/>
+    </layout>
+  </appender>
+
+   <logger name="org.apache.atlas.perf" additivity="false">
+     <level value="debug"/>
+     <appender-ref ref="perf_appender"/>
+   </logger>
+ </verbatim>
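
Note on the Configuration.twiki hunks above: the new template leaves several values blank (the Solr ZooKeeper URL, the Kafka bootstrap servers, the Kafka ZooKeeper URL, and the REST address). A minimal sketch of a pre-start check for those blanks follows. The names BLANK_IN_TEMPLATE, parse_properties and missing_values are hypothetical helpers, not part of Atlas, and the parser is deliberately simpler than the Java properties format (no line continuations, no ':' separators).

```python
# Sketch only: these helper names are illustrative and do not exist in Atlas.
# Keys that the updated Configuration.twiki shows with an empty value.
BLANK_IN_TEMPLATE = (
    "atlas.graph.index.search.solr.zookeeper-url",
    "atlas.kafka.bootstrap.servers",
    "atlas.kafka.zookeeper.connect",
    "atlas.rest.address",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def missing_values(props, keys=BLANK_IN_TEMPLATE):
    """Return the keys that are absent or still empty, in the order given."""
    return [key for key in keys if not props.get(key)]


if __name__ == "__main__":
    sample = """
    # Kafka servers. Example: localhost:6667
    atlas.kafka.bootstrap.servers=localhost:6667
    # Zookeeper connect URL for Kafka. Example: localhost:2181
    atlas.kafka.zookeeper.connect=
    """
    for key in missing_values(parse_properties(sample)):
        print("value still missing for", key)
```

Whether all four keys are mandatory depends on the deployment (for example, atlas.rest.address is a client-side setting), so the key list is an assumption to adjust rather than a rule taken from the commit.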

http://git-wip-us.apache.org/repos/asf/atlas/blob/c65586f1/docs/src/site/twiki/HighAvailability.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/HighAvailability.twiki 
b/docs/src/site/twiki/HighAvailability.twiki
index 1e52c85..4270d09 100644
--- a/docs/src/site/twiki/HighAvailability.twiki
+++ b/docs/src/site/twiki/HighAvailability.twiki
@@ -157,9 +157,9 @@ At a high level the following points can be called out:
 
 ---++ Metadata Store
 
-As described above, Atlas uses Titan to store the metadata it manages. By 
default, Atlas uses a standalone HBase
-instance as the backing store for Titan. In order to provide HA for the 
metadata store, we recommend that Atlas be
-configured to use distributed HBase as the backing store for Titan.  Doing 
this implies that you could benefit from the
+As described above, Atlas uses JanusGraph to store the metadata it manages. By 
default, Atlas uses a standalone HBase
+instance as the backing store for JanusGraph. In order to provide HA for the 
metadata store, we recommend that Atlas be
+configured to use distributed HBase as the backing store for JanusGraph.  
Doing this implies that you could benefit from the
 HA guarantees HBase provides. In order to configure Atlas to use HBase in HA 
mode, do the following:
 
    * Choose an existing HBase cluster that is set up in HA mode to configure 
in Atlas (OR) Set up a new HBase cluster in 
[[http://hbase.apache.org/book.html#quickstart_fully_distributed][HA mode]].
@@ -169,8 +169,8 @@ HA guarantees HBase provides. In order to configure Atlas 
to use HBase in HA mod
 
 ---++ Index Store
 
-As described above, Atlas indexes metadata through Titan to support full text 
search queries. In order to provide HA
-for the index store, we recommend that Atlas be configured to use Solr as the 
backing index store for Titan. In order
+As described above, Atlas indexes metadata through JanusGraph to support full 
text search queries. In order to provide HA
+for the index store, we recommend that Atlas be configured to use Solr as the 
backing index store for JanusGraph. In order
 to configure Atlas to use Solr in HA mode, do the following:
 
    * Choose an existing !SolrCloud cluster setup in HA mode to configure in 
Atlas (OR) Set up a new 
[[https://cwiki.apache.org/confluence/display/solr/SolrCloud][SolrCloud 
cluster]].
@@ -208,4 +208,4 @@ to configure Atlas to use Kafka in HA mode, do the 
following:
 
 ---++ Known Issues
 
-   * If the HBase region servers hosting the Atlas ‘titan’ HTable are 
down, Atlas would not be able to store or retrieve metadata from HBase until 
they are brought back online.
\ No newline at end of file
+   * If the HBase region servers hosting the Atlas table are down, Atlas would 
not be able to store or retrieve metadata from HBase until they are brought 
back online.
\ No newline at end of file
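
Note on the High Availability properties shown in the Configuration.twiki hunk earlier in this message: every id listed in atlas.server.ids needs a matching atlas.server.address.<id> entry in host:port form. The sketch below shows one way to check that pairing. The function name ha_server_addresses is hypothetical, and the input is assumed to be a plain dict of already-parsed properties; Atlas itself reads these values through its own configuration classes.

```python
# Sketch only: ha_server_addresses is an illustrative helper, not an Atlas API.
def ha_server_addresses(props):
    """Map each id in atlas.server.ids to its (host, port) pair.

    Raises ValueError when an id has no address or the address is not host:port.
    """
    raw_ids = props.get("atlas.server.ids", "")
    server_ids = [item.strip() for item in raw_ids.split(",") if item.strip()]

    addresses = {}
    for server_id in server_ids:
        address = props.get("atlas.server.address." + server_id, "")
        host, sep, port = address.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(
                "atlas.server.address.%s must be host:port, got %r"
                % (server_id, address)
            )
        addresses[server_id] = (host, int(port))
    return addresses


if __name__ == "__main__":
    example = {
        "atlas.server.ids": "id1,id2",
        "atlas.server.address.id1": "host1.company.com:21000",
        "atlas.server.address.id2": "host2.company.com:31000",
    }
    print(ha_server_addresses(example))
```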

http://git-wip-us.apache.org/repos/asf/atlas/blob/c65586f1/docs/src/site/twiki/InstallationSteps.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/InstallationSteps.twiki 
b/docs/src/site/twiki/InstallationSteps.twiki
index c59f495..6b9f031 100644
--- a/docs/src/site/twiki/InstallationSteps.twiki
+++ b/docs/src/site/twiki/InstallationSteps.twiki
@@ -1,135 +1,51 @@
 ---++ Building & Installing Apache Atlas
 
 ---+++ Building Atlas
-
 <verbatim>
 git clone https://git-wip-us.apache.org/repos/asf/atlas.git atlas
-
 cd atlas
+export MAVEN_OPTS="-Xms2g -Xmx2g"
+mvn clean -DskipTests install</verbatim>
 
-export MAVEN_OPTS="-Xmx1536m" && mvn clean install
-</verbatim>
-
-Once the build successfully completes, artifacts can be packaged for 
deployment.
-
-<verbatim>
-
-mvn clean package -Pdist
-
-</verbatim>
-
-NOTE:
-1. Use option '-DskipTests' to skip running unit and integration tests
-2. Use option '-P perf' to instrument atlas to collect performance metrics
 
-To build a distribution that configures Atlas for external HBase and Solr, 
build with the external-hbase-solr profile.
+---+++ Packaging Atlas
+To create an Apache Atlas package for deployment in an environment having functional HBase and Solr instances, build with the following command:
 
 <verbatim>
+mvn clean -DskipTests package -Pdist</verbatim>
 
-mvn clean package -Pdist,external-hbase-solr
+   * NOTES:
+      * Remove option '-DskipTests' to run unit and integration tests
+      * To build a distribution without minified js and css files, build with the skipMinify profile. By default, js and css files are minified.
 
-</verbatim>
 
-Note that when the external-hbase-solr profile is used the following steps 
need to be completed to make Atlas functional.
+The above will build Atlas for an environment having functional HBase and Solr instances. Atlas needs to be set up with the following to run in this environment:
    * Configure atlas.graph.storage.hostname (see "Graph persistence engine - 
HBase" in the [[Configuration][Configuration]] section).
    * Configure atlas.graph.index.search.solr.zookeeper-url (see "Graph Search 
Index - Solr" in the [[Configuration][Configuration]] section).
    * Set HBASE_CONF_DIR to point to a valid HBase config directory (see "Graph 
persistence engine - HBase" in the [[Configuration][Configuration]] section).
    * Create the SOLR indices (see "Graph Search Index - Solr" in the 
[[Configuration][Configuration]] section).
 
-To build a distribution that packages HBase and Solr, build with the 
embedded-hbase-solr profile.
-
-<verbatim>
-
-mvn clean package -Pdist,embedded-hbase-solr
-
-</verbatim>
-
-Using the embedded-hbase-solr profile will configure Atlas so that an HBase 
instance and a Solr instance will be started
-and stopped along with the Atlas server by default.
 
-Atlas also supports building a distribution that can use BerkeleyDB and 
Elastic search as the graph and index backends.
-To build a distribution that is configured for these backends, build with the 
berkeley-elasticsearch profile.
+---+++ Packaging Atlas with Embedded HBase & Solr
+To create an Apache Atlas package that includes HBase and Solr, build with the embedded-hbase-solr profile as shown below:
 
 <verbatim>
+mvn clean -DskipTests package -Pdist,embedded-hbase-solr</verbatim>
 
-mvn clean package -Pdist,berkeley-elasticsearch
-
-</verbatim>
-
-An additional step is required for the binary built using this profile to be 
used along with the Atlas distribution.
-Due to licensing requirements, Atlas does not bundle the BerkeleyDB Java 
Edition in the tarball.
+Using the embedded-hbase-solr profile will configure Atlas so that an HBase 
instance and a Solr instance will be started and stopped along with the Atlas 
server by default.
 
-You can download the Berkeley DB jar file from the URL: 
<verbatim>http://download.oracle.com/otn/berkeley-db/je-5.0.73.zip</verbatim>
-and copy the je-5.0.73.jar to the ${atlas_home}/libext directory.
 
-Tar can be found in 
atlas/distro/target/apache-atlas-${project.version}-bin.tar.gz
-
-Tar is structured as follows
+---+++ Apache Atlas Package
+The build will create the following files, which are used to install Apache Atlas.
 
 <verbatim>
-
-|- bin
-   |- atlas_start.py
-   |- atlas_stop.py
-   |- atlas_config.py
-   |- quick_start.py
-   |- cputil.py
-|- conf
-   |- atlas-application.properties
-   |- atlas-env.sh
-   |- hbase
-      |- hbase-site.xml.template
-   |- log4j.xml
-   |- solr
-      |- currency.xml
-      |- lang
-         |- stopwords_en.txt
-      |- protowords.txt
-      |- schema.xml
-      |- solrconfig.xml
-      |- stopwords.txt
-      |- synonyms.txt
-|- docs
-|- hbase
-   |- bin
-   |- conf
-   ...
-|- server
-   |- webapp
-      |- atlas.war
-|- solr
-   |- bin
-   ...
-|- README
-|- NOTICE
-|- LICENSE
-|- DISCLAIMER.txt
-|- CHANGES.txt
-
-</verbatim>
-
-Note that if the embedded-hbase-solr profile is specified for the build then 
HBase and Solr are included in the
-distribution.
-
-In this case, a standalone instance of HBase can be started as the default 
storage backend for the graph repository.
-During Atlas installation the conf/hbase/hbase-site.xml.template gets expanded 
and moved to hbase/conf/hbase-site.xml
-for the initial standalone HBase configuration.  To configure ATLAS
-graph persistence for a different HBase instance, please see "Graph 
persistence engine - HBase" in the
-[[Configuration][Configuration]] section.
-
-Also, a standalone instance of Solr can be started as the default search 
indexing backend.  To configure ATLAS search
-indexing for a different Solr instance please see "Graph Search Index - Solr" 
in the
-[[Configuration][Configuration]] section.
-
-To build a distribution without minified js,css file, build with the 
skipMinify profile.
-
-<verbatim>
-
-mvn clean package -Pdist,skipMinify
-
-</verbatim>
-
-Note that by default js and css files are minified.
+distro/target/apache-atlas-${project.version}-bin.tar.gz
+distro/target/apache-atlas-${project.version}-hive-hook.tar.gz
+distro/target/apache-atlas-${project.version}-hbase-hook.tar.gz
+distro/target/apache-atlas-${project.version}-sqoop-hook.tar.gz
+distro/target/apache-atlas-${project.version}-storm-hook.tar.gz
+distro/target/apache-atlas-${project.version}-falcon-hook.tar.gz
+distro/target/apache-atlas-${project.version}-sources.tar.gz</verbatim>
 
 ---+++ Installing & Running Atlas
 
@@ -137,18 +53,12 @@ Note that by default js and css files are minified.
 <verbatim>
 tar -xzvf apache-atlas-${project.version}-bin.tar.gz
 
-cd atlas-${project.version}
-</verbatim>
+cd atlas-${project.version}</verbatim>
 
 ---++++ Configuring Atlas
+By default, the config directory used by Atlas is {package dir}/conf. To override this, set the environment variable ATLAS_CONF to the path of the conf dir.
 
-By default config directory used by Atlas is {package dir}/conf. To override 
this set environment
-variable ATLAS_CONF to the path of the conf dir.
-
-atlas-env.sh has been added to the Atlas conf. This file can be used to set 
various environment
-variables that you need for you services. In addition you can set any other 
environment
-variables you might need. This file will be sourced by atlas scripts before 
any commands are
-executed. The following environment variables are available to set.
+Environment variables needed to run Atlas can be set in the atlas-env.sh file in the conf directory. This file will be sourced by Atlas scripts before any commands are executed. The following environment variables are available to set.
 
 <verbatim>
 # The java implementation to use. If JAVA_HOME is not found we expect java and 
jar to be in path
@@ -169,7 +79,7 @@ executed. The following environment variables are available 
to set.
 # java heap size we want to set for the atlas server. Default is 1024MB
 #export ATLAS_SERVER_HEAP=
 
-# What is is considered as atlas home dir. Default is the base locaion of the 
installed software
+# What is considered as atlas home dir. Default is the base location of the installed software
 #export ATLAS_HOME_DIR=
 
 # Where log files are stored. Default is logs directory under the base install location
@@ -178,66 +88,48 @@ executed. The following environment variables are 
available to set.
 # Where pid files are stored. Default is logs directory under the base install location
 #export ATLAS_PID_DIR=
 
-# where the atlas titan db data is stored. Defatult is logs/data directory 
under the base install location
-#export ATLAS_DATA_DIR=
-
 # Where do you want to expand the war file. By Default it is in /server/webapp 
dir under the base install dir.
-#export ATLAS_EXPANDED_WEBAPP_DIR=
-</verbatim>
+#export ATLAS_EXPANDED_WEBAPP_DIR=</verbatim>
 
 *Settings to support large number of metadata objects*
 
-If you plan to store several tens of thousands of metadata objects, it is 
recommended that you use values
-tuned for better GC performance of the JVM.
+If you plan to store a large number of metadata objects, it is recommended that you use values tuned for better GC performance of the JVM.
 
 The following values are common server side options:
 <verbatim>
-export ATLAS_SERVER_OPTS="-server -XX:SoftRefLRUPolicyMSPerMB=0 
-XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC 
-XX:+CMSParallelRemarkEnabled -XX:+PrintTenuringDistribution 
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=dumps/atlas_server.hprof 
-Xloggc:logs/gc-worker.log -verbose:gc -XX:+UseGCLogFileRotation 
-XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1m -XX:+PrintGCDetails 
-XX:+PrintHeapAtGC -XX:+PrintGCTimeStamps"
-</verbatim>
+export ATLAS_SERVER_OPTS="-server -XX:SoftRefLRUPolicyMSPerMB=0 
-XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC 
-XX:+CMSParallelRemarkEnabled -XX:+PrintTenuringDistribution 
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=dumps/atlas_server.hprof 
-Xloggc:logs/gc-worker.log -verbose:gc -XX:+UseGCLogFileRotation 
-XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1m -XX:+PrintGCDetails 
-XX:+PrintHeapAtGC -XX:+PrintGCTimeStamps"</verbatim>
 
-The =-XX:SoftRefLRUPolicyMSPerMB= option was found to be particularly helpful 
to regulate GC performance for
-query heavy workloads with many concurrent users.
+The =-XX:SoftRefLRUPolicyMSPerMB= option was found to be particularly helpful 
to regulate GC performance for query heavy workloads with many concurrent users.
 
 The following values are recommended for JDK 8:
 <verbatim>
-export ATLAS_SERVER_HEAP="-Xms15360m -Xmx15360m -XX:MaxNewSize=5120m 
-XX:MetaspaceSize=100M -XX:MaxMetaspaceSize=512m"
-</verbatim>
+export ATLAS_SERVER_HEAP="-Xms15360m -Xmx15360m -XX:MaxNewSize=5120m 
-XX:MetaspaceSize=100M -XX:MaxMetaspaceSize=512m"</verbatim>
 
 *NOTE for Mac OS users*
 If you are using a Mac OS, you will need to configure the ATLAS_SERVER_OPTS 
(explained above).
 
 In  {package dir}/conf/atlas-env.sh uncomment the following line
 <verbatim>
-#export ATLAS_SERVER_OPTS=
-</verbatim>
+#export ATLAS_SERVER_OPTS=</verbatim>
 
 and change it to look as below
 <verbatim>
-export ATLAS_SERVER_OPTS="-Djava.awt.headless=true -Djava.security.krb5.realm= 
-Djava.security.krb5.kdc="
-</verbatim>
+export ATLAS_SERVER_OPTS="-Djava.awt.headless=true -Djava.security.krb5.realm= 
-Djava.security.krb5.kdc="</verbatim>
 
-*Hbase as the Storage Backend for the Graph Repository*
+*HBase as the Storage Backend for the Graph Repository*
 
-By default, Atlas uses Titan as the graph repository and is the only graph 
repository implementation available currently.
-The HBase versions currently supported are 1.1.x. For configuring ATLAS graph 
persistence on HBase, please see "Graph persistence engine - HBase" in the 
[[Configuration][Configuration]] section
-for more details.
+By default, Atlas uses JanusGraph as the graph repository; it is the only graph repository implementation currently available. The HBase versions currently supported are 1.1.x. For configuring Atlas graph persistence on HBase, please see "Graph persistence engine - HBase" in the [[Configuration][Configuration]] section for more details.
 
-Pre-requisites for running HBase as a distributed cluster
-   * 3 or 5 !ZooKeeper nodes
-   * Atleast 3 !RegionServer nodes. It would be ideal to run the !DataNodes on 
the same hosts as the Region servers for data locality.
-
-HBase tablename in Titan can be set using the following configuration in 
ATLAS_HOME/conf/atlas-application.properties:
+HBase tables used by Atlas can be set using the following configurations:
 <verbatim>
-atlas.graph.storage.hbase.table=apache_atlas_titan
-atlas.audit.hbase.tablename=apache_atlas_entity_audit
-</verbatim>
+atlas.graph.storage.hbase.table=atlas
+atlas.audit.hbase.tablename=apache_atlas_entity_audit</verbatim>
 
 *Configuring SOLR as the Indexing Backend for the Graph Repository*
 
-By default, Atlas uses Titan as the graph repository and is the only graph 
repository implementation available currently.
-For configuring Titan to work with Solr, please follow the instructions below
+By default, Atlas uses JanusGraph as the graph repository; it is the only graph repository implementation currently available. To configure JanusGraph to work with Solr, please follow the instructions below:
 
-   * Install solr if not already running. The version of SOLR supported is 
5.2.1. Could be installed from 
http://archive.apache.org/dist/lucene/solr/5.2.1/solr-5.2.1.tgz
+   * Install Solr if it is not already running. The supported version of Solr is 5.5.1. It can be downloaded from http://archive.apache.org/dist/lucene/solr/5.5.1/solr-5.5.1.tgz
 
    * Start solr in cloud mode.
   !SolrCloud mode uses a !ZooKeeper Service as a highly available, central 
location for cluster management.
@@ -249,15 +141,12 @@ For configuring Titan to work with Solr, please follow 
the instructions below
       $SOLR_HOME/bin/solr start -c -z <zookeeper_host:port> -p 8983
       </verbatim>
 
-   * Run the following commands from SOLR_BIN (e.g. $SOLR_HOME/bin) directory 
to create collections in Solr corresponding to the indexes that Atlas uses. In 
the case that the ATLAS and SOLR instance are on 2 different hosts,
-  first copy the required configuration files from ATLAS_HOME/conf/solr on the 
ATLAS instance host to the Solr instance host. SOLR_CONF in the below mentioned 
commands refer to the directory where the solr configuration files
-  have been copied to on Solr host:
+   * Run the following commands from the SOLR_BIN (e.g. $SOLR_HOME/bin) directory to create collections in Solr corresponding to the indexes that Atlas uses. If the Atlas and Solr instances are on two different hosts, first copy the required configuration files from ATLAS_HOME/conf/solr on the Atlas instance host to the Solr instance host. SOLR_CONF in the commands below refers to the directory on the Solr host to which the Solr configuration files have been copied:
 
 <verbatim>
   $SOLR_BIN/solr create -c vertex_index -d SOLR_CONF -shards #numShards 
-replicationFactor #replicationFactor
   $SOLR_BIN/solr create -c edge_index -d SOLR_CONF -shards #numShards 
-replicationFactor #replicationFactor
-  $SOLR_BIN/solr create -c fulltext_index -d SOLR_CONF -shards #numShards 
-replicationFactor #replicationFactor
-</verbatim>
+  $SOLR_BIN/solr create -c fulltext_index -d SOLR_CONF -shards #numShards 
-replicationFactor #replicationFactor</verbatim>
 
   Note: If numShards and replicationFactor are not specified, they default to 
1 which suffices if you are trying out solr with ATLAS on a single node 
instance.
   Otherwise specify numShards according to the number of hosts that are in the 
Solr cluster and the maxShardsPerNode configuration.
@@ -274,12 +163,11 @@ For configuring Titan to work with Solr, please follow 
the instructions below
  atlas.graph.index.search.solr.mode=cloud
  atlas.graph.index.search.solr.zookeeper-url=<the ZK quorum setup for solr as 
comma separated value> eg: 10.1.6.4:2181,10.1.6.5:2181
  atlas.graph.index.search.solr.zookeeper-connect-timeout=<SolrCloud Zookeeper 
Connection Timeout>. Default value is 60000 ms
- atlas.graph.index.search.solr.zookeeper-session-timeout=<SolrCloud Zookeeper 
Session Timeout>. Default value is 60000 ms
-</verbatim>
+ atlas.graph.index.search.solr.zookeeper-session-timeout=<SolrCloud Zookeeper 
Session Timeout>. Default value is 60000 ms</verbatim>
 
    * Restart Atlas
 
-For more information on Titan solr configuration , please refer 
http://s3.thinkaurelius.com/docs/titan/0.5.4/solr.htm
+For more information on JanusGraph Solr configuration, please refer to http://docs.janusgraph.org/0.2.0/solr.html
 
 Pre-requisites for running Solr in cloud mode
   * Memory - Solr is both memory and CPU intensive. Make sure the server 
running Solr has adequate memory, CPU and disk.
@@ -299,85 +187,124 @@ use configuration in =atlas-application.properties= for 
setting up the topics. P
 for these details.
 
 ---++++ Setting up Atlas
+There are a few steps that set up dependencies of Atlas. One such example is setting up the JanusGraph schema in the storage backend of choice. In a simple single server setup, these are automatically set up with default configuration when the server first accesses these dependencies.
 
-There are a few steps that setup dependencies of Atlas. One such example is setting up the Titan schema
-in the storage backend of choice. In a simple single server setup, these are automatically setup with default
-configuration when the server first accesses these dependencies.
-
-However, there are scenarios when we may want to run setup steps explicitly as one time operations. For example, in a
-multiple server scenario using [[HighAvailability][High Availability]], it is preferable to run setup steps from one
-of the server instances the first time, and then start the services.
+However, there are scenarios when we may want to run setup steps explicitly as one time operations. For example, in a multiple server scenario using [[HighAvailability][High Availability]], it is preferable to run setup steps from one of the server instances the first time, and then start the services.
 
 To run these steps one time, execute the command =bin/atlas_start.py -setup= from a single Atlas server instance.
 
-However, the Atlas server does take care of parallel executions of the setup steps. Also, running the setup steps multiple
-times is idempotent. Therefore, if one chooses to run the setup steps as part of server startup, for convenience,
-then they should enable the configuration option =atlas.server.run.setup.on.start= by defining it with the value =true=
-in the =atlas-application.properties= file.
+However, the Atlas server does take care of parallel executions of the setup steps. Also, running the setup steps multiple times is idempotent. Therefore, if one chooses to run the setup steps as part of server startup, for convenience, then they should enable the configuration option =atlas.server.run.setup.on.start= by defining it with the value =true= in the =atlas-application.properties= file.
 
 ---++++ Starting Atlas Server
-
 <verbatim>
-bin/atlas_start.py [-port <port>]
-</verbatim>
-
-By default,
-   * To change the port, use -port option.
-   * atlas server starts with conf from {package dir}/conf. To override this (to use the same conf with multiple atlas upgrades), set environment variable ATLAS_CONF to the path of conf dir
+bin/atlas_start.py [-port <port>]</verbatim>
 
 ---+++ Using Atlas
-
-   * Quick start model - sample model and data
+   * Verify if the server is up and running
 <verbatim>
-  bin/quick_start.py [<atlas endpoint>]
-</verbatim>
+  curl -v -u username:password http://localhost:21000/api/atlas/admin/version
+  {"Version":"v0.1"}</verbatim>
 
-   * Verify if the server is up and running
+   * Access Atlas UI using a browser: http://localhost:21000
+
+   * Run quick start to load sample model and data
 <verbatim>
-  curl -v http://localhost:21000/api/atlas/admin/version
-  {"Version":"v0.1"}
-</verbatim>
+  bin/quick_start.py [<atlas endpoint>]</verbatim>
 
    * List the types in the repository
 <verbatim>
-  curl -v http://localhost:21000/api/atlas/types
-  {"results":["Process","Infrastructure","DataSet"],"count":3,"requestId":"1867493731@qtp-262860041-0 - 82d43a27-7c34-4573-85d1-a01525705091"}
-</verbatim>
+  curl -v -u username:password http://localhost:21000/api/atlas/v2/types/typedefs/headers
+  [ {"guid":"fa421be8-c21b-4cf8-a226-fdde559ad598","name":"Referenceable","category":"ENTITY"},
+    {"guid":"7f3f5712-521d-450d-9bb2-ba996b6f2a4e","name":"Asset","category":"ENTITY"},
+    {"guid":"84b02fa0-e2f4-4cc4-8b24-d2371cd00375","name":"DataSet","category":"ENTITY"},
+    {"guid":"f93975d5-5a5c-41da-ad9d-eb7c4f91a093","name":"Process","category":"ENTITY"},
+    {"guid":"79dcd1f9-f350-4f7b-b706-5bab416f8206","name":"Infrastructure","category":"ENTITY"}
+  ]</verbatim>
 
    * List the instances for a given type
 <verbatim>
-  curl -v http://localhost:21000/api/atlas/entities?type=hive_table
-  {"requestId":"788558007@qtp-44808654-5","list":["cb9b5513-c672-42cb-8477-b8f3e537a162","ec985719-a794-4c98-b98f-0509bd23aac0","48998f81-f1d3-45a2-989a-223af5c1ed6e","a54b386e-c759-4651-8779-a099294244c4"]}
-
-  curl -v http://localhost:21000/api/atlas/entities/list/hive_db
-</verbatim>
-
-   * Search for entities (instances) in the repository
+  curl -v -u username:password http://localhost:21000/api/atlas/v2/search/basic?typeName=hive_db
+  {
+    "queryType":"BASIC",
+    "searchParameters":{
+      "typeName":"hive_db",
+      "excludeDeletedEntities":false,
+      "includeClassificationAttributes":false,
+      "includeSubTypes":true,
+      "includeSubClassifications":true,
+      "limit":100,
+      "offset":0
+    },
+    "entities":[
+      {
+        "typeName":"hive_db",
+        "guid":"5d900c19-094d-4681-8a86-4eb1d6ffbe89",
+        "status":"ACTIVE",
+        "displayText":"default",
+        "classificationNames":[],
+        "attributes":{
+          "owner":"public",
+          "createTime":null,
+          "qualifiedName":"default@cl1",
+          "name":"default",
+          "description":"Default Hive database"
+        }
+      },
+      {
+        "typeName":"hive_db",
+        "guid":"3a0b14b0-ab85-4b65-89f2-e418f3f7f77c",
+        "status":"ACTIVE",
+        "displayText":"finance",
+        "classificationNames":[],
+        "attributes":{
+          "owner":"hive",
+          "createTime":null,
+          "qualifiedName":"finance@cl1",
+          "name":"finance",
+          "description":null
+        }
+      }
+    ]
+  }</verbatim>
+
+   * Search for entities
 <verbatim>
-  curl -v http://localhost:21000/api/atlas/discovery/search/dsl?query="from hive_table"
-</verbatim>
-
-
-*Dashboard*
-
-Once atlas is started, you can view the status of atlas entities using the Web-based dashboard. You can open your browser at the corresponding port to use the web UI.
+  curl -v -u username:password http://localhost:21000/api/atlas/v2/search/dsl?query=hive_db%20where%20name='default'
+    {
+      "queryType":"DSL",
+      "queryText":"hive_db where name='default'",
+      "entities":[
+        {
+          "typeName":"hive_db",
+          "guid":"5d900c19-094d-4681-8a86-4eb1d6ffbe89",
+          "status":"ACTIVE",
+          "displayText":"default",
+          "classificationNames":[],
+          "attributes":{
+            "owner":"public",
+            "createTime":null,
+            "qualifiedName":"default@cl1",
+            "name":"default",
+            "description":
+            "Default Hive database"
+          }
+        }
+      ]
+    }</verbatim>
 
 
 ---+++ Stopping Atlas Server
-
 <verbatim>
-bin/atlas_stop.py
-</verbatim>
-
----+++ Known Issues
+bin/atlas_stop.py</verbatim>
 
----++++ Setup
+---+++ Troubleshooting
 
+---++++ Setup issues
 If the setup of Atlas service fails due to any reason, the next run of setup (either by an explicit invocation of
 =atlas_start.py -setup= or by enabling the configuration option =atlas.server.run.setup.on.start=) will fail with
 a message such as =A previous setup run may not have completed cleanly.=. In such cases, you would need to manually
 ensure the setup can run and delete the Zookeeper node at =/apache_atlas/setup_in_progress= before attempting to
 run setup again.
 
-If the setup failed due to HBase Titan schema setup errors, it may be necessary to repair the HBase schema. If no
-data has been stored, one can also disable and drop the 'titan' schema in HBase to let setup run again.
+If the setup failed due to HBase JanusGraph schema setup errors, it may be necessary to repair the HBase schema. If no
+data has been stored, one can also disable and drop the HBase tables used by Atlas and run setup again.
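
Reviewer note on the v2 search examples in the InstallationSteps.twiki hunk above: the DSL query has to be percent-encoded before it is placed in the URL. The following is a minimal Python sketch of how the two search URLs shown in the new curl examples could be built. The helper names (basic_search_url, dsl_search_url) and the ATLAS_BASE_URL constant are hypothetical and are not part of Atlas; only the endpoint paths and the localhost:21000 address are taken from the diff.

```python
from urllib.parse import quote, urlencode

# Endpoint used by the curl examples in the diff; adjust for a real deployment.
ATLAS_BASE_URL = "http://localhost:21000"


def basic_search_url(type_name, base_url=ATLAS_BASE_URL):
    """Build the v2 basic-search URL for a given type name (hypothetical helper)."""
    params = urlencode({"typeName": type_name})
    return base_url + "/api/atlas/v2/search/basic?" + params


def dsl_search_url(query, base_url=ATLAS_BASE_URL):
    """Build the v2 DSL-search URL for a query string (hypothetical helper).

    Spaces are encoded as %20. The quote and equals characters are left
    literal only to mirror the curl example; encoding them as well is
    also valid.
    """
    encoded = quote(query, safe="'=")
    return base_url + "/api/atlas/v2/search/dsl?query=" + encoded


if __name__ == "__main__":
    print(basic_search_url("hive_db"))
    print(dsl_search_url("hive_db where name='default'"))
```

The resulting strings can be passed to curl with -u username:password as in the documented examples; the sketch only builds URLs and does not contact a server.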

http://git-wip-us.apache.org/repos/asf/atlas/blob/c65586f1/docs/src/site/twiki/QuickStart.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/QuickStart.twiki b/docs/src/site/twiki/QuickStart.twiki
index a3c1b1e..dd648d0 100644
--- a/docs/src/site/twiki/QuickStart.twiki
+++ b/docs/src/site/twiki/QuickStart.twiki
@@ -1,9 +1,8 @@
----+ Quick Start Guide
+---+ Quick Start
 
 ---++ Introduction
-This quick start user guide is a simple client that adds a few sample type definitions modeled
-after the example as shown below. It also adds example entities along with traits as shown in the
-instance graph below.
+Quick start is a simple client that adds a few sample type definitions modeled after the example shown below.
+It also adds sample entities along with traits as shown in the instance graph below.
 
 
 ---+++ Example Type Definitions

http://git-wip-us.apache.org/repos/asf/atlas/blob/c65586f1/docs/src/site/twiki/Repository.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/Repository.twiki b/docs/src/site/twiki/Repository.twiki
deleted file mode 100755
index b84b3b3..0000000
--- a/docs/src/site/twiki/Repository.twiki
+++ /dev/null
@@ -1,4 +0,0 @@
----+ Repository
-
----++ Introduction
-
