Repository: atlas
Updated Branches:
  refs/heads/master 39be2ccfd -> c65586f13


http://git-wip-us.apache.org/repos/asf/atlas/blob/c65586f1/docs/src/site/twiki/TypeSystem.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/TypeSystem.twiki 
b/docs/src/site/twiki/TypeSystem.twiki
index b658cfa..397a1bb 100755
--- a/docs/src/site/twiki/TypeSystem.twiki
+++ b/docs/src/site/twiki/TypeSystem.twiki
@@ -1,7 +1,6 @@
 ---+ Type System
 
 ---++ Overview
-
 Atlas allows users to define a model for the metadata objects they want to 
manage. The model is composed of definitions
 called ‘types’. Instances of ‘types’ called ‘entities’ represent 
the actual metadata objects that are managed. The Type
 System is a component that allows users to define and manage the types and 
entities. All metadata objects managed by
@@ -9,7 +8,6 @@ Atlas out of the box (like Hive tables, for e.g.) are modelled 
using types and r
 types of metadata in Atlas, one needs to understand the concepts of the type 
system component.
 
 ---++ Types
-
 A ‘Type’ in Atlas is a definition of how a particular type of metadata 
objects are stored and accessed. A type
 represents one or a collection of attributes that define the properties for 
the metadata object. Users with a
 development background will recognize the similarity of a type to a 
‘Class’ definition of object oriented programming
@@ -19,108 +17,100 @@ An example of a type that comes natively defined with 
Atlas is a Hive table. A H
 attributes:
 
 <verbatim>
-Name: hive_table
-MetaType: Class
-SuperTypes: DataSet
+Name:         hive_table
+TypeCategory: Entity
+SuperTypes:   DataSet
 Attributes:
-    name: String (name of the table)
-    db: Database object of type hive_db
-    owner: String
-    createTime: Date
-    lastAccessTime: Date
-    comment: String
-    retention: int
-    sd: Storage Description object of type hive_storagedesc
-    partitionKeys: Array of objects of type hive_column
-    aliases: Array of strings
-    columns: Array of objects of type hive_column
-    parameters: Map of String keys to String values
-    viewOriginalText: String
-    viewExpandedText: String
-    tableType: String
-    temporary: Boolean
-</verbatim>
+    name:             string
+    db:               hive_db
+    owner:            string
+    createTime:       date
+    lastAccessTime:   date
+    comment:          string
+    retention:        int
+    sd:               hive_storagedesc
+    partitionKeys:    array<hive_column>
+    aliases:          array<string>
+    columns:          array<hive_column>
+    parameters:       map<string,string>
+    viewOriginalText: string
+    viewExpandedText: string
+    tableType:        string
+    temporary:        boolean</verbatim>
 
 The following points can be noted from the above example:
 
    * A type in Atlas is identified uniquely by a ‘name’
-   * A type has a metatype. A metatype represents the type of this model in 
Atlas. Atlas has the following metatypes:
-      * Basic metatypes: E.g. Int, String, Boolean etc.
+   * A type has a metatype. Atlas has the following metatypes:
+      * Primitive metatypes: boolean, byte, short, int, long, float, double, 
biginteger, bigdecimal, string, date
       * Enum metatypes
-      * Collection metatypes: E.g. Array, Map
-      * Composite metatypes: E.g. Class, Struct, Trait
-   * A type can ‘extend’ from a parent type called ‘supertype’ - by 
virtue of this, it will get to include the attributes that are defined in the 
supertype as well. This allows modellers to define common attributes across a 
set of related types etc. This is again similar to the concept of how Object 
Oriented languages define super classes for a class. It is also possible for a 
type in Atlas to extend from multiple super types.
+      * Collection metatypes: array, map
+      * Composite metatypes: Entity, Struct, Classification, Relationship
+   * Entity & Classification types can ‘extend’ from other types, called 
‘supertype’ - by virtue of this, it will get to include the attributes that 
are defined in the supertype as well. This allows modellers to define common 
attributes across a set of related types etc. This is again similar to the 
concept of how Object Oriented languages define super classes for a class. It 
is also possible for a type in Atlas to extend from multiple super types.
       * In this example, every hive table extends from a pre-defined supertype 
called a ‘DataSet’. More details about this pre-defined types will be 
provided later.
-   * Types which have a metatype of ‘Class’, ‘Struct’ or ‘Trait’ 
can have a collection of attributes. Each attribute has a name (e.g.  
‘name’) and some other associated properties. A property can be referred to 
using an expression type_name.attribute_name. It is also good to note that 
attributes themselves are defined using Atlas metatypes.
+   * Types which have a metatype of ‘Entity’, ‘Struct’, 
‘Classification’ or 'Relationship' can have a collection of attributes. 
Each attribute has a name (e.g.  ‘name’) and some other associated 
properties. A property can be referred to using an expression 
type_name.attribute_name. It is also good to note that attributes themselves 
are defined using Atlas metatypes.
       * In this example, hive_table.name is a String, hive_table.aliases is an 
array of Strings, hive_table.db refers to an instance of a type called hive_db 
and so on.
-   * Type references in attributes, (like hive_table.db) are particularly 
interesting. Note that using such an attribute, we can define arbitrary 
relationships between two types defined in Atlas and thus build rich models. 
Note that one can also collect a list of references as an attribute type (e.g. 
hive_table.cols which represents a list of references from hive_table to the 
hive_column type)
+   * Type references in attributes, (like hive_table.db) are particularly 
interesting. Note that using such an attribute, we can define arbitrary 
relationships between two types defined in Atlas and thus build rich models. 
Note that one can also collect a list of references as an attribute type (e.g. 
hive_table.columns which represents a list of references from hive_table to 
hive_column type)
 
 ---++ Entities
-
-An ‘entity’ in Atlas is a specific value or instance of a Class ‘type’ 
and thus represents a specific metadata object
+An ‘entity’ in Atlas is a specific value or instance of an Entity 
‘type’ and thus represents a specific metadata object
 in the real world. Referring back to our analogy of Object Oriented 
Programming languages, an ‘instance’ is an
 ‘Object’ of a certain ‘Class’.
 
 An example of an entity will be a specific Hive Table. Say Hive has a table 
called ‘customers’ in the ‘default’
-database. This table will be an ‘entity’ in Atlas of type hive_table. By 
virtue of being an instance of a class
+database. This table will be an ‘entity’ in Atlas of type hive_table. By 
virtue of being an instance of an entity
 type, it will have values for every attribute that are a part of the Hive 
table ‘type’, such as:
 
 <verbatim>
-id: "9ba387dd-fa76-429c-b791-ffc338d3c91f"
-typeName: “hive_table”
+guid:     "9ba387dd-fa76-429c-b791-ffc338d3c91f"
+typeName: "hive_table"
+status:   "ACTIVE"
 values:
-    name: “customers”
-    db: "b42c6cfc-c1e7-42fd-a9e6-890e0adf33bc"
-    owner: “admin”
-    createTime: "2016-06-20T06:13:28.000Z"
-    lastAccessTime: "2016-06-20T06:13:28.000Z"
-    comment: null
-    retention: 0
-    sd: "ff58025f-6854-4195-9f75-3a3058dd8dcf"
-    partitionKeys: null
-    aliases: null
-    columns: ["65e2204f-6a23-4130-934a-9679af6a211f", 
"d726de70-faca-46fb-9c99-cf04f6b579a6", ...]
-    parameters: {"transient_lastDdlTime": "1466403208"}
+    name:             “customers”
+    db:               { "guid": "b42c6cfc-c1e7-42fd-a9e6-890e0adf33bc", 
"typeName": "hive_db" }
+    owner:            “admin”
+    createTime:       1490761686029
+    updateTime:       1516298102877
+    comment:          null
+    retention:        0
+    sd:               { "guid": "ff58025f-6854-4195-9f75-3a3058dd8dcf", 
"typeName": "hive_storagedesc" }
+    partitionKeys:    null
+    aliases:          null
+    columns:          [ { "guid": ""65e2204f-6a23-4130-934a-9679af6a211f", 
"typeName": "hive_column" }, { "guid": ""d726de70-faca-46fb-9c99-cf04f6b579a6", 
"typeName": "hive_column" }, ...]
+    parameters:       { "transient_lastDdlTime": "1466403208"}
     viewOriginalText: null
     viewExpandedText: null
-    tableType: “MANAGED_TABLE”
-    temporary: false
-</verbatim>
+    tableType:        “MANAGED_TABLE”
+    temporary:        false</verbatim>
 
 The following points can be noted from the example above:
 
-   * Every entity that is an instance of a Class type is identified by a 
unique identifier, a GUID. This GUID is generated by the Atlas server when the 
object is defined, and remains constant for the entire lifetime of the entity. 
At any point in time, this particular entity can be accessed using its GUID.
+   * Every instance ofan entity type is identified by a unique identifier, a 
GUID. This GUID is generated by the Atlas server when the object is defined, 
and remains constant for the entire lifetime of the entity. At any point in 
time, this particular entity can be accessed using its GUID.
       * In this example, the ‘customers’ table in the default database is 
uniquely identified by the GUID "9ba387dd-fa76-429c-b791-ffc338d3c91f"
    * An entity is of a given type, and the name of the type is provided with 
the entity definition.
       * In this example, the ‘customers’ table is a ‘hive_table.
    * The values of this entity are a map of all the attribute names and their 
values for attributes that are defined in the hive_table type definition.
-   * Attribute values will be according to the metatype of the attribute.
-      * Basic metatypes: integer, String, boolean values. E.g. ‘name’ = 
‘customers’, ‘Temporary’ = ‘false’
-      * Collection metatypes: An array or map of values of the contained 
metatype. E.g. parameters = { “transient_lastDdlTime”: “1466403208”}
-      * Composite metatypes: For classes, the value will be an entity with 
which this particular entity will have a relationship. E.g. The hive table 
“customers” is present in a database called “default”. The relationship 
between the table and database are captured via the “db” attribute. Hence, 
the value of the “db” attribute will be a GUID that uniquely identifies the 
hive_db entity called “default”
-
-With this idea on entities, we can now see the difference between Class and 
Struct metatypes. Classes and Structs
-both compose attributes of other types. However, entities of Class types have 
the Id attribute (with a GUID value) a
-nd can be referenced from other entities (like a hive_db entity is referenced 
from a hive_table entity). Instances of
-Struct types do not have an identity of their own. The value of a Struct type 
is a collection of attributes that are
+   * Attribute values will be according to the datatype of the attribute. 
Entity-type attributes will have value of type AtlasObjectId
+
+With this idea on entities, we can now see the difference between Entity and 
Struct metatypes. Entities and Structs
+both compose attributes of other types. However, instances of Entity types 
have an identity (with a GUID value) and can
+be referenced from other entities (like a hive_db entity is referenced from a 
hive_table entity). Instances of Struct
+types do not have an identity of their own. The value of a Struct type is a 
collection of attributes that are
 ‘embedded’ inside the entity itself.
 
 ---++ Attributes
-
-We already saw that attributes are defined inside composite metatypes like 
Class and Struct. But we simplistically
-referred to attributes as having a name and a metatype value. However, 
attributes in Atlas have some more properties
-that define more concepts related to the type system.
+We already saw that attributes are defined inside metatypes like Entity, 
Struct, Classification and Relationship. But we
+implistically referred to attributes as having a name and a metatype value. 
However, attributes in Atlas have some more
+properties that define more concepts related to the type system.
 
 An attribute has the following properties:
 <verbatim>
-    name: string,
-    dataTypeName: string,
-    isComposite: boolean,
+    name:        string,
+    typeName:    string,
+    isOptional:  boolean,
     isIndexable: boolean,
-    isUnique: boolean,
-    multiplicity: enum,
-    reverseAttributeName: string
-</verbatim>
+    isUnique:    boolean,
+    cardinality: enum</verbatim>
 
 The properties above have the following meanings:
 
@@ -132,7 +122,7 @@ The properties above have the following meanings:
    * isIndexable -
       * This flag indicates whether this property should be indexed on, so 
that look ups can be performed using the attribute value as a predicate and can 
be performed efficiently.
    * isUnique -
-      * This flag is again related to indexing. If specified to be unique, it 
means that a special index is created for this attribute in Titan that allows 
for equality based look ups.
+      * This flag is again related to indexing. If specified to be unique, it 
means that a special index is created for this attribute in JanusGraph that 
allows for equality based look ups.
       * Any attribute with a true value for this flag is treated like a 
primary key to distinguish this entity from other entities. Hence care should 
be taken ensure that this attribute does model a unique property in real world.
          * For e.g. consider the name attribute of a hive_table. In isolation, 
a name is not a unique attribute for a hive_table, because tables with the same 
name can exist in multiple databases. Even a pair of (database name, table 
name) is not unique if Atlas is storing metadata of hive tables amongst 
multiple clusters. Only a cluster location, database name and table name can be 
deemed unique in the physical world.
    * multiplicity - indicates whether this attribute is required, optional, or 
could be multi-valued. If an entity’s definition of the attribute value does 
not match the multiplicity declaration in the type definition, this would be a 
constraint violation and the entity addition will fail. This field can 
therefore be used to define some constraints on the metadata information.
@@ -142,59 +132,55 @@ Let us look at the attribute called ‘db’ which 
represents the database to wh
 
 <verbatim>
 db:
-    "dataTypeName": "hive_db",
-    "isComposite": false,
+    "name":        "db",
+    "typeName":    "hive_db",
+    "isOptional":  false,
     "isIndexable": true,
-    "isUnique": false,
-    "multiplicity": "required",
-    "name": "db",
-    "reverseAttributeName": null
-</verbatim>
+    "isUnique":    false,
+    "cardinality": "SINGLE"</verbatim>
 
-Note the “required” constraint on multiplicity. A table entity cannot be 
sent without a db reference.
+Note the “isOptional=true” constraint - a table entity cannot be created 
without a db reference.
 
 <verbatim>
 columns:
-    "dataTypeName": "array<hive_column>",
-    "isComposite": true,
+    "name":        "columns",
+    "typeName":    "array<hive_column>",
+    "isOptional":  optional,
     "isIndexable": true,
-    “isUnique": false,
-    "multiplicity": "optional",
-    "name": "columns",
-    "reverseAttributeName": null
-</verbatim>
+    “isUnique":    false,
+    "constraints": [ { "type": "ownedRef" } ]</verbatim>
 
-Note the “isComposite” true value for columns. By doing this, we are 
indicating that the defined column entities should
+Note the “ownedRef” constraint for columns. By doing this, we are 
indicating that the defined column entities should
 always be bound to the table entity they are defined with.
 
 From this description and examples, you will be able to realize that attribute 
definitions can be used to influence
 specific modelling behavior (constraints, indexing, etc) to be enforced by the 
Atlas system.
 
 ---++ System specific types and their significance
-
-Atlas comes with a few pre-defined system types. We saw one example (DataSet) 
in the preceding sections. In this
-section we will see all these types and understand their significance.
+Atlas comes with a few pre-defined system types. We saw one example (DataSet) 
in preceding sections. In this
+section we will see more of these types and understand their significance.
 
 *Referenceable*: This type represents all entities that can be searched for 
using a unique attribute called
 qualifiedName.
 
-*Asset*: This type contains attributes like name, description and owner. Name 
is a required attribute
-(multiplicity = required), the others are optional. The purpose of 
Referenceable and Asset is to provide modellers
-with way to enforce consistency when defining and querying entities of their 
own types. Having these fixed set of
-attributes allows applications and User interfaces to make convention based 
assumptions about what attributes they can
-expect of types by default.
-
-*Infrastructure*: This type extends Referenceable and Asset and typically can 
be used to be a common super type for
-infrastructural metadata objects like clusters, hosts etc.
-
-*!DataSet*: This type extends Referenceable and Asset. Conceptually, it can be 
used to represent an type that stores
-data. In Atlas, hive tables, Sqoop RDBMS tables etc are all types that extend 
from !DataSet. Types that extend !DataSet
-can be expected to have a Schema in the sense that they would have an 
attribute that defines attributes of that dataset.
-For e.g. the columns attribute in a hive_table. Also entities of types that 
extend !DataSet participate in data
-transformation and this transformation can be captured by Atlas via lineage 
(or provenance) graphs.
-
-*Process*: This type extends Referenceable and Asset. Conceptually, it can be 
used to represent any data transformation
-operation. For example, an ETL process that transforms a hive table with raw 
data to another hive table that stores
-some aggregate can be a specific type that extends the Process type. A Process 
type has two specific attributes,
-inputs and outputs. Both  inputs and outputs are arrays of !DataSet entities. 
Thus an instance of a Process type can
-use these inputs and outputs to capture how the lineage of a !DataSet evolves.
\ No newline at end of file
+*Asset*: This type extends Referenceable and adds attributes like name, 
description and owner. Name is a required
+attribute (isOptional=false), the others are optional.
+
+The purpose of Referenceable and Asset is to provide modellers with way to 
enforce consistency when defining and
+querying entities of their own types. Having these fixed set of attributes 
allows applications and user interfaces to
+make convention based assumptions about what attributes they can expect of 
types by default.
+
+*Infrastructure*: This type extends Asset and typically can be used to be a 
common super type for infrastructural
+metadata objects like clusters, hosts etc.
+
+*!DataSet*: This type extends Referenceable. Conceptually, it can be used to 
represent an type that stores data. In Atlas,
+hive tables, hbase_tables etc are all types that extend from !DataSet. Types 
that extend !DataSet can be expected to have
+a Schema in the sense that they would have an attribute that defines 
attributes of that dataset. For e.g. the columns
+attribute in a hive_table. Also entities of types that extend !DataSet 
participate in data transformation and this
+transformation can be captured by Atlas via lineage (or provenance) graphs.
+
+*Process*: This type extends Asset. Conceptually, it can be used to represent 
any data transformation operation. For
+example, an ETL process that transforms a hive table with raw data to another 
hive table that stores some aggregate can
+be a specific type that extends the Process type. A Process type has two 
specific attributes, inputs and outputs. Both
+inputs and outputs are arrays of !DataSet entities. Thus an instance of a 
Process type can use these inputs and outputs
+to capture how the lineage of a !DataSet evolves.

http://git-wip-us.apache.org/repos/asf/atlas/blob/c65586f1/docs/src/site/twiki/index.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/index.twiki b/docs/src/site/twiki/index.twiki
index a8e7de9..198ac52 100755
--- a/docs/src/site/twiki/index.twiki
+++ b/docs/src/site/twiki/index.twiki
@@ -7,39 +7,49 @@ Atlas is a scalable and extensible set of core foundational 
governance services
 enterprises to effectively and efficiently meet their compliance requirements 
within Hadoop and
 allows integration with the whole enterprise data ecosystem.
 
+Apache Atlas provides open metadata management and governance capabilities for 
organizations
+to build a catalog of their data assets, classify and govern these assets and 
provide collaboration
+capabilities around these data assets for data scientists, analysts and the 
data governance team.
+
 ---++ Features
 
----+++ Data Classification
-   * Import or define taxonomy business-oriented annotations for data
-   * Define, annotate, and automate capture of relationships between data sets 
and underlying elements including source, target, and derivation processes
-   * Export metadata to third-party systems
+---+++ Metadata types & instances
+   * Pre-defined types for various Hadoop and non-Hadoop metadata
+   * Ability to define new types for the metadata to be managed
+   * Types can have primitive attributes, complex attributes, object 
references; can inherit from other types
+   * Instances of types, called entities, capture metadata object details and 
their relationships
+   * REST APIs to work with types and instances allow easier integration
+
+---+++ Classification
+   * Ability to dynamically create classifications - like PII, EXPIRES_ON, 
DATA_QUALITY, SENSITIVE
+   * Classifications can include attributes - like expiry_date attribute in 
EXPIRES_ON classification
+   * Entities can be associated with multiple classifications, enabling easier 
discovery and security enforcement
 
----+++ Centralized Auditing
-   * Capture security access information for every application, process, and 
interaction with data
-   * Capture the operational information for execution, steps, and activities
+---+++ Lineage
+   * Intuitive UI to view lineage of data as it moves through various processes
+   * REST APIs to access and update lineage
 
----+++ Search & Lineage (Browse)
-   * Pre-defined navigation paths to explore the data classification and audit 
information
-   * Text-based search features locates relevant data and audit event across 
Data Lake quickly and accurately
-   * Browse visualization of data set lineage allowing users to drill-down 
into operational, security, and provenance related information
+---+++ Search/Discovery
+   * Intuitive UI to search entities by type, classification, attribute value 
or free-text
+   * Rich REST APIs to search by complex criteria
+   * SQL like query language to search entities - Domain Specific Language 
(DSL)
 
----+++ Security & Policy Engine
-   * Rationalize compliance policy at runtime based on data classification 
schemes, attributes and roles.
-   * Advanced definition of policies for preventing data derivation based on 
classification (i.e. re-identification) – Prohibitions
-   * Column and Row level masking based on cell values and attibutes.
+---+++ Security & Data Masking
+   * Integration with Apache Ranger enables authorization/data-masking based 
on classifications associated with entities in Apache Atlas. For example:
+      * who can access data classified as PII, SENSITIVE
+      * customer-service users can only see last 4 digits of columns 
classified as NATIONAL_ID
 
 
 ---++ Getting Started
 
-   * [[InstallationSteps][Install Steps]]
-   * [[QuickStart][Quick Start Guide]]
+   * [[InstallationSteps][Build & Install]]
+   * [[QuickStart][Quick Start]]
 
 
 ---++ Documentation
 
    * [[Architecture][High Level Architecture]]
    * [[TypeSystem][Type System]]
-   * [[Repository][Metadata Repository]]
    * [[Search][Search]]
    * [[security][Security]]
    * [[Authentication-Authorization][Authentication and Authorization]]

http://git-wip-us.apache.org/repos/asf/atlas/blob/c65586f1/docs/src/site/twiki/security.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/security.twiki 
b/docs/src/site/twiki/security.twiki
index 1dcee2a..84b9425 100755
--- a/docs/src/site/twiki/security.twiki
+++ b/docs/src/site/twiki/security.twiki
@@ -43,7 +43,7 @@ The properties for configuring service authentication are:
    * <code>atlas.authentication.keytab</code> - the path to the keytab file.
    * <code>atlas.authentication.principal</code> - the principal to use for 
authenticating to the KDC.  The principal is generally of the form 
"user/host@realm".  You may use the '_HOST' token for the hostname and the 
local hostname will be substituted in by the runtime (e.g. 
"Atlas/[email protected]").
 
-Note that when Atlas is configured with HBase as the storage backend in a 
secure cluster, the graph db (titan) needs sufficient user permissions to be 
able to create and access an HBase table.  To grant the appropriate permissions 
see [[Configuration][Graph persistence engine - Hbase]].
+Note that when Atlas is configured with HBase as the storage backend in a 
secure cluster, the graph db (JanusGraph) needs sufficient user permissions to 
be able to create and access an HBase table.  To grant the appropriate 
permissions see [[Configuration][Graph persistence engine - Hbase]].
 
 ---+++ JAAS configuration
 

Reply via email to