Repository: incubator-atlas
Updated Branches:
  refs/heads/master 39eb9f1ee -> c9ee6d3f7

ATLAS-899 Fix Hive Hook documentation (sumasai via yhemanth)

Project: http://git-wip-us.apache.org/repos/asf/incubator-atlas/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-atlas/commit/c9ee6d3f
Tree: http://git-wip-us.apache.org/repos/asf/incubator-atlas/tree/c9ee6d3f
Diff: http://git-wip-us.apache.org/repos/asf/incubator-atlas/diff/c9ee6d3f

Branch: refs/heads/master
Commit: c9ee6d3f759437ca14d711d312b79dbf2e89983a
Parents: 39eb9f1
Author: Hemanth Yamijala <[email protected]>
Authored: Thu Jun 16 19:00:47 2016 +0530
Committer: Hemanth Yamijala <[email protected]>
Committed: Thu Jun 16 19:00:47 2016 +0530

----------------------------------------------------------------------
 docs/src/site/twiki/Bridge-Hive.twiki | 33 +++++++++++++++--------------
 release-log.txt                       |  1 +
 2 files changed, 18 insertions(+), 16 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-atlas/blob/c9ee6d3f/docs/src/site/twiki/Bridge-Hive.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/Bridge-Hive.twiki b/docs/src/site/twiki/Bridge-Hive.twiki
index fef2c30..33dd176 100644
--- a/docs/src/site/twiki/Bridge-Hive.twiki
+++ b/docs/src/site/twiki/Bridge-Hive.twiki
@@ -3,32 +3,26 @@
 ---++ Hive Model
 The default hive modelling is available in org.apache.atlas.hive.model.HiveDataModelGenerator. It defines the following types:
 <verbatim>
-hive_object_type(EnumType) - values [GLOBAL, DATABASE, TABLE, PARTITION, COLUMN]
-hive_resource_type(EnumType) - values [JAR, FILE, ARCHIVE]
-hive_principal_type(EnumType) - values [USER, ROLE, GROUP]
 hive_db(ClassType) - super types [Referenceable] - attributes [name, clusterName, description, locationUri, parameters, ownerName, ownerType]
+hive_storagedesc(ClassType) - super types [Referenceable] - attributes [cols, location, inputFormat, outputFormat, compressed, numBuckets, serdeInfo, bucketCols, sortCols, parameters, storedAsSubDirectories]
+hive_column(ClassType) - super types [Referenceable] - attributes [name, type, comment, table]
+hive_table(ClassType) - super types [DataSet] - attributes [name, db, owner, createTime, lastAccessTime, comment, retention, sd, partitionKeys, columns, aliases, parameters, viewOriginalText, viewExpandedText, tableType, temporary]
+hive_process(ClassType) - super types [Process] - attributes [name, startTime, endTime, userName, operationType, queryText, queryPlan, queryId]
+hive_principal_type(EnumType) - values [USER, ROLE, GROUP]
 hive_order(StructType) - attributes [col, order]
-hive_resourceuri(StructType) - attributes [resourceType, uri]
 hive_serde(StructType) - attributes [name, serializationLib, parameters]
-hive_type(ClassType) - super types [] - attributes [name, type1, type2, fields]
-hive_storagedesc(ClassType) - super types [Referenceable] - attributes [cols, location, inputFormat, outputFormat, compressed, numBuckets, serdeInfo, bucketCols, sortCols, parameters, storedAsSubDirectories]
-hive_role(ClassType) - super types [] - attributes [roleName, createTime, ownerName]
-hive_column(ClassType) - super types [Referenceable] - attributes [name, type, comment]
-hive_table(ClassType) - super types [DataSet] - attributes [name, db, owner, createTime, lastAccessTime, comment, retention, sd, partitionKeys, columns, parameters, viewOriginalText, viewExpandedText, tableType, temporary]
-hive_partition(ClassType) - super types [Referenceable] - attributes [values, table, createTime, lastAccessTime, sd, columns, parameters]
-hive_process(ClassType) - super types [Process] - attributes [startTime, endTime, userName, operationType, queryText, queryPlan, queryId, queryGraph]
 </verbatim>
 
-The entities are created and de-duped using unique qualified name. They provide namespace and can be used for querying/lineage as well. Note that name, dbName and tableName should be in lower case. clusterName is explained below.
+The entities are created and de-duped using unique qualified name. They provide namespace and can be used for querying/lineage as well. Note that dbName, tableName and columnName should be in lower case. clusterName is explained below.
    * hive_db - attribute qualifiedName - <dbName>@<clusterName>
-   * hive_table - attribute qualifiedName - <dbName>.<name>@<clusterName>
+   * hive_table - attribute qualifiedName - <dbName>.<tableName>@<clusterName>
    * hive_column - attribute qualifiedName - <dbName>.<tableName>.<columnName>@<clusterName>
    * hive_process - attribute name - <queryString> - trimmed query string in lower case
 
 ---++ Importing Hive Metadata
 org.apache.atlas.hive.bridge.HiveMetaStoreBridge imports the hive metadata into Atlas using the model defined in org.apache.atlas.hive.model.HiveDataModelGenerator. import-hive.sh command can be used to facilitate this.
-Set the following configuration in <atlas-conf>/atlas-application.properties and set environment variable $HIVE_CONF_DIR to the hive conf directory:
+Set the following configuration in hive-site.xml and set environment variable $HIVE_CONF_DIR to the hive conf directory:
 <verbatim>
 <property>
 <name>atlas.cluster.name</name>
@@ -66,7 +60,7 @@ Follow these instructions in your hive set-up to add hive hook for Atlas:
    * Copy <atlas-conf>/atlas-application.properties to the hive conf directory.
 
 The following properties in <atlas-conf>/atlas-application.properties control the thread pool and notification details:
-   * atlas.hook.hive.synchronous - boolean, true to run the hook synchronously. default false
+   * atlas.hook.hive.synchronous - boolean, true to run the hook synchronously. default false. Recommended to be set to false to avoid delays in hive query completion.
    * atlas.hook.hive.numRetries - number of retries for notification failure. default 3
    * atlas.hook.hive.minThreads - core number of threads. default 5
    * atlas.hook.hive.maxThreads - maximum number of threads. default 5
@@ -78,4 +72,11 @@ Refer [[Configuration][Configuration]] for notification related configurations
 
 ---++ Limitations
    * Since database name, table name and column names are case insensitive in hive, the corresponding names in entities are lowercase. So, any search APIs should use lowercase while querying on the entity names
-   * Only the following hive operations are captured by hive hook currently - create database, create table, create view, CTAS, load, import, export, query, alter database, alter table(except alter table replace columns and alter table change column position), alter view (except replacing and changing column position)
+   * The following hive operations are captured by hive hook currently
+      * create database
+      * create table/view, create table as select
+      * load, import, export
+      * DMLs (insert)
+      * alter database
+      * alter table (skewed table information, stored as, protection is not supported)
+      * alter view

http://git-wip-us.apache.org/repos/asf/incubator-atlas/blob/c9ee6d3f/release-log.txt
----------------------------------------------------------------------
diff --git a/release-log.txt b/release-log.txt
index 57525d7..ad3ceaa 100644
--- a/release-log.txt
+++ b/release-log.txt
@@ -23,6 +23,7 @@ ATLAS-409 Atlas will not import avro tables with schema read from a file (dosset
 ATLAS-379 Create sqoop and falcon metadata addons (venkatnrangan,bvellanki,sowmyaramesh via shwethags)
 
 ALL CHANGES:
+ATLAS-899 Fix Hive Hook documentation (sumasai via yhemanth)
 ATLAS-890 Log received messages in case of error (sumasai via yhemanth)
 ATLAS-888 NPE in NotificationHookConsumer (sumasai via shwethags)
 ATLAS-884 Process registration should call Entity update instead of create (sumasai)
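
Reader's note on the qualifiedName patterns documented in the Bridge-Hive.twiki change: the sketch below shows how the three patterns (<dbName>@<clusterName>, <dbName>.<tableName>@<clusterName>, <dbName>.<tableName>.<columnName>@<clusterName>) might be assembled by a client that wants to look up entities. The function names and the cluster name "primary" are hypothetical and chosen for illustration only; this is not the Atlas hook's own code. It assumes only what the documentation states: dbName, tableName and columnName are lower-cased, and clusterName is appended unchanged.

```python
# Illustrative helpers (hypothetical names, not part of Atlas) that build
# qualifiedName strings following the patterns in Bridge-Hive.twiki.
# Assumption: only db/table/column names are lower-cased; the cluster name
# is used exactly as configured in atlas.cluster.name.


def hive_db_qualified_name(cluster_name: str, db_name: str) -> str:
    """<dbName>@<clusterName>"""
    return f"{db_name.lower()}@{cluster_name}"


def hive_table_qualified_name(cluster_name: str, db_name: str, table_name: str) -> str:
    """<dbName>.<tableName>@<clusterName>"""
    return f"{db_name.lower()}.{table_name.lower()}@{cluster_name}"


def hive_column_qualified_name(
    cluster_name: str, db_name: str, table_name: str, column_name: str
) -> str:
    """<dbName>.<tableName>.<columnName>@<clusterName>"""
    return (
        f"{db_name.lower()}.{table_name.lower()}."
        f"{column_name.lower()}@{cluster_name}"
    )


if __name__ == "__main__":
    # "primary" is an example cluster name, not a value taken from this commit.
    print(hive_db_qualified_name("primary", "Sales"))
    print(hive_table_qualified_name("primary", "Sales", "Orders"))
    print(hive_column_qualified_name("primary", "Sales", "Orders", "Order_ID"))
```

Lower-casing on the client side mirrors the Limitations note in the same file: Hive names are case insensitive, so search queries against entity names should use lowercase.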
