[ https://issues.apache.org/jira/browse/FALCON-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Kawa updated FALCON-1096:
------------------------------
    Description: 
In my organisation we create a Hive table for each production dataset in HDFS.
When creating a Hive table, you supply a lot of information about your dataset:
its name, its fields with their types and comments, the location, the data
format, properties in the form of key-value pairs, and a meaningful description
of the dataset. We think of Hive as a central and nicely documented repository
of our datasets.

When using Falcon, we again need to create a Falcon feed for each dataset
(which corresponds to a Hive table) and even specify multiple redundant
properties (e.g. the description).

To make it simpler, Falcon could scan the Hive Metastore and automatically
create a feed for each Hive table that inherits the table's properties.
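
A rough sketch of the mapping I have in mind, in Python for brevity. The table
metadata is a hypothetical plain dict standing in for whatever a metastore scan
would return (its keys are my assumption, not a real client API). The frequency,
partition spec and group cannot be read from Hive, so they are passed in by the
caller. The clusters, validity and retention elements are left out, the schema
and permission values are placeholders, and feed naming rules are not checked,
so the output is not a complete feed definition.

```python
import xml.etree.ElementTree as ET


def feed_from_hive_table(table, frequency, partition_spec, group):
    """Return a partial Falcon feed entity (XML string) for one Hive table.

    `table` is a hypothetical dict with the keys: database, name, owner,
    comment (optional) and properties (optional dict of strings).
    """
    feed = ET.Element("feed", {
        "name": f'{table["database"]}-{table["name"]}',
        # Inherit the description from the Hive table comment.
        "description": table.get("comment", ""),
        "xmlns": "uri:falcon:feed:0.1",
    })
    # Not available in Hive: supplied by the caller.
    ET.SubElement(feed, "frequency").text = frequency
    # Point the feed at the Hive table instead of at HDFS paths.
    ET.SubElement(feed, "table", {
        "uri": f'catalog:{table["database"]}:{table["name"]}#{partition_spec}',
    })
    ET.SubElement(feed, "ACL", {
        "owner": table["owner"],
        "group": group,
        "permission": "0755",  # placeholder value
    })
    ET.SubElement(feed, "schema", {"location": "hcat", "provider": "hcat"})
    # Copy the Hive table properties over as feed properties.
    props = ET.SubElement(feed, "properties")
    for key in sorted(table.get("properties", {})):
        ET.SubElement(props, "property", {
            "name": key,
            "value": table["properties"][key],
        })
    return ET.tostring(feed, encoding="unicode")


if __name__ == "__main__":
    # Made-up sample table, for illustration only.
    clicks = {
        "database": "default",
        "name": "clicks",
        "owner": "etl",
        "comment": "Raw click events",
        "properties": {"format": "orc"},
    }
    print(feed_from_hive_table(
        clicks, "days(1)", "ds=${YEAR}-${MONTH}-${DAY}", "users"))
```

A real implementation would loop over all databases and tables in the
metastore and submit each generated feed to Falcon; that part is not shown.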

The properties of Hive tables could also be used when searching for a dataset
in the new Falcon Web UI, e.g. field name, field comment or file format (other
statistics such as total file size and the last modification or access time
could also be used).

  was:
In my organisation we create a Hive table for each production dataset in HDFS.
When creating a Hive table, you supply a lot of information about your dataset:
its name, its fields with their types and comments, the location, the data
format, properties in the form of key-value pairs, and a meaningful description
of the dataset. We think of Hive as a central and nicely documented repository
of our datasets.

When using Falcon, we again need to create a Falcon feed for each dataset
(which corresponds to a Hive table) and even specify multiple redundant
properties (e.g. the description).

To make it simpler, Falcon could scan the Hive Metastore and automatically
create a feed for each Hive table that inherits the table's properties.

The properties of Hive tables could also be used when searching for a dataset
in the new Falcon Web UI, e.g. field name, field comment or file format (other
statistics such as total size and the last modification or access time could
also be used).


> Scan Hive Metastore to automatically create Falcon feeds for existing Hive 
> tables
> ---------------------------------------------------------------------------------
>
>                 Key: FALCON-1096
>                 URL: https://issues.apache.org/jira/browse/FALCON-1096
>             Project: Falcon
>          Issue Type: New Feature
>            Reporter: Adam Kawa
>
> In my organisation we create a Hive table for each production dataset in
> HDFS. When creating a Hive table, you supply a lot of information about your
> dataset: its name, its fields with their types and comments, the location,
> the data format, properties in the form of key-value pairs, and a meaningful
> description of the dataset. We think of Hive as a central and nicely
> documented repository of our datasets.
> When using Falcon, we again need to create a Falcon feed for each dataset
> (which corresponds to a Hive table) and even specify multiple redundant
> properties (e.g. the description).
> To make it simpler, Falcon could scan the Hive Metastore and automatically
> create a feed for each Hive table that inherits the table's properties.
> The properties of Hive tables could also be used when searching for a dataset
> in the new Falcon Web UI, e.g. field name, field comment or file format
> (other statistics such as total file size and the last modification or access
> time could also be used).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
