[
https://issues.apache.org/jira/browse/ATLAS-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Carol Drummond updated ATLAS-3006:
----------------------------------
Labels: new-feature release-notes (was: )
> Option to ignore/prune metadata for temporary/staging Hive tables
> -----------------------------------------------------------------
>
> Key: ATLAS-3006
> URL: https://issues.apache.org/jira/browse/ATLAS-3006
> Project: Atlas
> Issue Type: Improvement
> Components: atlas-core
> Reporter: Madhan Neethiraj
> Assignee: Madhan Neethiraj
> Priority: Major
> Labels: new-feature, release-notes
> Fix For: 0.8.4, 1.2.0, 2.0.0
>
> Attachments: ATLAS-3006-branch-0.8.patch, ATLAS-3006.patch
>
>
> It is not uncommon for a Hive deployment to use a large number of
> staging/temporary tables, which are created periodically to load data into
> target tables and deleted after completion of data load. A large number of
> entities are created in Atlas for these staging/temporary tables
> (tables/columns/column-lineage).
> For staging tables, it is probably not useful to track details like columns
> and column-lineage in Atlas. Not tracking these details can significantly
> reduce the time it takes to process notifications and improve overall
> performance. Only the minimum details of these staging tables need to be
> stored in Atlas, enough to capture data lineage from source to target table
> via all intermediate staging tables.
> Also, it would be helpful to ignore tables that are created and deleted
> during data loading, i.e. temporary tables.
> Configurations should be provided to specify which tables are
> staging/temporary. In addition to supporting this in the Hive hook (to avoid
> generating large messages for staging/temporary tables), the Atlas server
> should also be updated, so that this can be controlled further on the server
> side while processing notifications.
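
As a rough illustration of the behaviour requested above, the sketch below classifies a table by its qualified name and then either drops the entity (temporary tables), strips column and column-lineage details (staging tables), or leaves it unchanged. It is not taken from the attached patches: the pattern lists, the `Action` enum, the functions `classify_table` and `preprocess`, and the dictionary keys `qualifiedName`, `columns` and `columnLineage` are all hypothetical names chosen for this example, and the actual Atlas configuration properties may look quite different.

```python
import re
from enum import Enum


class Action(Enum):
    KEEP = "keep"      # store the entity as-is
    PRUNE = "prune"    # keep the table, drop columns and column-lineage
    IGNORE = "ignore"  # do not store anything for this table


# Hypothetical patterns; in a real deployment these would come from configuration.
IGNORE_PATTERNS = [re.compile(r".*\.tmp_.*")]
PRUNE_PATTERNS = [re.compile(r".*\.stg_.*"), re.compile(r"staging\..*")]


def classify_table(qualified_name: str) -> Action:
    """Decide how to treat a table; ignore patterns take precedence over prune."""
    if any(p.fullmatch(qualified_name) for p in IGNORE_PATTERNS):
        return Action.IGNORE
    if any(p.fullmatch(qualified_name) for p in PRUNE_PATTERNS):
        return Action.PRUNE
    return Action.KEEP


def preprocess(entity: dict):
    """Return the entity to store, a pruned copy of it, or None if ignored."""
    action = classify_table(entity["qualifiedName"])
    if action is Action.IGNORE:
        return None
    if action is Action.PRUNE:
        # Keep only the minimum details needed for table-level lineage.
        return {k: v for k, v in entity.items()
                if k not in ("columns", "columnLineage")}
    return entity
```

The same classification could in principle run in both places mentioned in the description: in the hook, to keep outgoing messages small, and on the server, as a second check while notifications are processed.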