[
https://issues.apache.org/jira/browse/ATLAS-183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hemanth Yamijala reassigned ATLAS-183:
--------------------------------------
Assignee: Hemanth Yamijala
> Add a Hook in Storm to post the topology metadata
> -------------------------------------------------
>
> Key: ATLAS-183
> URL: https://issues.apache.org/jira/browse/ATLAS-183
> Project: Atlas
> Issue Type: Sub-task
> Affects Versions: 0.6-incubating
> Reporter: Venkatesh Seetharam
> Assignee: Hemanth Yamijala
> Fix For: 0.6-incubating
>
> Attachments: ATLAS-183.patch
>
>
> Apache Storm Integration with Apache Atlas (incubating)
> Introduction
> Apache Storm is a distributed real-time computation system. Storm makes it
> easy to reliably process unbounded streams of data, doing for real-time
> processing what Hadoop did for batch processing. A Storm application is
> essentially a DAG of nodes, called a topology.
> Apache Atlas is a metadata repository that enables end-to-end data lineage,
> search, and association of business classifications.
> Overview
> The goal of this integration is, at a minimum, to push the operational
> topology metadata, along with the underlying data source(s), target(s),
> derivation processes, and any available business context, so that Atlas can
> capture the lineage for the topology.
> It would also help to support custom user annotations per node in the
> topology.
> There are two parts to this process, detailed below:
> Data model to represent the concepts in Storm
> Storm Bridge to update metadata in Atlas
> Data Model
> A data model is represented as a Type in Atlas. It contains descriptions of
> the various nodes in the DAG, such as spouts and bolts, and the corresponding
> source and target types. These need to be expressed as Types in the Atlas
> type system. At a minimum, we need to create types for:
> Storm topology containing spouts, bolts, etc. with associations between them
> Source (typically Kafka, etc.)
> Target (typically Hive, HBase, HDFS, etc.)
> You can take a look at the data model code for Hive. Storm should be simpler
> than Hive from a data-modeling perspective.
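> As a concrete sketch of the minimum types listed above, the fragment below models them as plain Java records rather than real Atlas type definitions. Every type and attribute name here (storm_topology, storm_node, kafka_topic, hdfs_path) is an assumption for illustration, not the final model.

```java
import java.util.List;

// Illustrative sketch only: the proposed Storm types as plain Java records,
// not actual Atlas typesystem calls. Type and attribute names are assumed.
public class StormModelSketch {
    record AttributeDef(String name, String dataType, boolean required) {}
    record TypeDef(String typeName, List<AttributeDef> attributes) {}

    static List<TypeDef> proposedTypes() {
        return List.of(
            new TypeDef("storm_topology", List.of(      // the topology itself
                new AttributeDef("id", "string", true),
                new AttributeDef("startTime", "long", false),
                new AttributeDef("nodes", "array<storm_node>", true))),
            new TypeDef("storm_node", List.of(          // spouts and bolts
                new AttributeDef("name", "string", true),
                new AttributeDef("conf", "map<string,string>", false))),
            new TypeDef("kafka_topic", List.of(         // typical source
                new AttributeDef("topic", "string", true))),
            new TypeDef("hdfs_path", List.of(           // typical target
                new AttributeDef("path", "string", true))));
    }

    public static void main(String[] args) {
        proposedTypes().forEach(t -> System.out.println(t.typeName()));
    }
}
```

> In the real implementation these would become class definitions registered with the Atlas type system, with references (not strings) linking topology nodes to their source and target entities.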
> Pushing Metadata into Atlas
> There are 2 parts to the bridge:
> Storm Bridge
> This is a one-time import that lists all the active topologies in Storm and
> pushes their metadata into Atlas, to cover deployments that existed before
> Atlas was installed.
> You can refer to the bridge code for Hive.
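> The one-time import could be structured as below. This is a hypothetical sketch: ClusterClient stands in for whatever client lists running topologies (e.g. Storm's Nimbus thrift client), and postToAtlas stands in for the Atlas REST call; neither is a real API.

```java
import java.util.List;

// Hypothetical sketch of the one-time bridge import. ClusterClient and
// postToAtlas are illustrative stand-ins, not real Storm or Atlas APIs.
public class StormBridgeSketch {
    interface ClusterClient {
        List<String> activeTopologyNames();  // e.g. backed by Nimbus
    }

    // Walk every active topology and push its metadata; returns the count.
    static int importAll(ClusterClient client) {
        int imported = 0;
        for (String name : client.activeTopologyNames()) {
            postToAtlas(name);
            imported++;
        }
        return imported;
    }

    static void postToAtlas(String topologyName) {
        // The real bridge would build Atlas entities for the topology and
        // POST them to the Atlas service here.
        System.out.println("imported: " + topologyName);
    }
}
```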
> Post-execution Hook
> Atlas needs to be notified when a new topology is registered successfully in
> Storm or when someone changes the definition of an existing topology.
> You can refer to the hook code for Hive.
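> The intended hook flow might look like the sketch below. TopologySubmitterHook is a hypothetical stand-in for whatever callback Storm exposes after a topology is registered, and posting to Atlas is reduced to recording the topology name; none of these names are real Storm or Atlas APIs.

```java
import java.util.Map;

// Hypothetical sketch of a post-execution hook. The interface and class
// names are assumptions for illustration only.
public class AtlasSubmitHookSketch {
    interface TopologySubmitterHook {
        void notify(String topologyName, Map<String, Object> stormConf);
    }

    static class AtlasNotifierHook implements TopologySubmitterHook {
        String lastPosted;  // stands in for an Atlas REST call

        @Override
        public void notify(String topologyName, Map<String, Object> stormConf) {
            // The real hook would translate the topology definition into
            // Atlas entities and POST them to the Atlas service, both on
            // first registration and on redefinition of the topology.
            lastPosted = topologyName;
        }
    }
}
```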
>
> Example use case:
> Custom annotations associated with each node in the topology.
> For example: Data Quality Rules, Error Handling, etc. A set of annotations
> that enumerates rules for handling nulls (e.g., all nulls for a column get
> filtered).
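> One way such per-node annotations could be modeled is as free-form key/value pairs attached to each node name, as sketched below. This is illustrative only; the class and the rule text are assumptions, not a proposed API.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: custom user annotations kept as key/value pairs
// per topology node. Not a real Atlas or Storm API.
public class NodeAnnotations {
    private final Map<String, Map<String, String>> byNode = new HashMap<>();

    public void annotate(String node, String key, String value) {
        byNode.computeIfAbsent(node, n -> new HashMap<>()).put(key, value);
    }

    public Map<String, String> annotationsFor(String node) {
        return byNode.getOrDefault(node, Map.of());
    }
}
```

> A data quality rule from the example above might then be attached as annotate("filter-bolt", "dataQuality", "drop rows where column is null"), and Atlas could store these annotations as traits on the corresponding node entities.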
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)