Thank you for all the hard work, Curtis. I will start reviewing.

- Nathanael

> On May 1, 2018, at 1:12 PM, Curtis Howard <cur...@cloudera.com> wrote:
>
> Hi all,
>
> We had discussed prototyping Envelope for ingest in the past - I've
> submitted a PR for this which includes:
> - Kafka -> Spark streaming -> ODM Hive table applications for dns, flow
>   and proxy raw source data
> - a simple alternative for source data collection/dissection using
>   tshark/nfdump/unzip + Flume (sinking data to Kafka)
> - https://github.com/apache/incubator-spot/pull/144
>
> To quote directly from the Envelope site
> (https://github.com/cloudera-labs/envelope#envelope):
> *"Envelope is simply a pre-made Spark application that implements many of
> the tasks commonly found in ETL pipelines. In many cases, Envelope allows
> large pipelines to be developed on Spark with no coding required. When
> custom code is needed, there are pluggable points in Envelope for core
> functionality to be extended. Envelope works in batch and streaming modes."*
>
> For example, the complete Kafka/SparkStreaming/ODM ingest application
> definition for DNS:
> https://github.com/curtishoward/incubator-spot/blob/SPOT-181_envelope_ingest/spot-ingest/odm/workers/spot_proxy.conf
>
> From the perspective of the Spot project, my thoughts are that it would
> enable:
> - faster turnaround time to ingest new source types while still allowing
>   for arbitrarily complex ETL pipelines (data enrichment, data quality
>   checks, etc.)
> - simpler future integration with other storage layers (HBase, Kudu, for
>   example)
> - a framework that is simple to extend (input sources, output storage
>   layers, translators, derivers, UDFs, ...)
>
> If there is interest, I will continue to refactor the current
> implementation - centralize/integrate configuration with spot.conf, test
> Kerberos integration, run performance tests and tune where possible.
>
> In the near term, I will also add a PR with Hive views for dns/flow/proxy
> under spot-ml/ - this should enable an end-to-end proof-of-concept ODM
> implementation using Envelope.
>
> Thanks
> Curtis