Following up on this thread - Envelope 0.6.0 is available, now using all upstream Apache dependencies (rather than CDH services).
At a minimum, for what we would use it for, it requires:

Apache Spark 2.1.0 or above
Apache Kafka 0.10 or above

Thanks
Curtis

On Wed, May 9, 2018 at 4:35 PM, Curtis Howard <cur...@cloudera.com> wrote:

> Hi all,
>
> As a follow up to this thread, I've confirmed with the Envelope team that
> the next release (0.6.0, ETA later this summer) will move to using
> upstream dependencies rather than Cloudera's (for Spark, Kafka, HBase,
> etc.). Envelope will also begin taking public code contributions soon -
> likely next month.
>
> As I understand it, the general goal is for Envelope to move into the
> public OSS space, similar to the paths of other projects like Impala and
> Kudu.
>
> Thanks
> Curtis
>
> On Thu, May 3, 2018 at 4:48 PM, Tadd Wood <tadd.w...@digitalminion.com> wrote:
>
>> Curtis,
>>
>> Excited to take a look as well :). Thanks for the hard work on this.
>>
>> Thank you,
>> Tadd Wood
>>
>> > On May 2, 2018, at 4:45 AM, Austin Leahy <aus...@digitalminion.com> wrote:
>> >
>> > Curtis, this is very cool, thanks for putting so much time into this.
>> > Will check out the PR and comment.
>> >
>> > On Tue, May 1, 2018 at 3:37 PM Curtis Howard <cur...@cloudera.com> wrote:
>> >
>> >> Hi Nathanael,
>> >>
>> >> So far only https://github.com/Open-Network-Insight/spot-nfdump.git
>> >>
>> >> The PR code is a proof-of-concept at this point - look forward to your
>> >> thoughts on next steps though!
>> >>
>> >> Thanks again
>> >> Curtis
>> >>
>> >> On Tue, May 1, 2018 at 6:28 PM, Nate Smith <natedogs...@gmail.com> wrote:
>> >>
>> >>> Curtis,
>> >>>
>> >>> Have you tested this with a standard version of nfdump? Or only
>> >>> spot-nfdump?
>> >>>
>> >>> - Nathanael
>> >>>
>> >>>> On May 1, 2018, at 1:12 PM, Curtis Howard <cur...@cloudera.com> wrote:
>> >>>>
>> >>>> Hi all,
>> >>>>
>> >>>> We had discussed prototyping Envelope for ingest in the past - I've
>> >>>> submitted a PR for this which includes:
>> >>>> - Kafka -> Spark streaming -> ODM Hive table applications for dns,
>> >>>>   flow and proxy raw source data
>> >>>> - a simple alternative for source data collection/dissection using
>> >>>>   tshark/nfdump/unzip + Flume (sinking data to Kafka)
>> >>>> - https://github.com/apache/incubator-spot/pull/144
>> >>>>
>> >>>> To quote directly from the Envelope site
>> >>>> (https://github.com/cloudera-labs/envelope#envelope):
>> >>>> *"Envelope is simply a pre-made Spark application that implements
>> >>>> many of the tasks commonly found in ETL pipelines. In many cases,
>> >>>> Envelope allows large pipelines to be developed on Spark with no
>> >>>> coding required. When custom code is needed, there are pluggable
>> >>>> points in Envelope for core functionality to be extended. Envelope
>> >>>> works in batch and streaming modes."*
>> >>>>
>> >>>> For example, the complete Kafka/SparkStreaming/ODM ingest application
>> >>>> definition for DNS:
>> >>>> https://github.com/curtishoward/incubator-spot/blob/SPOT-181_envelope_ingest/spot-ingest/odm/workers/spot_proxy.conf
>> >>>>
>> >>>> From the perspective of the Spot project, my thoughts are that it
>> >>>> would enable:
>> >>>> - faster turnaround time to ingest new source types while still
>> >>>>   allowing for arbitrarily complex ETL pipelines (data enrichment,
>> >>>>   data quality checks, etc.)
>> >>>> - simpler future integration with other storage layers (HBase, Kudu,
>> >>>>   for example)
>> >>>> - a framework that is simple to extend (input sources, output storage
>> >>>>   layers, translators, derivers, UDFs, ...)
>> >>>>
>> >>>> If there is interest, I will continue to refactor the current
>> >>>> implementation - centralize/integrate configuration with spot.conf,
>> >>>> test Kerberos integration, run performance tests and tune as possible.
>> >>>>
>> >>>> In the near term, I will also add a PR with Hive views for
>> >>>> dns/flow/proxy under spot-ml/ - this should enable an end-to-end
>> >>>> proof-of-concept ODM implementation using Envelope.
>> >>>>
>> >>>> Thanks
>> >>>> Curtis