Following up on this thread - Envelope 0.6.0 is now available and uses
upstream Apache dependencies throughout (rather than their CDH counterparts).

At a minimum, for what we would use it for, it requires:
Apache Spark 2.1.0 or above
Apache Kafka 0.10 or above
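
These version floors are easiest to check numerically rather than as strings,
since "0.9" sorts after "0.10" lexically. Below is a minimal Python sketch that
assumes plain dotted version strings such as "2.3.1" or "0.10.2.1". The helpers
parse_version and meets_envelope_minimum are hypothetical illustrations, not
part of Envelope:

```python
# Hypothetical helpers (not part of Envelope): check a component's version
# against the minimums stated above for Envelope 0.6.0.
MINIMUMS = {
    "spark": (2, 1, 0),  # Apache Spark 2.1.0 or above
    "kafka": (0, 10),    # Apache Kafka 0.10 or above
}


def parse_version(text):
    """Turn a dotted version like '0.10.2.1' into a tuple of ints.

    Only the leading digits of each component are kept, so
    '2.2.0-SNAPSHOT' becomes (2, 2, 0). Parsing stops at the first
    component that does not start with a digit.
    """
    parts = []
    for piece in text.strip().split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_envelope_minimum(component, version):
    """Return True if `version` is at or above the minimum for `component`."""
    required = MINIMUMS[component]
    actual = parse_version(version)
    # Pad with zeros so (0, 10) and (0, 10, 0) compare as equal.
    width = max(len(required), len(actual))

    def pad(t):
        return t + (0,) * (width - len(t))

    return pad(actual) >= pad(required)


for name, ver in [("spark", "2.3.1"), ("kafka", "0.9.0.1")]:
    print(name, ver, meets_envelope_minimum(name, ver))
# spark 2.3.1 True
# kafka 0.9.0.1 False
```

Comparing zero-padded integer tuples keeps 0.10 above 0.9 and treats 0.10 and
0.10.0 as the same version.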

Thanks
Curtis

On Wed, May 9, 2018 at 4:35 PM, Curtis Howard <cur...@cloudera.com> wrote:

> Hi all,
>
> As a follow up to this thread, I've confirmed with the Envelope team that
> the next release (0.6.0, ETA later this summer) will move to using
> upstream dependencies rather than Cloudera's (for Spark, Kafka, HBase,
> etc.).  Envelope will also begin taking public code contributions soon -
> likely next month.
>
> As I understand it, the general goal is for Envelope to move into the
> public OSS space, similar to the paths of other projects like Impala and
> Kudu.
>
> Thanks
> Curtis
>
> On Thu, May 3, 2018 at 4:48 PM, Tadd Wood <tadd.w...@digitalminion.com>
> wrote:
>
>> Curtis,
>>
>> Excited to take a look as well :).  Thanks for the hard work on this.
>>
>> Thank you,
>> Tadd Wood
>>
>>
>>
>> > On May 2, 2018, at 4:45 AM, Austin Leahy <aus...@digitalminion.com> wrote:
>> >
>> > Curtis, this is very cool; thanks for putting so much time into this. I will
>> > check out the PR and comment.
>> >
>> > On Tue, May 1, 2018 at 3:37 PM Curtis Howard <cur...@cloudera.com> wrote:
>> >
>> >> Hi Nathanael,
>> >>
>> >> So far only https://github.com/Open-Network-Insight/spot-nfdump.git
>> >>
>> >> The PR code is a proof-of-concept at this point. I look forward to your
>> >> thoughts on next steps, though!
>> >>
>> >> Thanks again
>> >> Curtis
>> >>
>> >> On Tue, May 1, 2018 at 6:28 PM, Nate Smith <natedogs...@gmail.com> wrote:
>> >>
>> >>> Curtis,
>> >>>
>> >>> Have you tested this with a standard version of nfdump? Or only
>> >>> spot-nfdump?
>> >>>
>> >>> - Nathanael
>> >>>
>> >>>> On May 1, 2018, at 1:12 PM, Curtis Howard <cur...@cloudera.com> wrote:
>> >>>>
>> >>>> Hi all,
>> >>>>
>> >>>> We had discussed prototyping Envelope for ingest in the past - I've
>> >>>> submitted a PR for this which includes:
>> >>>> - Kafka -> Spark Streaming -> ODM Hive table applications for dns, flow
>> >>>> and proxy raw source data
>> >>>> - a simple alternative for source data collection/dissection using
>> >>>> tshark/nfdump/unzip + Flume (sinking data to Kafka)
>> >>>> - https://github.com/apache/incubator-spot/pull/144
>> >>>>
>> >>>> To quote directly from the Envelope site
>> >>>> (https://github.com/cloudera-labs/envelope#envelope):
>> >>>> *"Envelope is simply a pre-made Spark application that implements many
>> >>>> of the tasks commonly found in ETL pipelines. In many cases, Envelope
>> >>>> allows large pipelines to be developed on Spark with no coding
>> >>>> required. When custom code is needed, there are pluggable points in
>> >>>> Envelope for core functionality to be extended. Envelope works in batch
>> >>>> and streaming modes."*
>> >>>>
>> >>>> For example, the complete Kafka/SparkStreaming/ODM ingest application
>> >>>> definition for DNS:
>> >>>> https://github.com/curtishoward/incubator-spot/blob/SPOT-181_envelope_ingest/spot-ingest/odm/workers/spot_proxy.conf
>> >>>>
>> >>>> From the perspective of the Spot project, my thoughts are that it would
>> >>>> enable:
>> >>>> - faster turnaround time to ingest new source types, while still
>> >>>> allowing for arbitrarily complex ETL pipelines (data enrichment, data
>> >>>> quality checks, etc.)
>> >>>> - simpler future integration with other storage layers (HBase, Kudu,
>> >>>> for example)
>> >>>> - a framework that is simple to extend (input sources, output storage
>> >>>> layers, translators, derivers, UDFs, ...)
>> >>>>
>> >>>> If there is interest, I will continue to refactor the current
>> >>>> implementation: centralize/integrate configuration with spot.conf, test
>> >>>> Kerberos integration, and run performance tests and tune where possible.
>> >>>>
>> >>>> In the near term, I will also add a PR with Hive views for dns/flow/proxy
>> >>>> under spot-ml/ - this should enable an end-to-end proof-of-concept ODM
>> >>>> implementation using Envelope.
>> >>>>
>> >>>> Thanks
>> >>>> Curtis
>> >>>
>> >>>
>> >>
>>
>>
>
