Re: [DISCUSS] Batch Profiler

Nick Allen Mon, 30 Jul 2018 07:51:33 -0700

>> 1. We will need a breakdown of introducing Spark to the stack: required
>> version due to HDP support; do we want to update HDP support before
>> this?; Spark tuning/defaults; Spark configuration support / UI, etc.

All sounds useful.  I'm not sure how much of that we can do before we have
the code that actually runs in Spark though.  For example, you can't
provide tuning defaults or configuration support until you have the code
that needs to be tuned and configured.  I see these as good follow-ons though.

>> 2. When I read this, it seems like a Lambda architecture approach.
>> Should we, as part of this, start exploring the possibility of replacing
>> Storm with Spark Streaming such that we do not have to maintain separate
>> streaming vs. batch codebases?

Yes, I definitely think that's a likely possibility.  It's probably not
something I want to bite off as part of this work though.  I'd like to just
focus on getting the Batch Profiler functionality right.

Here is what I had in mind for the initial set of PRs for the feature
branch.

1. There are some backwards-compatible changes needed to run the core
Profiler.  The current Profiler ports in Storm and REPL would continue to
work as-is.
2. Introduce the initial code for the Batch Profiler.  This would require
the user to manually install Spark and do some manual setup to deploy and
run it.
3. Iterate on some unit and integration test enhancements for the Batch
Profiler.
4. Create packaging: the RPMs and DEBs.
5. Add support in the MPack to deploy the Batch Profiler.
6. Enhance support for alternative input formats.  Initially, support the
raw JSON we archive in Full Dev.  But in real-world use cases, the telemetry
is going to be stored in alternative formats like ORC.  I'd like to make it
as easy as possible to support multiple input formats.
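
To illustrate the last item above, one way to keep input formats pluggable
is a small registry of readers keyed by format name.  This is only a sketch
in plain Python, not the Batch Profiler's code: the names READERS, register,
and read_telemetry are hypothetical, the sample field name is illustrative,
and only a JSON-lines reader is filled in.  An ORC reader would be registered
the same way once a suitable library is chosen.

```python
import json
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical type: a reader turns raw input lines into telemetry messages.
Reader = Callable[[Iterable[str]], Iterator[dict]]

# Hypothetical registry mapping a format name to its reader.
READERS: Dict[str, Reader] = {}


def register(name: str) -> Callable[[Reader], Reader]:
    """Register a reader under a case-insensitive format name."""
    def wrap(fn: Reader) -> Reader:
        READERS[name.lower()] = fn
        return fn
    return wrap


@register("json")
def read_json_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON message per line, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def read_telemetry(fmt: str, lines: Iterable[str]) -> Iterator[dict]:
    """Look up the reader for `fmt` and apply it to the input."""
    try:
        reader = READERS[fmt.lower()]
    except KeyError:
        raise ValueError(f"unsupported input format: {fmt!r}") from None
    return reader(lines)
```

With this shape, supporting another format means adding one registered
function; the code that consumes the messages does not change.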

Thanks

On Mon, Jul 30, 2018 at 9:50 AM, Otto Fowler <ottobackwa...@gmail.com>
wrote:

> I think the feature branch is a good idea, but what is in the feature
> branch or feature branches will have to shake out.
>
> I agree in concept with what you have in the jira, but I have two points.
>
>    1. We will need a break down of introducing Spark to the stack
>       - required version due to HDP support
>       - do we want to update HDP support before this?
>       - Spark tuning/defaults
>       - Spark configuration support / UI etc
>       - more….
>    2. When I read this, it seems like a Lambda architecture approach. Should
>    we, as part of this, start exploring the possibility of replacing Storm
>    with Spark Streaming such that we do not have to maintain separate
>    streaming vs. batch codebases?
>    3. This mechanism would be used in the future for telemetry ‘replay’.
>    That would mean that ( IMHO )
>       - we should understand that case as well for this
>       - build this capability out such that it is generic enough that a
>       second use will not warrant a re-write or huge refactor
>
> I think this breaks down to a few sets of functionality:
>
>    - Base support for deployment, management or spark
>    - Metron services for triggering, and monitoring of Apache Spark ( on
>      demand and constant ), maybe rest stuff like the caps
>    - UI / Stellar base support
>    - Build out of Batch Profiler service on top of that
>    - Build out of replay service on top of that ( plus all the replay stuff
>      that needs to also be done - like are you replacing data or having two
>      sets…. trial runs etc )
>    - ????
>    - profit
>
>
>
>
> On July 27, 2018 at 11:29:51, Nick Allen (n...@nickallen.org) wrote:
>
> Hi Everyone -
>
> A while back I opened up a discuss thread around the general idea of a
> Batch Profiler [1]. I'd like to start making progress on a first draft of
> that functionality.
>
> I created METRON-1699 [2] which outlines the general approach and ideas.
> If you're interested, review that JIRA and let me know if you have
> feedback. I will be adding sub-tasks to that JIRA as I make progress and
> can separate it into logical bits for review.
>
> I would like this effort to use a feature branch as it will take a number
> of PRs to get a first cut on the functionality. Pending no disagreement, I
> will create the feature branch based on METRON-1699.
>
> [1] https://lists.apache.org/thread.html/d28d18cc9358f5d9c276c7c304ff4ee601041fb47bfc97acb6825083@%3Cdev.metron.apache.org%3E
> [2] https://issues.apache.org/jira/browse/METRON-1699
>
>
