>> 1. We will need a break down of introducing Spark to the stack; required version due to HDP support; do we want to update HDP support before this?; Spark tuning/defaults; Spark configuration support / UI etc
All sounds useful. I'm not sure how much of that we can do before we have
the code that actually runs in Spark, though. For example, you can't
provide tuning defaults or configuration support until you have the code
that needs to be tuned and configured. I see these as good follow-ons,
though.

>> 2. When I read this, it seems like a Lambda architecture approach. Should
>> we, as part of this start exploring the possibility to replacing storm with
>> spark streaming such that we do not have to maintain separate streaming vs.
>> batch codebases?

Yes, I definitely think that's a likely possibility. It's probably not
something I want to bite off as part of this work, though. I'd like to
just focus on getting the Batch Profiler functionality right.

Here is what I had in mind for the initial set of PRs for the feature
branch.

1. There are some backwards-compatible changes needed to run the core
Profiler. The current Profiler ports in Storm and REPL would continue to
work as-is.

2. Introduce the initial code for the Batch Profiler. This would require
the user to manually install Spark and do some manual setup to deploy and
run it.

3. Iterate on some unit and integration test enhancements for the Batch
Profiler.

4. Create packaging; the RPMs, DEBs.

5. Add support in the MPack to deploy the Batch Profiler.

6. Enhance support for alternative input formats. Initially, support the
raw JSON we archive in Full Dev. But in real-world use cases, the
telemetry is going to be stored in alternative formats like ORC. I'd like
to make it as easy as possible to support multiple input formats.

Thanks

On Mon, Jul 30, 2018 at 9:50 AM, Otto Fowler <ottobackwa...@gmail.com>
wrote:

> I think the feature branch is a good idea, but what is in the feature
> branch or feature branches will have to shake out.
>
> I agree in concept with what you have in the jira, but I have two points.
>
> 1.
We will need a break down of introducing Spark to the stack
>    - required version due to HDP support
>    - do we want to update HDP support before this?
>    - Spark tuning/defaults
>    - Spark configuration support / UI etc
>    - more….
> 2. When I read this, it seems like a Lambda architecture approach. Should
>    we, as part of this start exploring the possibility to replacing storm with
>    spark streaming such that we do not have to maintain separate streaming vs.
>    batch codebases?
> 3. This mechanism would be used in the future for telemetry ‘replay’.
>    That would mean that ( IMHO )
>    - we should understand that case as well for this
>    - build this capability out such that it is generic enough that a
>      second use will not warrant a re-write or huge refactor
>
> I think this breaks down to a few sets of functionality:
>
> - Base support for deployment, management or spark
> - Metron services for triggering, and monitoring of Apache Spark ( on
>   demand and constant ), maybe rest stuff like the caps
> - UI / Stellar base support
> - Build out of Batch Profiler service on top of that
> - Build out of replay service on top of that ( plus all the replay stuff
>   that needs to also be done - like are you replacing data or having two
>   sets…. trial runs etc )
> - ????
> - profit
>
> On July 27, 2018 at 11:29:51, Nick Allen (n...@nickallen.org) wrote:
>
> Hi Everyone -
>
> A while back I opened up a discuss thread around the general idea of a
> Batch Profiler [1]. I'd like to start making progress on a first draft of
> that functionality.
>
> I created METRON-1699 [2] which outlines the general approach and ideas.
> If you're interested, review that JIRA and let me know if you have
> feedback. I will be adding sub-tasks to that JIRA as I make progress and
> can separate it into logical bits for review.
>
> I would like this effort to use a feature branch as it will take a number
> of PRs to get a first cut on the functionality. Pending no disagreement, I
> will create the feature branch based on METRON-1699.
>
> [1] https://lists.apache.org/thread.html/d28d18cc9358f5d9c276c7c304ff4ee601041fb47bfc97acb6825083@%3Cdev.metron.apache.org%3E
> [2] https://issues.apache.org/jira/browse/METRON-1699
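
To make the alternative input formats item in the PR list above a bit more concrete, here is a rough sketch of one way a reader could be selected by format name, so that adding a format like ORC later means registering one more reader rather than changing the callers. This is only an illustration in plain Python, not the Batch Profiler's actual design or any Spark API; `READERS`, `register`, `read_json_lines`, and `read_telemetry` are all hypothetical names, and the sample messages are made up.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical registry: format name -> function that turns raw text
# lines into telemetry messages (dicts). Not part of Metron or Spark.
Reader = Callable[[Iterable[str]], Iterator[dict]]
READERS: Dict[str, Reader] = {}


def register(fmt: str) -> Callable[[Reader], Reader]:
    """Decorator that registers a reader under a format name."""
    def wrap(fn: Reader) -> Reader:
        READERS[fmt] = fn
        return fn
    return wrap


@register("json")
def read_json_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON message per line, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def read_telemetry(fmt: str, lines: Iterable[str]) -> List[dict]:
    """Look up the reader for `fmt` and return the parsed messages."""
    try:
        reader = READERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported input format: {fmt!r}") from None
    return list(reader(lines))


# Made-up sample telemetry, one JSON message per line.
sample = [
    '{"ip_src_addr": "10.0.0.1", "source.type": "bro"}',
    "",
    '{"ip_src_addr": "10.0.0.2", "source.type": "snort"}',
]
messages = read_telemetry("json", sample)
```

With this shape, callers only ever name a format; a reader for another format would be added with one more `@register(...)` function, and an unregistered format fails fast with a `ValueError`.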