Hey Yi, Thanks for lot for your work on this document. I know it must have been crazy trying to put-together everything in a single doc :)
Here are my comments. Sorry about the delay :( 1. It will be useful to set some background for the benefit of the community members who haven't been following design docs in the JIRAs. Can you briefly explain the definition of StreamApplication and how it translates to jobs through the stack. 2. "Problem" section doesn't seem to describe any problem that ApplicationRunner is solving :) Imo, ApplicationRunner basically provides a unified programming pattern for the user to execute StreamApplications defined using fluent-api or task-level API. I think the problem and motivation section can use a little bit of re-wording. 3. In the "Overview of ApplicationRunner" section: * How the components within ApplicationRunner interact isn't very obvious from the overview image. For example, ExecutionPlanner translates a "StreamApplication" into an "ExecutionPlan" which is essentially a specification of the DAG. (Please correct me, if I am wrong here!). The ExecutionPlan is used by the JobRunner to launch Samza jobs. * The roles of ExecutionPlanner and JobRunner are fairly well-defined. StreamManager seems like a util class that helps class-load systems and create streams. The ExecutionPlan will be consumed by JobRunner and JobRunner will use StreamManager to create intermediate streams, prior to launching jobs. It doesn't sound like a StreamManager is a "component" of the ApplicationRunner. * What is the role of the RuntimeEnvironment? That has not been explained. Maybe explaining that will fill the gap in understanding for the readers. I see that you have tried to explain the flow of control in the code using the sequence diagram. Perhaps, if we can articulate the roles/responsibilities of the RuntimeEnvironment, there will not be a need for the control flow diagram. 4. How is runtime environment defined by the user? Is it configurable ? Answering these questions in the doc will be useful 5. In the "Interaction between RuntimeEnvironment and ApplicationRunners" section: * Samza container is interacting with the RuntimeEnvironment. Does that make the RuntimeEnvironment as a shared component between the LocalApplicationRunner and the SamzaContainer? It doesn't seem to be the case for RemoteApplicationRunner. So, I am confused as to why it is different. 6. In general, what does "app.class" config represent? It seems straightforward when a "StreamApplication" is defined. Is it applicable when using low-level task api? 7. Interface defintions: * Perhaps when you implement this, can you specifically callout if each method is blocking or not in the javadoc ? 8. Minor nit-picking: * "ApplicationRunners in Different Execution Environments" -> should it be RuntimeEnvironments as that is the terminology used in the rest of the document. * In the "How this works in standalone deployment" section: * "Deploy the application to standalone hosts" and *Run run-local-app.sh on each node to start/stop the local application* are probably just a single step - Deploy the application to standalone hosts using run-local-app.sh?? General question: It seems like, even with extensive changes to the interfaces/programming model, we are still class loading the components for most parts. In such a world, we are not close to integrating with frameworks that already have a lifecycle model and can provide instantiated objects directly. For example, in the Samza as a library use-case, it makes sense for the user to provide a JmxServer or a taskFactory or a custom metricReporter for the StreamProcessor. One of the motivations for this case was that most applications are already running within a servlet/jetty container model with its own lifecycle. If ApplicationRunner(s) is the unified interface, doesn't that prohibit Samza from being integrated with such frameworks? Thanks! Navina On Thu, Apr 20, 2017 at 10:06 AM, Jacob Maes <jacob.m...@gmail.com> wrote: > Thanks for the SEP! > > +1 on introducing these new components > -1 on the current definition of their roles (see Design feedback below) > > *Design* > > - If LocalJobRunner and RemoteJobRunner handle the different methods of > launching a Job, what additional value do the different types of > ApplicationRunner and RuntimeEnvironment provide? It seems like a red > flag > that all 3 would need to change from environment to environment. It > indicates that they don't have proper modularity. The > call-sequence-figures > support this; LocalApplicationRunner and RemoteApplicationRunner make > the > same calls and the diagram only varies after jobRunner.start() > - As far as I can tell, the only difference between Local and Remote > ApplicationRunner is that one is blocking and the other is > non-blocking. If > that's all they're for then either the names should be changed to > reflect > this, or they should be combined into one ApplicationRunner and just > expose > separate methods for run() and runBlocking() > - There isn't much detail on why the main() methods for Local/Remote > have such different implementations, how they receive the Application > (direct vs config), and concretely how the deployment scripts, if any, > should interact with them. > > > *Style* > > - nit: None of the 11 uses of the word "actual" in the doc are > *actually* > needed. :-) > - nit: Colors of the runtime blocks in the diagrams are unconventional > and a little distracting. Reminds me of nai won bao. Now I'm hungry. :-) > - Prefer the name "ExecutionEnvironment" over "RuntimeEnvironment". The > term "execution environment" is used > - The code comparisons for the ApplicationRunners are not apples-apples. > The local runner example is an application that USES the local runner. > The > remote runner example is the just the runner code itself. So, it's not > readily apparent that we're comparing the main() methods and not the > application itself. > > > On Mon, Apr 17, 2017 at 5:02 PM, Yi Pan <nickpa...@gmail.com> wrote: > > > Made some updates to clarify the role and functions of RuntimeEnvironment > > in SEP-2. > > > > On Fri, Apr 14, 2017 at 9:30 AM, Yi Pan <nickpa...@gmail.com> wrote: > > > > > Hi, everyone, > > > > > > In light of new features such as fluent API and standalone that > introduce > > > new deployment / application launch models in Samza, I created a new > > SEP-2 > > > to address the new use cases. SEP-2 link: https://cwiki.apache. > > > org/confluence/display/SAMZA/SEP-2%3A+ApplicationRunner+Design > > > > > > Please take a look and give feedbacks! > > > > > > Thanks! > > > > > > -Yi > > > > > > -- Navina R.