Re: [DISCUSS] SEP-1: Semantics of ProcessorId in Samza

Navina Ramesh Tue, 21 Mar 2017 14:42:07 -0700

Hi everyone,
I have updated the SEP
<https://cwiki.apache.org/confluence/display/SAMZA/SEP-1%3A+Semantics+of+ProcessorId+in+Samza>
based on all the feedback. Feel free to comment.


I will start the [vote] mail thread, if there are no further questions
within the next 24 hours.

Thanks!
Navina

On Tue, Mar 21, 2017 at 10:33 AM, Navina Ramesh (Apache) <[email protected]>
wrote:

> Hi Jagadish,
> Thanks for the suggestion. You are right in that it should be the
> responsibility of the JobCoordinator to assign identifiers.
>
> > 'm only wondering if this logic could instead reside inside the
> Job Coordinator (which is internal to the StreamProcessor) instead of
> relying on something external to it?
>
> I think this is a consequence of our initial StandaloneJobCoordinator,
> which is pretty much a pass-through. I didn't see any usage for
> getProcessorId() and was wondering why we put it in the JobCoordinator
> interface. I think I should keep your design proposal from last year handy
> :) Thanks for pitching in!
>
>
> @All:
> Yesterday, there was a discussion on naming of the configuration used in
> this SEP - whether it should be within the "job" scope or "app" scope
> (introduced by SAMZA-1041
> <https://issues.apache.org/jira/browse/SAMZA-1041>).  Multi-stage feature
> and fluent-api for Samza introduces the notion of "application". Since the
> processorId generator config applies to all jobs within a Samza
> application, we decided to add the config for generator under "app" scope.
> Further details on config scope changes can be found in SAMZA-1120.
> <https://issues.apache.org/jira/browse/SAMZA-1120>
>
> I will send out an update once I change the SEP based on yesterday's
> meeting and Jagadish's idea.
>
> Thanks!
> Navina
>
> On Mon, Mar 20, 2017 at 5:22 PM, Jagadish Venkatraman <
> [email protected]> wrote:
>
>> Thanks for writing this SEP!
>>
>> Here's an alternate approach instead of taking the "String processorId" as
>> a parameter in the constructor. In my view, the "processorId" could be
>> generated by the StreamProcessor internally (instead of being generated
>> up-stream and passed in). The Job Coordinator API could be as follows:
>>
>>
>> public interface JobCoordinator {
>>
>>  ProcessorIdGenerator getProcessorIDGenerator();
>>
>> // could be String getProcessorID()
>>
>>  JobModel getJobModel();
>>
>> }
>>
>> public interface ProcessorIDGenerator {
>>
>>  String getProcessorID();
>> }
>>
>>
>> For instance, an Yarn job coordinator can merely parse the ID from config,
>> and return it. A Zk backed implementation of the Job coordinator can agree
>> on IDs using coordination leveraging Zk. One nice property with this
>> approach is that it keeps all logic related to coordination, agreement on
>> the Job Model, leader election (with potentially pluggable components for
>> each) inside the JobCoordinator.
>>
>> To be clear, I'm all for pluggability for ID generation logic that this
>> SEP
>> advocates. I'm only wondering if this logic could instead reside inside
>> the
>> Job Coordinator (which is internal to the StreamProcessor) instead of
>> relying on something external to it?
>>
>> Of course, there may be other considerations around the way the current
>> code is structured that may prevent this. Let me know if you agree with
>> this change.
>>
>> Thanks,
>> Jag
>>
>>
>> On Thu, Mar 16, 2017 at 5:21 PM, Navina Ramesh
>> <[email protected]
>> > wrote:
>>
>> > > I am working on the ApplicationRunner SEP right now. Will send out the
>> > discussion email once I am done.
>> >
>> > Perfect! :)
>> >
>> > On Thu, Mar 16, 2017 at 5:13 PM, xinyu liu <[email protected]>
>> wrote:
>> >
>> > > Right, the static factory is very simple as you said. It's pretty
>> > > convenient for the client to use.
>> > >
>> > > I am working on the ApplicationRunner SEP right now. Will send out the
>> > > discussion email once I am done.
>> > >
>> > > Thanks,
>> > > Xinyu
>> > >
>> > > On Thu, Mar 16, 2017 at 4:50 PM, Navina Ramesh (Apache) <
>> > [email protected]
>> > > >
>> > > wrote:
>> > >
>> > > > > One minor thing I found is that the name of the config is camel
>> case
>> > > > (*processor.idGenerator.class*). Seems Samza's practice is to use
>> all
>> > > > lower
>> > > > case configs with "." delimiter. Do you think we should stick to
>> this
>> > > > convention?
>> > > >
>> > > > I am always torn between the "convention" we have and the better
>> way of
>> > > > doing things. But I don't have strong opinions about it. I can
>> change
>> > it.
>> > > >
>> > > > > One more suggestion is to have a static factory method in the
>> > > > ProcessorIdGenerator (Like what we have in ApplicationRunner):
>> > > >
>> > > > I couldn't grasp these requirements from the ApplicationRunner
>> design.
>> > It
>> > > > will be great if you can put it out in an SEP :)
>> > > >
>> > > > I can add the static factory method for it. Just to clarify, the
>> static
>> > > > method simply class loads the ProcessorIdGenerator ? It uses
>> reflection
>> > > to
>> > > > create the instance ?
>> > > >
>> > > > Thanks!
>> > > > Navina
>> > > >
>> > > >
>> > > >
>> > > > On Thu, Mar 16, 2017 at 4:31 PM, xinyu liu <[email protected]>
>> > > wrote:
>> > > >
>> > > > > The proposal looks great to me! Changing the id type to string
>> will
>> > > make
>> > > > > sure this can work with other types of cluster which doesn't
>> support
>> > > > > integer id. The interface and config provides a pluggable way to
>> have
>> > > > > different id generators for different use cases. One minor thing I
>> > > found
>> > > > is
>> > > > > that the name of the config is camel case
>> > > (*processor.idGenerator.class*
>> > > > ).
>> > > > > Seems Samza's practice is to use all lower case configs with "."
>> > > > delimiter.
>> > > > > Do you think we should stick to this convention?
>> > > > >
>> > > > > One more suggestion is to have a static factory method in
>> > > > > the ProcessorIdGenerator (Like what we have in ApplicationRunner):
>> > > > >
>> > > > > static ProcessIdGenerator fromConfig(Config config) { ... }.
>> > > > >
>> > > > > With this, It will be more convenient for the ApplicationRunner to
>> > > > > construct the generator. What do you think?
>> > > > >
>> > > > > Thanks,
>> > > > > Xinyu
>> > > > >
>> > > > > On Wed, Mar 15, 2017 at 10:59 PM, Navina Ramesh (Apache) <
>> > > > > [email protected]>
>> > > > > wrote:
>> > > > >
>> > > > > > Hi everyone,
>> > > > > > I created a proposal for SAMZA-1126, which addresses the
>> semantics
>> > of
>> > > > > > ProcessorId in Samza. For most purposes, ProcessorId is same as
>> the
>> > > > > logical
>> > > > > > id that Samza assigns for each Yarn container. It is primarily
>> used
>> > > in
>> > > > > > JobModel as a key for the corresponding ContainerModel and
>> also, in
>> > > > > > container-level metrics. We are expanding the applicability of
>> > > > > processorId
>> > > > > > to be beyond a fixed set of processors.
>> > > > > >
>> > > > > > Please review and comment on this SEP.
>> > > > > >
>> > > > > > For those who are not actively following the master branch, you
>> may
>> > > > have
>> > > > > > more questions than others. Feel free to ask them here.
>> > > > > >
>> > > > > > @Xinyu: Since you are working on SAMZA-1067 and other related
>> > > > integration
>> > > > > > APIs, can you please add an SEP for SAMZA-1067 ? This will help
>> > > others
>> > > > > (adn
>> > > > > > me as well) get on the same page with your design/code. Let me
>> know
>> > > if
>> > > > > > SEP-1 will work per your design for ApplicationRunner.
>> > > > > >
>> > > > > > Thanks!
>> > > > > > Navina
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Navina R.
>> >
>>
>>
>>
>> --
>> Jagadish V,
>> Graduate Student,
>> Department of Computer Science,
>> Stanford University
>>
>
>


-- 
Navina R.

Re: [DISCUSS] SEP-1: Semantics of ProcessorId in Samza

Reply via email to