Hi, Lukas, Sorry to reply late. My comments below
On Tue, Oct 20, 2015 at 7:14 AM, Lukas Steiblys <lu...@doubledutch.me> wrote: > > >> I have been thinking lately about the most non-invasive way to add > >> multithreading capabilities to ThreadJobFactory, as that is the main > >> method > >> we run our jobs in production. Looking at the master branch code in Git, > >> I > >> have found the following: > >> a.. The best way would be to simply spin up a new thread for each > >> container. > >> b.. The number of containers can already be specified using the > >> configuration property job.container.count. > >> c.. I can construct a new SamzaContainer for each containerModel > >> returned from coordinator.jobModel.getContainers in ThreadJobFactory. > I like this idea. So I guess this would be a simplified version of parallelized Samza jobs within one JVM. It could be useful for users that need multiple parallel threads but does not need to scale the job across multiple machines. We do have a plan to work on a similar multi-thread model in a distributed environment in LinkedIn. The main difference here would be the coordinator would need to have a distributed version of implementation to include partition assignment, checkpoint, changelog initialization. It would be ideal if the coordinator's API can stay the same in both single node mode and distributed mode. I would be able to comment more when we have a sketchy design doc. > >> d.. I can pass a list of these containers into ThreadJob constructor > >> modifying it to accept an array of Runnables. > >> e.. For each runnable, it would create a new thread and start it in > the > >> submit method of ThreadJob. > >> This should start up a new thread for each container and group the tasks > >> using the appropriate TaskNameGrouper. > Wouldn't it make more sense to have the TaskNameGrouper be part of the coordinator logic as well? And each container will just take a ContainerModel to start. That way, the container logic would be much more simplified. > >> > >> Any ideas on what I might have missed? Are there any other potential > >> solutions? Would this be a good patch for Samza in general? > >> > >> Lukas > >> > > > Definitely worth to push forward in that direction. This is aligned w/ our effort to build a standalone execution model of Samza. Thanks a lot for putting so much thoughts in this direction. -Yi