Chris Riccomini created SAMZA-70:
------------------------------------

             Summary: Create setup class to handle per-job startup setup
                 Key: SAMZA-70
                 URL: https://issues.apache.org/jira/browse/SAMZA-70
             Project: Samza
          Issue Type: Bug
          Components: container
    Affects Versions: 0.6.0
            Reporter: Chris Riccomini


There is some Samza job setup that happens before tasks can be run. This
includes setting up the checkpoint and state management (change log) factories. 
For example, we want to verify that the change log and checkpoint topics exist, 
and if not, create them with the proper number of partitions.

We should pull this logic into a SetupJob class, and move the execution into a
new YarnAppMasterListener called SamzaAppMasterSetup, which should do the job
setup during the init() call. We should also execute the same SetupJob class
logic in the ProcessJob.submit and ThreadJob.submit methods.
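
As a rough illustration of the shape this could take, here is a minimal
sketch. TopicAdmin and InMemoryTopicAdmin are hypothetical names invented for
this example (they are not Samza APIs), and the topic names and partition
counts are supplied by the caller rather than read from job config.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical abstraction over the system that hosts the topics; not a Samza API.
interface TopicAdmin {
    boolean topicExists(String topic);
    void createTopic(String topic, int partitions);
}

// In-memory stand-in used only to make the sketch runnable.
class InMemoryTopicAdmin implements TopicAdmin {
    final Map<String, Integer> topics = new LinkedHashMap<>();

    @Override
    public boolean topicExists(String topic) {
        return topics.containsKey(topic);
    }

    @Override
    public void createTopic(String topic, int partitions) {
        if (topics.containsKey(topic)) {
            throw new IllegalStateException("topic already exists: " + topic);
        }
        topics.put(topic, partitions);
    }
}

// Sketch of the proposed SetupJob: verify each required topic, create it if missing.
class SetupJob {
    private final TopicAdmin admin;
    private final Map<String, Integer> requiredTopics; // topic name -> partition count

    SetupJob(TopicAdmin admin, Map<String, Integer> requiredTopics) {
        this.admin = admin;
        this.requiredTopics = new LinkedHashMap<>(requiredTopics);
    }

    // Safe to call more than once: topics that already exist are left untouched.
    void setup() {
        for (Map.Entry<String, Integer> e : requiredTopics.entrySet()) {
            if (!admin.topicExists(e.getKey())) {
                admin.createTopic(e.getKey(), e.getValue());
            }
        }
    }
}
```

Under this sketch, the AM listener's init() and the ProcessJob.submit and
ThreadJob.submit paths would each construct a SetupJob and call setup() once,
before any TaskRunner starts.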

The motivation for this is threefold:

1. There is a race condition in the TaskRunner when multiple containers for a 
single job are running in YARN, where each TaskRunner is trying to create the 
checkpoint/change log topics when they don't exist.

2. It makes implementing the TaskRunner logic in other languages easier, since
non-Java TaskRunner implementations won't have to set the topics up. The
SetupJob class will be run in the AM (under YARN) or in the Java code of the
ProcessJob/ThreadJob (for local jobs).

3. It gives us a place to run a single chunk of code in a controlled,
single-threaded way, before any of the TaskRunners start.
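
The ordering described in points 1 and 3 can be sketched as follows: setup
runs once on the calling thread, and only then are the concurrent workers
started, so no worker ever has to check for and create a topic itself. This is
an illustrative sketch only. SetupThenRun, setupOnce, and runTask are
hypothetical names, and a thread pool stands in for the containers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SetupThenRun {
    // Single-threaded setup step: registers every required topic exactly once.
    static void setupOnce(Set<String> existingTopics, List<String> requiredTopics) {
        existingTopics.addAll(requiredTopics);
    }

    // Stand-in for a TaskRunner: it only reads, and never creates topics itself.
    static boolean runTask(Set<String> existingTopics, List<String> requiredTopics) {
        return existingTopics.containsAll(requiredTopics);
    }

    // Runs setup first, then starts the workers; returns how many workers
    // found all of their topics already in place.
    static int runJob(int containers, List<String> requiredTopics) {
        // Hypothetical stand-in for the set of topics in the messaging system.
        Set<String> existingTopics = ConcurrentHashMap.newKeySet();
        setupOnce(existingTopics, requiredTopics);

        ExecutorService pool = Executors.newFixedThreadPool(containers);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < containers; i++) {
                Callable<Boolean> task = () -> runTask(existingTopics, requiredTopics);
                results.add(pool.submit(task));
            }
            int ok = 0;
            for (Future<Boolean> result : results) {
                if (result.get()) {
                    ok++;
                }
            }
            return ok;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the create step happens before the pool is started, the workers have
no check-then-create sequence to race on.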

Some things to consider: is it OK to hard-code that the SetupJob class should
always set up just the checkpoint manager and change log topics? Do we need to
add a setup() method to the lifecycle for everything?



--
This message was sent by Atlassian JIRA
(v6.1#6144)