Chris Riccomini created SAMZA-70:
------------------------------------
Summary: Create setup class to handle per-job startup setup
Key: SAMZA-70
URL: https://issues.apache.org/jira/browse/SAMZA-70
Project: Samza
Issue Type: Bug
Components: container
Affects Versions: 0.6.0
Reporter: Chris Riccomini

There is some Samza job setup that must happen before tasks can run. This
includes setting up the checkpoint and state management (change log) factories.
For example, we want to verify that the change log and checkpoint topics exist
and, if not, create them with the proper number of partitions.

We should pull this logic into a SetupJob class and move its execution into a
new YarnAppMasterListener called SamzaAppMasterSetup, which should do the job
setup during the init() call. We should also execute the same SetupJob logic in
the ProcessJob.submit and ThreadJob.submit methods.
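
To make the proposal concrete, here is a rough sketch of what the check-then-create logic might look like. Everything in it is an assumption for illustration: the TopicAdmin interface, the in-memory stand-in, the SetupJob constructor and run() signature, and the topic names are hypothetical and are not existing Samza APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: these types are illustrative, not existing Samza APIs.
public class SetupJobSketch {

    /** Assumed abstraction over the system that hosts the topics. */
    interface TopicAdmin {
        boolean topicExists(String topic);
        void createTopic(String topic, int partitions);
    }

    /** In-memory stand-in so the sketch runs without a broker. */
    static class InMemoryTopicAdmin implements TopicAdmin {
        final Map<String, Integer> topics = new LinkedHashMap<>();

        @Override
        public boolean topicExists(String topic) {
            return topics.containsKey(topic);
        }

        @Override
        public void createTopic(String topic, int partitions) {
            if (topics.containsKey(topic)) {
                throw new IllegalStateException("Topic already exists: " + topic);
            }
            topics.put(topic, partitions);
        }
    }

    /** Verifies that each required topic exists and creates the missing ones. */
    static class SetupJob {
        private final TopicAdmin admin;
        private final Map<String, Integer> requiredTopics; // topic name -> partition count

        SetupJob(TopicAdmin admin, Map<String, Integer> requiredTopics) {
            this.admin = admin;
            this.requiredTopics = requiredTopics;
        }

        /** Returns the number of topics created; a second call creates none. */
        int run() {
            int created = 0;
            for (Map.Entry<String, Integer> e : requiredTopics.entrySet()) {
                if (!admin.topicExists(e.getKey())) {
                    admin.createTopic(e.getKey(), e.getValue());
                    created++;
                }
            }
            return created;
        }
    }

    public static void main(String[] args) {
        InMemoryTopicAdmin admin = new InMemoryTopicAdmin();
        Map<String, Integer> required = new LinkedHashMap<>();
        required.put("my-job-checkpoint", 1); // hypothetical topic names
        required.put("my-job-changelog", 4);
        SetupJob setup = new SetupJob(admin, required);
        System.out.println("created on first run: " + setup.run());
        System.out.println("created on second run: " + setup.run());
    }
}
```

In this sketch the same SetupJob instance could be invoked from the AM listener's init() or from the local submit paths, since it depends only on the assumed TopicAdmin abstraction.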

The motivation for this is threefold:
1. There is a race condition in the TaskRunner when multiple containers for a
single job are running in YARN: each TaskRunner tries to create the
checkpoint/change log topics when they don't exist.
2. It makes implementing the TaskRunner logic in other languages easier, since
non-Java TaskRunner implementations won't have to set the topics up. The
SetupJob class will be run in the AM (under YARN) or in the Java code of the
ProcessJob/ThreadJob (for local jobs).
3. It gives us a place to run a single chunk of code in a controlled,
single-threaded way, before any of the TaskRunners start.
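
The ordering described in points 1 and 3 can be sketched as follows: setup runs once on the calling thread, and only afterwards are the concurrent runners started, so none of them has to create a topic. This is a simplified simulation under stated assumptions. The class name, the in-memory topic set, and the thread pool standing in for containers are all hypothetical, not how YARN or Samza actually launch TaskRunners.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simulation: threads stand in for containers, a set stands in for topics.
public class SetupBeforeRunnersSketch {
    private final Set<String> topics = ConcurrentHashMap.newKeySet();
    private final AtomicInteger createCalls = new AtomicInteger();

    /** Single-threaded setup step: creates each missing topic exactly once. */
    void setup(String... requiredTopics) {
        for (String topic : requiredTopics) {
            if (topics.add(topic)) {
                createCalls.incrementAndGet();
            }
        }
    }

    /** Starts n simulated runners; returns how many found the topic already present. */
    int startRunners(int n, String requiredTopic) {
        AtomicInteger ready = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(n);
        for (int i = 0; i < n; i++) {
            // Runners only read; they never try to create the topic themselves.
            pool.submit(() -> {
                if (topics.contains(requiredTopic)) {
                    ready.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("runners did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for runners", e);
        }
        return ready.get();
    }

    int createCalls() {
        return createCalls.get();
    }

    public static void main(String[] args) {
        SetupBeforeRunnersSketch job = new SetupBeforeRunnersSketch();
        job.setup("my-job-checkpoint");                      // runs before any runner starts
        int ready = job.startRunners(4, "my-job-checkpoint"); // 4 simulated containers
        System.out.println("runners ready: " + ready + ", create calls: " + job.createCalls());
    }
}
```

Because creation happens before the pool is started, the number of create calls does not depend on how many runners are launched.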

Some things to consider: Is it OK to hard-code that the SetupJob class always
sets up the checkpoint manager and change log topics? Do we need to add a
setup() method to the lifecycle for everything?