Brett Konold created SAMZA-2312:
-----------------------------------
Summary: Consistent semantics of ApplicationDescriptor across
Standalone & YARN
Key: SAMZA-2312
URL: https://issues.apache.org/jira/browse/SAMZA-2312
Project: Samza
Issue Type: Improvement
Affects Versions: 1.0, 1.1, 1.2
Reporter: Brett Konold
The usage of ApplicationDescriptor in Samza currently paints an ambiguous
picture of its semantics. It is unclear what _exactly_ it is meant to
describe.
In Standalone, the app descriptor is instantiated a single time, before
planning, and wraps the user-provided and subsequently rewritten configs.
In YARN, however, the app descriptor is instantiated once during the planning
phase of deployment, and once again on _each_ container the job is deployed on.
What makes things even more confusing is that before planning, the app
descriptor is instantiated with the user and rewritten configs, but on
container startup it is instantiated with the final set of _planned_ configs
obtained from the JobModel in the AM. This makes it difficult to draw
predictable inferences about how the app descriptor is used throughout the
codebase, because usage and behavior become so dependent on context.
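To make the difference concrete, here is a minimal sketch that models only
what is described above: how many times the descriptor is instantiated in each
deployment mode, and which config stage each instantiation sees. It does not
use any Samza classes. DescriptorLifecycleSketch, ConfigStage, standalone(),
and yarn() are hypothetical names invented for illustration, not part of the
Samza API.

```java
import java.util.ArrayList;
import java.util.List;

public class DescriptorLifecycleSketch {
    // Hypothetical stand-in for "which configs the descriptor wraps".
    public enum ConfigStage { USER_AND_REWRITTEN, PLANNED }

    // Standalone: one instantiation, before planning.
    public static List<ConfigStage> standalone() {
        List<ConfigStage> seen = new ArrayList<>();
        seen.add(ConfigStage.USER_AND_REWRITTEN);
        return seen;
    }

    // YARN: one instantiation for planning, plus one per container,
    // where each container sees the planned configs from the JobModel.
    public static List<ConfigStage> yarn(int containerCount) {
        List<ConfigStage> seen = new ArrayList<>();
        seen.add(ConfigStage.USER_AND_REWRITTEN);
        for (int i = 0; i < containerCount; i++) {
            seen.add(ConfigStage.PLANNED);
        }
        return seen;
    }

    public static void main(String[] args) {
        // prints "standalone: [USER_AND_REWRITTEN]"
        System.out.println("standalone: " + standalone());
        // prints "yarn (3 containers): [USER_AND_REWRITTEN, PLANNED, PLANNED, PLANNED]"
        System.out.println("yarn (3 containers): " + yarn(3));
    }
}
```

Under this model, the same user code observes one config view in standalone
and two different views in YARN, with the number of instantiations growing as
1 + (number of containers).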
We should have answers to these questions:
1) What "stage" of the application do we want ApplicationDescriptor to be used
to describe? E.g., exclusively what the user provides (user config only, input
/ output stream and system descriptors, etc), or some mix between user and
system provided configs (e.g. rewritten or potentially planned configs). What
we decide should eventually be consistent between YARN & standalone.
2) Do we want to provide any guarantees to customers about the # of executions
of SamzaApplication.describe()? Currently the # of executions is singular in
standalone, and proportional to the # of containers in YARN.
--
This message was sent by Atlassian Jira
(v8.3.2#803003)