Yi,

Thanks for taking the time to review this and provide your feedback.
Responses inline.

>> 1) ContainerInfo: (containerId, physicalResourceId): this mapping is the
reported mapping from container processes in standalone, and preferred
mapping in YARN

>> 2) TaskLocality: (taskId, physicalResourceId): this mapping is actually
reported task location from container processes in both standalone and
YARN.

Yes, I totally agree with the above points, and I think the group interface
contract already reflects them. In standalone, the locationIds registered by
live processors and the task locality of the previous generation will be
used to calculate the assignment for the current generation. In the case of
YARN, the preferred-host mapping will be used for task and processor
locality. Any new task/processor whose grouping is unknown (i.e.,
unavailable in the preferred-host/task-locality mapping in the underlying
storage layer) will be treated as any_host during assignment.
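For concreteness, here's a minimal Java sketch of that fallback. The names
and signatures here are mine for illustration, not the actual grouper API:

    import java.util.Collection;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;

    class LocalityAwareGrouping {
        static final String ANY_HOST = "ANY_HOST";

        // For each task, keep it on its previous locationId if a live
        // processor is still registered there; otherwise fall back to
        // ANY_HOST (unknown/unavailable locality).
        static Map<String, String> preferredLocations(
                Collection<String> taskIds,
                Map<String, String> previousTaskLocality, // taskId -> locationId
                Set<String> liveProcessorLocationIds) {   // locationIds of live processors
            Map<String, String> preferred = new HashMap<>();
            for (String taskId : taskIds) {
                String last = previousTaskLocality.get(taskId);
                preferred.put(taskId,
                    (last != null && liveProcessorLocationIds.contains(last))
                        ? last       // sticky assignment to previous location
                        : ANY_HOST); // no usable locality -> any host
            }
            return preferred;
        }
    }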

I don't think it's a good idea to unify the locality storage formats
between YARN and standalone as part of this change, since that would
require an elaborate migration plan and extensive testing. I think it's
fair to consider it out of scope for this proposal.

>>  Should the leader validate that everyone has picked up the new version
of JobModel and reported the correct task-locality expected in the
JobModel, after step 10 in the graph?
Though that's an extra precaution the leader could take to ensure
correctness, I think it might be a premature optimization. Even in the
existing YARN setup, we don't have corresponding validation by the
ApplicationMaster after a generation change, so I think it's fair to keep
the behavior consistent.

>> Why are we missing a processor-to-locationId mapping in the zknode data
model?
It is stored as part of the value of each live processor znode. I had it in
my initial proposal, but received feedback to include only the things I'm
changing in the existing setup (hence I removed it). I've added it back now.
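For example, a live processor znode could carry the locationId as part of
its value, roughly like this (the value format below is illustrative, not
the exact serialized schema):

    /processors/processor-000000001  ->  { "locationId": "host-a", ... }
    /processors/processor-000000002  ->  { "locationId": "host-b", ... }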

>> Also, why don’t we write locationId as a value to task0N znode, instead
of a child node?
It is stored as the value of the task ZooKeeper node. I was unable to
represent that pictorially in the ZooKeeper hierarchy diagram (which is why
it appeared one level down, like a child node). I've added corresponding
descriptions to that data model to make it clear.
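In other words, the hierarchy reads roughly as follows, with the znode
value shown to the right of the arrow (paths illustrative):

    /tasks/task-01  ->  "host-a"    (locationId stored as the znode value)
    /tasks/task-02  ->  "host-b"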

>> And which znode is the distributed barrier that you used in the graph?
This was removed after the initial feedback (which suggested including only
the parts of the data model I'm changing). I've added the barrier ZooKeeper
node back to the data model for clarity.
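As a rough illustration of where the barrier sits in the hierarchy (the
exact path and state names here are illustrative, not the final data
model):

    /barrier_{jobModelVersion}/
        barrier_state           (e.g. NEW / DONE / TIMED_OUT)
        barrier_participants/   (one child per processor that has joined)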

I think we're on the same page about most of the choices made in this
proposal. If there are other major concerns or feedback, let's discuss
offline.

Thanks.
