Hi Navina,

Thanks for the review and the comments. Please find my replies inline.

1. "It is always very useful to provide more context to the reader, esp. in explaining what the different terms mean (like host-affinity, tombstone etc) and how it relates to the problem being described."
>> Updated the design doc with a glossary section, where the terms are described briefly.

2. "The Host Affinity feature in Samza enables it to restore local state from disk instead of bootstrapping the entire changelog" -> Host affinity as a feature only tries to bring up the container on the same host as before. This helps Samza leverage the locally persisted store data. It doesn't actually help it restore state in any way.
>> I've rephrased it accordingly in the design doc.

3. "To achieve this, Samza stores local state for change logged stores in a shared directory so it is not tied to a resource manager’s storage structure and cleanup schedule." -> I think by shared directory, you are referring to the YARN application's workspace. This shared workspace is part of the NM, not the RM. You can rephrase this and additionally provide the logical path to the state stores.
>> Yes, it was mentioned incorrectly. I've fixed it in the design doc.

4. "Expose an API in samzarest that" -> Can you elaborate on what the API looks like?
>> This API would take jobName and jobId as parameters and return the preferred host for all the tasks in the job.

Request URL: http://Host:Port/v1/jobs/{jobName}/{jobId}/containers

Sample JSON response:

    {
      "jobName" : "Job name",
      "jobId" : "Job id",
      "containers" : [
        {
          "name" : "Container name",
          "id" : "1",
          "tasks" : [
            {
              "name" : "Task name",
              "partitions" : ["Id 1", "Id 2"],
              "preferredHost" : "Host name"
            }
          ]
        }
      ]
    }

Alternatively, granular APIs at the task and container levels could be exposed rather than a single API returning the complete job model hierarchy. However, constructing the complete job hierarchy from the granular APIs would require querying the job's coordinator stream multiple times (once for each container and task), leading to performance problems.

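To illustrate how a client such as the monitor might consume this response, here is a minimal Python sketch. It relies only on the URL shape and field names from the sample above; the function names, the port value, and the choice of Python are assumptions made for illustration and are not part of the design.

```python
import json
from urllib.parse import quote


def containers_url(host, port, job_name, job_id):
    """Build the request URL in the shape proposed above.

    host and port are placeholders; this helper name is hypothetical.
    """
    return "http://{}:{}/v1/jobs/{}/{}/containers".format(
        host, port, quote(job_name, safe=""), quote(job_id, safe=""))


def preferred_hosts(response_text):
    """Map each task name to its preferredHost.

    Assumes the response follows the sample JSON structure:
    job -> containers -> tasks -> preferredHost.
    """
    body = json.loads(response_text)
    hosts = {}
    for container in body.get("containers", []):
        for task in container.get("tasks", []):
            hosts[task["name"]] = task.get("preferredHost")
    return hosts


# The sample response from the reply above, used as test input.
SAMPLE = """
{ "jobName": "Job name", "jobId": "Job id",
  "containers": [ { "name": "Container name", "id": "1",
    "tasks": [ { "name": "Task name",
                 "partitions": ["Id 1", "Id 2"],
                 "preferredHost": "Host name" } ] } ] }
"""

if __name__ == "__main__":
    # 8080 is an arbitrary port chosen for the example.
    print(containers_url("localhost", 8080, "my-job", "1"))
    print(preferred_hosts(SAMPLE))
```

With the single-API approach, one request per job is enough to build the full task-to-host mapping, which is the reason for preferring it over the granular alternative.
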
5. Is the rest-api to be invoked by the monitor for all jobs in the cluster or all running jobs? What are the criteria there? Please do mention them, if any.
>> The monitor will use the rest-api for all the jobs in the cluster that have host affinity enabled.

The updated design doc is here: https://issues.apache.org/jira/secure/attachment/12827691/DESIGN-SAMZA-656.pdf

Please let me know your thoughts.

Thanks.