[ https://issues.apache.org/jira/browse/MAPREDUCE-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280147#comment-13280147 ]
Sharad Agarwal commented on MAPREDUCE-3315:
-------------------------------------------

Thanks Nikhil. Overall a good start. It needs some changes, primarily to make the framework scalable and fault tolerant. Specific comments follow:

- MWProtocol: It would be better to have a heartbeat-style interface between worker and master. The worker sends status reports and the results of completed work units in a HeartbeatRequest, and receives more WorkUnits and instructions in the HeartbeatResponse. Using MWMessage to pass both the work unit and the result is confusing; the work unit and the result could simply be Writable objects. Special instructions such as kill should be given via separate Action commands in the HeartbeatResponse.
- MWWorkerRunner: If the worker is unable to contact the master for some time, it should terminate itself.
- MWMasterRunner: If the master doesn't receive a heartbeat from a worker for a certain time period, it should mark the worker as killed and launch a new one. (The assumption is that doWork is idempotent.)
- AMRMProtocolWraper: requestContainer() is currently blocking. This will have a high startup cost if the number of workers is high.
- MWApplicationMaster: addWorker is blocking. The container request and the container launch are sequential, which gives a high worker startup cost. This should be done via a thread pool for parallel launches; see ContainerLauncher in the MR application master. Also, a new ContainerManagerWrapper thread is created for each worker launch. This is not scalable.
- Extend org.apache.hadoop.yarn.service.AbstractService or CompositeService for all moving components.
- Add code comments, and Javadocs for public and protected methods.
- Add unit tests.
- Dynamic worker pool: minWorkers/maxWorkers. This needs a client protocol, say MWClientProtocol, to see the status of the overall application and potentially to submit new work units, kill workers, add workers, etc.
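
To make the heartbeat suggestion concrete, here is a minimal sketch in plain Java. It is not code from the patch. Every class and method name below is hypothetical, byte[] payloads stand in for Writable objects, and the clock is passed in explicitly so the behaviour is easy to test. It assumes, as the comment above does, that doWork is idempotent, so units held by an expired worker can simply be re-queued.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

/** Hypothetical sketch of a heartbeat-based master; not the MWProtocol from the patch. */
class HeartbeatSketch {
    /** Instructions the master can piggyback on a response. */
    enum Action { NONE, KILL }

    static final class WorkUnit {
        final long id;
        final byte[] payload; // stand-in for a Writable work unit
        WorkUnit(long id, byte[] payload) { this.id = id; this.payload = payload; }
    }

    static final class Result {
        final long unitId;
        final byte[] payload; // stand-in for a Writable result
        Result(long unitId, byte[] payload) { this.unitId = unitId; this.payload = payload; }
    }

    /** Worker to master: identity plus results finished since the last beat. */
    static final class HeartbeatRequest {
        final String workerId;
        final List<Result> completed;
        HeartbeatRequest(String workerId, List<Result> completed) {
            this.workerId = workerId;
            this.completed = completed;
        }
    }

    /** Master to worker: an instruction plus newly assigned work units. */
    static final class HeartbeatResponse {
        final Action action;
        final List<WorkUnit> newUnits;
        HeartbeatResponse(Action action, List<WorkUnit> newUnits) {
            this.action = action;
            this.newUnits = newUnits;
        }
    }

    static final class Master {
        private final Queue<WorkUnit> pending = new ArrayDeque<>();
        private final Map<Long, Result> results = new HashMap<>();
        private final Map<String, Long> lastSeen = new HashMap<>();
        private final Map<String, Map<Long, WorkUnit>> inFlight = new HashMap<>();
        private final Set<String> toKill = new HashSet<>();
        private final long expiryMillis;
        private final int unitsPerBeat;

        Master(long expiryMillis, int unitsPerBeat) {
            this.expiryMillis = expiryMillis;
            this.unitsPerBeat = unitsPerBeat;
        }

        synchronized void submit(WorkUnit unit) { pending.add(unit); }
        synchronized void requestKill(String workerId) { toKill.add(workerId); }
        synchronized int resultCount() { return results.size(); }
        synchronized int pendingCount() { return pending.size(); }

        synchronized HeartbeatResponse heartbeat(HeartbeatRequest req, long nowMillis) {
            lastSeen.put(req.workerId, nowMillis);
            Map<Long, WorkUnit> own = inFlight.computeIfAbsent(req.workerId, k -> new HashMap<>());
            for (Result r : req.completed) {
                own.remove(r.unitId);
                results.put(r.unitId, r); // a duplicate result just overwrites (idempotent work)
            }
            if (toKill.remove(req.workerId)) {
                lastSeen.remove(req.workerId);
                requeue(req.workerId); // unfinished units go back to the queue
                return new HeartbeatResponse(Action.KILL, new ArrayList<>());
            }
            List<WorkUnit> assigned = new ArrayList<>();
            while (assigned.size() < unitsPerBeat && !pending.isEmpty()) {
                WorkUnit u = pending.poll();
                own.put(u.id, u);
                assigned.add(u);
            }
            return new HeartbeatResponse(Action.NONE, assigned);
        }

        /** Forgets workers silent for longer than expiryMillis and re-queues their units. */
        synchronized List<String> expireDeadWorkers(long nowMillis) {
            List<String> dead = new ArrayList<>();
            Iterator<Map.Entry<String, Long>> it = lastSeen.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Long> e = it.next();
                if (nowMillis - e.getValue() > expiryMillis) {
                    it.remove();
                    requeue(e.getKey());
                    dead.add(e.getKey());
                }
            }
            return dead;
        }

        private void requeue(String workerId) {
            Map<Long, WorkUnit> units = inFlight.remove(workerId);
            if (units != null) pending.addAll(units.values());
        }
    }
}
```

In this sketch the master would call expireDeadWorkers periodically and launch a replacement for each returned id. The worker side would mirror the timeout: if no HeartbeatResponse arrives within its own deadline, it exits.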
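
The point about parallel launches can be sketched the same way. The example below is not the ContainerLauncher from the MR application master and uses no YARN APIs. WorkerLauncher is a hypothetical stand-in for "request a container and start a worker in it", and launchAll is an illustrative name. It only shows the shape of the change: a bounded thread pool runs the launches concurrently instead of one blocking call per worker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of launching workers through a bounded thread pool. */
class ParallelLaunchSketch {
    /** Stand-in for requesting a container and starting a worker in it. */
    interface WorkerLauncher {
        String launch(int workerIndex) throws Exception;
    }

    /** Launches workerCount workers on poolSize threads; returns ids of those that started. */
    static List<String> launchAll(int workerCount, int poolSize, WorkerLauncher launcher) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        List<String> started = new ArrayList<>();
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (int i = 0; i < workerCount; i++) {
                final int index = i;
                tasks.add(() -> launcher.launch(index));
            }
            // invokeAll blocks until every task has finished, successfully or not,
            // and returns the futures in the same order as the tasks.
            for (Future<String> f : pool.invokeAll(tasks)) {
                try {
                    started.add(f.get());
                } catch (ExecutionException e) {
                    // This launch failed; a real master would retry or re-request a container.
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        } finally {
            pool.shutdown();
        }
        return started;
    }
}
```

A single shared pool also avoids creating one thread per worker launch, which is the scalability concern raised about ContainerManagerWrapper.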

> Master-Worker Application on YARN
> ---------------------------------
>
>                 Key: MAPREDUCE-3315
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3315
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>             Fix For: 0.24.0
>
>         Attachments: MAPREDUCE-3315-1.patch, MAPREDUCE-3315-2.patch,
>                      MAPREDUCE-3315-3.patch, MAPREDUCE-3315.patch
>
>
> Currently master worker scenarios are forced fit into Map-Reduce. Now with
> YARN, these can be first class and would benefit real/near realtime workloads
> and be more effective in using the cluster resources.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira