[ https://issues.apache.org/jira/browse/MAPREDUCE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831914#action_12831914 ]
Xing Shi commented on MAPREDUCE-1434:
-------------------------------------

I checked this:

*_1. TimeOut_*
_I had forgotten about the timeout. Can it be set to unlimited? Or can the reduce task be set to unlimited?_

Indeed, there is still a map pending, so the reducer stays in the shuffle phase. I have tested that the timeout has no effect during the shuffle phase (why?), so we can ignore the timeout.

> Dynamic add input for one job
> -----------------------------
>
>                 Key: MAPREDUCE-1434
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1434
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>         Environment: 0.19.0
>            Reporter: Xing Shi
>
> Currently, we must first upload the data to HDFS before we can analyze it with Hadoop MapReduce.
> Sometimes the upload takes a long time, so if we could add input while a job is running, that time could be saved.
>
> WHAT?
> Client:
> a) hadoop job -add-input jobId inputFormat ...
>    Add the input to jobId.
> b) hadoop job -add-input done
>    Tell the JobTracker that all input has been prepared.
> c) hadoop job -add-input status jobId
>    Show how many inputs the job has.
>
> HOWTO?
> Mainly, I think we should do three things:
> 1. JobClient: JobClient should support adding input to a running job. As it already does at submission time, JobClient generates the splits and submits them to the JobTracker.
> 2. JobTracker: the JobTracker should support addInput and append the new tasks to the original mapTasks. Because uploaded data is processed quickly, the scheduler should also be updated to keep a map task pending until the client tells the job that its input is done.
> 3. Reducer: the reducer should also update its number of maps (mapNums) so that it shuffles correctly.
>
> This is the rough idea, and I will update it.
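The three HOWTO steps can be sketched as a small state model. This is a minimal, hypothetical Java sketch, not Hadoop 0.19 code: `DynamicInputJob`, `addInput`, `markInputDone`, `mapCompleted`, `numMapTasks` and `shuffleComplete` are illustrative names, each string stands in for a split that JobClient would generate (step 1 is not modeled), and a single `inputDone` flag replaces the pending placeholder map task the proposal describes. It shows why a reducer cannot leave the shuffle phase while more input may still arrive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the MAPREDUCE-1434 proposal; none of these names are Hadoop APIs.
class DynamicInputJob {
    private final List<String> mapSplits = new ArrayList<>(); // one map task per split
    private boolean inputDone = false; // assumption: stands in for the pending placeholder map
    private int completedMaps = 0;

    // Step 2: the JobTracker appends map tasks for splits the client submitted.
    synchronized void addInput(List<String> splits) {
        if (inputDone) {
            throw new IllegalStateException("input already marked done");
        }
        mapSplits.addAll(splits);
    }

    // The client says no more input will arrive ("hadoop job -add-input done").
    synchronized void markInputDone() {
        inputDone = true;
    }

    // Records that one known map task has finished.
    synchronized void mapCompleted() {
        if (completedMaps >= mapSplits.size()) {
            throw new IllegalStateException("no unfinished map task");
        }
        completedMaps++;
    }

    // Step 3 and "status": the current map count, which reducers must re-read.
    synchronized int numMapTasks() {
        return mapSplits.size();
    }

    // A reducer may leave the shuffle phase only when input is done and every map finished.
    synchronized boolean shuffleComplete() {
        return inputDone && completedMaps == mapSplits.size();
    }
}

class DynamicInputDemo {
    public static void main(String[] args) {
        DynamicInputJob job = new DynamicInputJob();
        job.addInput(List.of("part-0", "part-1")); // first uploaded batch
        job.mapCompleted();
        job.mapCompleted();
        System.out.println(job.shuffleComplete()); // false: more input may still arrive
        job.addInput(List.of("part-2"));           // second batch arrives mid-job
        job.markInputDone();
        System.out.println(job.numMapTasks());     // 3
        job.mapCompleted();
        System.out.println(job.shuffleComplete()); // true
    }
}
```

Running `DynamicInputDemo` prints `false`, `3`, `true`: even after both initial maps finish, the reducer stays in shuffle until the client marks the input done, which matches the behavior observed in the comment.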