[ https://issues.apache.org/jira/browse/MAPREDUCE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831608#action_12831608 ]
Owen O'Malley commented on MAPREDUCE-1434: ------------------------------------------ One approach might be to have a subclass of InputFormat, such as: {code} public abstract class IncrementalInputFormat extends InputFormat { InputSplit[] getNewInputSplits(JobContext context) throws IOException; } {code} and such input formats return any "new" splits that they have found since the last time the method was called. > Dynamic add input for one job > ----------------------------- > > Key: MAPREDUCE-1434 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1434 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Environment: 0.19.0 > Reporter: Xing Shi > > Always we should firstly upload the data to hdfs, then we can analize the > data using hadoop mapreduce. > Sometimes, the upload process takes long time. So if we can add input during > one job, the time can be saved. > WHAT? > Client: > a) hadoop job -add-input jobId inputFormat ... > Add the input to jobid > b) hadoop job -add-input done > Tell the JobTracker, the input has been prepared over. > c) hadoop job -add-input status jobid > Show how many input the jobid has. > HOWTO? > Mainly, I think we should do three things: > 1. JobClinet: here JobClient should support add input to a job, indeed, > JobClient generate the split, and submit to JobTracker. > 2. JobTracker: JobTracker support addInput, and add the new tasks to the > original mapTasks. Because the uploaded data will be > processed quickly, so it also should update the scheduler to support pending > a map task till Client tells the Job input done. > 3. Reducer: the reducer should also update the mapNums, so it will shuffle > right. > This is the rough idea, and I will update it . -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.