[ https://issues.apache.org/jira/browse/HADOOP-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-4565:
-------------------------------------

    Status: Patch Available  (was: Open)

> MultiFileInputSplit can use data locality information to create splits
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-4565
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4565
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: CombineMultiFile.patch, CombineMultiFile2.patch, CombineMultiFile3.patch
>
>
> The MultiFileInputFormat takes a set of paths and creates splits based on
> file sizes. Each split contains a few files, and the splits are roughly equal
> in size. It would be efficient if we could extend this InputFormat to create
> splits such that all the blocks in one split are either node-local or
> rack-local.
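
The locality idea in the description can be illustrated with a small, self-contained Java sketch. It is not the code in the attached patches and does not use any Hadoop API. The names LocalitySplitSketch, Block, Split, createSplits, hostToRack and maxSplitSize are hypothetical, and the sketch assumes each block is reported on a single host (HDFS normally reports several replica hosts per block). Blocks are first packed into node-local splits; whatever is left over on each node is then packed into rack-local splits.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not the implementation in the attached patches. */
final class LocalitySplitSketch {

    /** One block of a file: path, length in bytes, and the (single, assumed) host storing it. */
    public static final class Block {
        public final String path;
        public final long length;
        public final String host;

        public Block(String path, long length, String host) {
            this.path = path;
            this.length = length;
            this.host = host;
        }
    }

    /** A group of blocks that share a host (node-local) or only a rack (host == null). */
    public static final class Split {
        public final String host;
        public final String rack;
        public final List<Block> blocks;
        public final long length;

        public Split(String host, String rack, List<Block> blocks) {
            this.host = host;
            this.rack = rack;
            this.blocks = blocks;
            long total = 0;
            for (Block b : blocks) {
                total += b.length;
            }
            this.length = total;
        }
    }

    public static List<Split> createSplits(List<Block> blocks,
                                           Map<String, String> hostToRack,
                                           long maxSplitSize) {
        List<Split> splits = new ArrayList<>();

        // Pass 1: group blocks by host and emit node-local splits.
        Map<String, List<Block>> byHost = new LinkedHashMap<>();
        for (Block b : blocks) {
            byHost.computeIfAbsent(b.host, h -> new ArrayList<>()).add(b);
        }
        Map<String, List<Block>> leftoverByRack = new LinkedHashMap<>();
        for (Map.Entry<String, List<Block>> e : byHost.entrySet()) {
            String host = e.getKey();
            String rack = hostToRack.getOrDefault(host, "/default-rack");
            List<Block> leftover = pack(e.getValue(), host, rack, maxSplitSize, splits);
            leftoverByRack.computeIfAbsent(rack, r -> new ArrayList<>()).addAll(leftover);
        }

        // Pass 2: combine per-node leftovers into rack-local splits.
        for (Map.Entry<String, List<Block>> e : leftoverByRack.entrySet()) {
            List<Block> leftover = pack(e.getValue(), null, e.getKey(), maxSplitSize, splits);
            if (!leftover.isEmpty()) {
                // Final undersized remainder still stays within one rack.
                splits.add(new Split(null, e.getKey(), leftover));
            }
        }
        return splits;
    }

    /** Emits a split each time the running size reaches maxSplitSize; returns unpacked blocks. */
    private static List<Block> pack(List<Block> blocks, String host, String rack,
                                    long maxSplitSize, List<Split> out) {
        List<Block> current = new ArrayList<>();
        long size = 0;
        for (Block b : blocks) {
            current.add(b);
            size += b.length;
            if (size >= maxSplitSize) {
                out.add(new Split(host, rack, current));
                current = new ArrayList<>();
                size = 0;
            }
        }
        return current;
    }
}
```

In this sketch a split never mixes blocks from different racks, at the cost of possibly producing a few undersized splits per rack. Balancing split sizes across racks is left out.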