[ https://issues.apache.org/jira/browse/HADOOP-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712524#action_12712524 ]
dhruba borthakur commented on HADOOP-5759:
------------------------------------------

This patch keeps the original aim of combining blocks from different hosts in the same rack into a single split. The fix that this patch attempts is to figure out where such a combined split should reside.

> Since only hosts that actually contain the valid blocks are returned in
> getMoreSplits with this patch,

I agree with Jothi to a certain extent. The number of splits remains the same as before, but the chance of scheduling them on the rack where they reside is slightly reduced, because we look only at those hosts where the split's blocks reside. It is possible to enhance this patch to create a new data structure called rackToNodes at the very beginning, populated by iterating through all the blocks once.

> IllegalArgumentException when CombineFileInputFormat is used as job InputFormat
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5759
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5759
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.21.0
>
>         Attachments: patch-5759.txt
>
>
> As per my understanding, CombineFileInputFormat is creating splits with
> rackname as split location.
> When I use CombineFileInputFormat as the InputFormat for job, job
> initialization fails with following exception :
> 2009-04-28 14:10:40,162 ERROR mapred.EagerTaskInitializationListener
> (EagerTaskInitializationListener.java:run(83)) - Job initialization failed:
> java.lang.IllegalArgumentException: Network location name contains /: /default-rack
>         at org.apache.hadoop.net.NodeBase.set(NodeBase.java:76)
>         at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
>         at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2342)
>         at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2336)
>         at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:344)
>         at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:441)
>         at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>         at java.lang.Thread.run(Thread.java:619)
> When I changed CombineFileInputFormat to pass just rackname (without '/'), JT
> wrongly resolves the node as /default-rack/<rack-name>.
> Solution is to pass hostnames holding the block(on the rack), instead of
> rackname.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
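The rackToNodes enhancement suggested in the comment, combined with the issue's proposed fix of reporting host names instead of a rack name, might look roughly like the Java sketch below. It is not the code in patch-5759.txt: the class RackToNodesSketch, the Block type (parallel hosts/racks arrays) and the methods buildRackToNodes and splitLocationsForRack are all hypothetical names chosen for illustration. The index is built in one pass over all blocks. A rack-level split then lists every known host on that rack, not only the hosts holding that split's own blocks, which is the scheduling concern raised above.

```java
import java.util.*;

/**
 * Hypothetical sketch (not the actual HADOOP-5759 patch): build a
 * rack -> hosts index once, then report host names, never "/rack"
 * strings, as the locations of a split combined at rack level.
 */
public class RackToNodesSketch {

    /** Hypothetical block descriptor: hosts[i] is assumed to sit on racks[i]. */
    static final class Block {
        final String[] hosts;
        final String[] racks;

        Block(String[] hosts, String[] racks) {
            if (hosts.length != racks.length) {
                throw new IllegalArgumentException("hosts and racks must be parallel arrays");
            }
            this.hosts = hosts;
            this.racks = racks;
        }
    }

    /** Single pass over all blocks, populating rack -> sorted set of hosts. */
    static Map<String, Set<String>> buildRackToNodes(List<Block> blocks) {
        Map<String, Set<String>> rackToNodes = new HashMap<>();
        for (Block b : blocks) {
            for (int i = 0; i < b.hosts.length; i++) {
                // A TreeSet removes duplicate replicas and keeps output deterministic.
                rackToNodes.computeIfAbsent(b.racks[i], k -> new TreeSet<>()).add(b.hosts[i]);
            }
        }
        return rackToNodes;
    }

    /**
     * Locations for a rack-level combined split: all known hosts on that rack,
     * so the JobTracker receives plain host names it can resolve itself.
     */
    static String[] splitLocationsForRack(Map<String, Set<String>> rackToNodes, String rack) {
        Set<String> hosts = rackToNodes.getOrDefault(rack, Collections.emptySet());
        return hosts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
            new Block(new String[] {"h1", "h2"}, new String[] {"/rack1", "/rack1"}),
            new Block(new String[] {"h3", "h2"}, new String[] {"/rack1", "/rack1"}),
            new Block(new String[] {"h4"}, new String[] {"/rack2"}));
        Map<String, Set<String>> rackToNodes = buildRackToNodes(blocks);
        // Prints [h1, h2, h3]
        System.out.println(Arrays.toString(splitLocationsForRack(rackToNodes, "/rack1")));
    }
}
```

The trade-off is a little memory for the index in exchange for wider locality. A split combined from blocks on h1 and h2 can still be scheduled on h3, because h3 is on the same rack.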