[ https://issues.apache.org/jira/browse/HADOOP-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712683#action_12712683 ]
Jothi Padmanabhan commented on HADOOP-5759:
-------------------------------------------

bq. It is possible to enhance this patch to create a new data structure called rackToNodes at the very beginning. It can be populated by iterating through all the blocks at the very beginning.

Agreed. However, Amareshwari and I discussed this offline, and we are not sure we want to build rackToNodes before the getMoreSplits method runs, as it would mean calling getBlockLocations for all the blocks twice -- once to build rackToNodes and once in getMoreSplits to build the blockInfo maps. Could we build this map incrementally in getMoreSplits instead? This would have the disadvantage of incomplete information for the first few splits.

> IllegalArgumentException when CombineFileInputFormat is used as job InputFormat
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5759
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5759
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.21.0
>
>         Attachments: patch-5759.txt
>
>
> As per my understanding, CombineFileInputFormat is creating splits with
> rackname as split location.
> When I use CombineFileInputFormat as the InputFormat for job, job
> initialization fails with following exception :
> 2009-04-28 14:10:40,162 ERROR mapred.EagerTaskInitializationListener (EagerTaskInitializationListener.java:run(83)) - Job initialization failed:
> java.lang.IllegalArgumentException: Network location name contains /: /default-rack
>         at org.apache.hadoop.net.NodeBase.set(NodeBase.java:76)
>         at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
>         at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2342)
>         at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2336)
>         at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:344)
>         at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:441)
>         at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>         at java.lang.Thread.run(Thread.java:619)
> When I changed CombineFileInputFormat to pass just rackname (without '/'), JT
> wrongly resolves the node as /default-rack/<rack-name>.
> Solution is to pass hostnames holding the block (on the rack), instead of
> rackname.
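
For illustration only, the incremental rackToNodes idea raised in the comment might look roughly like the sketch below. It is not taken from patch-5759.txt and uses no Hadoop classes: the class RackToNodes and its methods addBlock and hostsFor are hypothetical names, and the sketch assumes each block's host and rack names arrive as parallel arrays from the single getBlockLocations pass already made in getMoreSplits.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not part of Hadoop or of patch-5759.txt. */
class RackToNodes {
    // Rack name (e.g. "/default-rack") -> hosts seen so far on that rack.
    private final Map<String, Set<String>> rackToNodes = new HashMap<>();

    /**
     * Records one block's replica locations as the block is visited.
     * Assumption: hosts[i] is located on racks[i].
     */
    void addBlock(String[] hosts, String[] racks) {
        if (hosts.length != racks.length) {
            throw new IllegalArgumentException("hosts and racks must have equal length");
        }
        for (int i = 0; i < hosts.length; i++) {
            // LinkedHashSet drops duplicates and keeps first-seen order.
            rackToNodes.computeIfAbsent(racks[i], r -> new LinkedHashSet<>()).add(hosts[i]);
        }
    }

    /**
     * Returns the hosts known so far for a rack, suitable as split locations
     * in place of the rack name. Because the map is filled incrementally,
     * the result can be incomplete for the first few splits.
     */
    String[] hostsFor(String rack) {
        Set<String> hosts = rackToNodes.getOrDefault(rack, Collections.emptySet());
        return hosts.toArray(new String[0]);
    }
}
```

With a helper of this kind, a rack-level split would report hostsFor(rack) as its locations rather than the rack name itself, which is the fix the issue description proposes. The cost is the one noted in the comment: early splits only see the hosts of blocks visited up to that point.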