[ https://issues.apache.org/jira/browse/HADOOP-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712524#action_12712524 ]

dhruba borthakur commented on HADOOP-5759:
------------------------------------------

This patch keeps the original aim of combining blocks from different hosts in 
the same rack into a single split. The fix that the patch attempts is to 
figure out where such a combined split should reside.
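
For illustration, here is a rough sketch (not the patch itself) of that idea, 
under a few assumptions: the BlockInfo holder below is a stand-in for 
CombineFileInputFormat's internal OneBlockInfo, and the CombineFileSplit 
constructor taking (JobConf, Path[], long[], long[], String[]) is used. Blocks 
are grouped by rack into one combined split, and the split's locations are the 
host names that actually hold those blocks, not the rack string.

    import java.util.*;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.CombineFileSplit;

    class CombineByRackSketch {
      // Stand-in for CombineFileInputFormat's internal OneBlockInfo.
      static class BlockInfo {
        Path file;
        long offset;
        long length;
        String rack;     // rack of the hosts holding this block
        String[] hosts;  // hosts holding a replica of this block
      }

      // Group blocks by rack and emit one combined split per rack whose
      // locations are host names rather than the rack string.
      static List<CombineFileSplit> combineByRack(JobConf job,
                                                  List<BlockInfo> blocks) {
        Map<String, List<BlockInfo>> byRack =
            new HashMap<String, List<BlockInfo>>();
        for (BlockInfo b : blocks) {
          List<BlockInfo> group = byRack.get(b.rack);
          if (group == null) {
            group = new ArrayList<BlockInfo>();
            byRack.put(b.rack, group);
          }
          group.add(b);
        }
        List<CombineFileSplit> splits = new ArrayList<CombineFileSplit>();
        for (List<BlockInfo> group : byRack.values()) {
          Path[] files = new Path[group.size()];
          long[] starts = new long[group.size()];
          long[] lengths = new long[group.size()];
          Set<String> hosts = new HashSet<String>();
          for (int i = 0; i < group.size(); i++) {
            BlockInfo b = group.get(i);
            files[i] = b.file;
            starts[i] = b.offset;
            lengths[i] = b.length;
            hosts.addAll(Arrays.asList(b.hosts));
          }
          splits.add(new CombineFileSplit(job, files, starts, lengths,
                                          hosts.toArray(new String[hosts.size()])));
        }
        return splits;
      }
    }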


> Since only hosts that actually contain the valid blocks are returned in 
> getMoreSplits with this patch,

I agree with Jothi to a certain extent. The number of splits remains the same 
as before, but the chance of scheduling them on the rack where they reside is 
slightly reduced, because we look only at the hosts where each block actually 
resides. It is possible to enhance this patch by creating a new data structure 
called rackToNodes at the very beginning, populated by iterating through all 
the blocks up front.
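
A minimal sketch of that suggestion, assuming (as in CombineFileInputFormat's 
internal OneBlockInfo) that each block carries parallel hosts[]/racks[] arrays; 
the holder class here is a stand-in:

    import java.util.*;

    class RackToNodesSketch {
      // Stand-in for CombineFileInputFormat's internal OneBlockInfo.
      static class BlockInfo {
        String[] hosts;  // hosts holding a replica of this block
        String[] racks;  // rack of each corresponding host
      }

      // Populate rack -> all hosts seen on that rack in one pass over the blocks.
      static Map<String, Set<String>> buildRackToNodes(List<BlockInfo> blocks) {
        Map<String, Set<String>> rackToNodes = new HashMap<String, Set<String>>();
        for (BlockInfo block : blocks) {
          for (int i = 0; i < block.racks.length; i++) {
            Set<String> nodes = rackToNodes.get(block.racks[i]);
            if (nodes == null) {
              nodes = new HashSet<String>();
              rackToNodes.put(block.racks[i], nodes);
            }
            nodes.add(block.hosts[i]);
          }
        }
        return rackToNodes;
      }
    }

A rack-level split could then report rackToNodes.get(rack) as its locations, 
widening the set of candidate nodes the scheduler can place it on.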

> IllegalArgumentException when CombineFileInputFormat is used as job 
> InputFormat
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5759
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5759
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.21.0
>
>         Attachments: patch-5759.txt
>
>
> As per my understanding, CombineFileInputFormat creates splits with the 
> rack name as the split location. 
> When I use CombineFileInputFormat as the InputFormat for a job, job 
> initialization fails with the following exception:
> 2009-04-28 14:10:40,162 ERROR mapred.EagerTaskInitializationListener (EagerTaskInitializationListener.java:run(83)) - Job initialization failed:
> java.lang.IllegalArgumentException: Network location name contains /: /default-rack
>   at org.apache.hadoop.net.NodeBase.set(NodeBase.java:76)
>   at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
>   at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2342)
>   at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2336)
>   at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:344)
>   at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:441)
>   at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>   at java.lang.Thread.run(Thread.java:619)
> When I changed CombineFileInputFormat to pass just the rack name (without 
> the leading '/'), the JobTracker wrongly resolves the node as 
> /default-rack/<rack-name>.
> The solution is to pass the hostnames holding the block (on the rack) 
> instead of the rack name.
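
To make the failure concrete, a small hypothetical contrast (host names made 
up): the JobTracker resolves each split location as a host name through 
NodeBase, which rejects names containing '/', so a rack string fails while 
host names resolve normally.

    class SplitLocationContrast {
      // Hypothetical illustration only: NodeBase rejects names containing '/'.
      static final String[] BAD_LOCATIONS  = { "/default-rack" };      // rack string: rejected
      static final String[] GOOD_LOCATIONS = { "host1.example.com",    // hosts holding the
                                               "host2.example.com" };  // block on that rack
    }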

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
