On Mar 22, 2012, at 11:45 AM, Amir Sanjar wrote:

> Thanks for the reply, Robert.
> However, I believe the main design issue is this: if there is a rack
> (listed in the rackToBlocks HashMap) that contains all the blocks (stored
> in the blockToNodes HashMap), then regardless of the order, the split
> operation terminates after that rack gets processed. That means the
> remaining racks (listed in the rackToBlocks HashMap) will not get
> processed. For more details, look at CombineFileInputFormat.java, method
> getMoreSplits(), the while loop starting at line 344.
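> 
> To make the problem concrete, here is a minimal, self-contained sketch
> (toy rack and block names; a hypothetical simplification of the loop,
> not the actual Hadoop code):
> 
>     import java.util.*;
> 
>     public class RackOrderSketch {
>       public static void main(String[] args) {
>         // Toy data: rack r1 holds a replica of every block, rack r2
>         // holds replicas of only b3 and b4.
>         Map<String, List<String>> rackToBlocks =
>             new HashMap<String, List<String>>();
>         rackToBlocks.put("r1", Arrays.asList("b1", "b2", "b3", "b4"));
>         rackToBlocks.put("r2", Arrays.asList("b3", "b4"));
> 
>         Set<String> pending =
>             new HashSet<String>(Arrays.asList("b1", "b2", "b3", "b4"));
>         List<List<String>> splits = new ArrayList<List<String>>();
> 
>         // Same shape as the rack loop in getMoreSplits(): each rack
>         // only contributes blocks no earlier rack has consumed.
>         for (List<String> blocks : rackToBlocks.values()) {
>           List<String> validBlocks = new ArrayList<String>();
>           for (String b : blocks) {
>             if (pending.remove(b)) {
>               validBlocks.add(b);
>             }
>           }
>           if (!validBlocks.isEmpty()) {
>             splits.add(validBlocks);
>           }
>         }
> 
>         // If the HashMap happens to yield r1 first, it consumes all
>         // four blocks and r2 contributes nothing: 1 split, not 2.
>         System.out.println(splits.size() + " split(s): " + splits);
>       }
>     }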
> 

I haven't looked at the code much yet, but I'm trying to understand your
question - what issue are you trying to bring out? Is it overloading one
task with too much input (there is a min/max limit on that, though)?

> Best Regards
> Amir Sanjar
> 
> Linux System Management Architect and Lead
> IBM Senior Software Engineer
> Phone# 512-286-8393
> Fax# 512-838-8858
> 
> From: Robert Evans <ev...@yahoo-inc.com>
> To:   "common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org>
> Date: 03/22/2012 11:57 AM
> Subject:      Re: Question about Hadoop-8192 and rackToBlocks ordering
> 
> 
> 
> If it really is the ordering of the HashMap, I would say no, it should
> not, and the code should be updated. If ordering matters, we need to use
> a map that guarantees a given order, and HashMap is not one of them.
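> 
> For example (just an illustration of the difference, not a patch):
> 
>     import java.util.*;
> 
>     public class MapOrderDemo {
>       public static void main(String[] args) {
>         // HashMap iteration order is unspecified and may differ
>         // between JDK implementations.
>         Map<String, Integer> hashed = new HashMap<String, Integer>();
>         // LinkedHashMap always iterates in insertion order.
>         Map<String, Integer> linked = new LinkedHashMap<String, Integer>();
>         for (String rack : new String[] {"rackA", "rackB", "rackC"}) {
>           hashed.put(rack, 0);
>           linked.put(rack, 0);
>         }
>         System.out.println("HashMap order:       " + hashed.keySet());
>         System.out.println("LinkedHashMap order: " + linked.keySet());
>       }
>     }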
> 
> --Bobby Evans
> 
> On 3/22/12 7:24 AM, "Kumar Ravi" <gokumarr...@gmail.com> wrote:
> 
> Hello,
> 
> We have been looking at IBM JDK JUnit failures on Hadoop-1.0.1
> independently and have run into the same failures as reported in this
> JIRA. I have a question based on what I have observed below.
> 
> We started debugging the problems in the testcase
> org.apache.hadoop.mapred.lib.TestCombineFileInputFormat.
> The testcase fails because the number of splits returned from
> CombineFileInputFormat.getSplits() is 1 when using the IBM JDK, whereas
> the expected return value is 2.
> 
> So far, we have found that the reason for this difference in the number
> of splits is that the order in which elements in the rackToBlocks
> HashMap get created under the IBM JDK is the reverse of the order in
> which the Sun JDK creates them.
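> 
> A toy simulation of what we are seeing (hypothetical racks and blocks;
> LinkedHashMap is used only to pin down the two iteration orders, this
> is not the actual test code):
> 
>     import java.util.*;
> 
>     public class SplitCountDemo {
>       // Count splits the way the rack loop does: each rack contributes
>       // only the blocks that no earlier rack has already consumed.
>       static int countSplits(Map<String, List<String>> rackToBlocks) {
>         Set<String> pending =
>             new HashSet<String>(Arrays.asList("b1", "b2", "b3", "b4"));
>         int splits = 0;
>         for (List<String> blocks : rackToBlocks.values()) {
>           boolean tookAny = false;
>           for (String b : blocks) {
>             if (pending.remove(b)) {
>               tookAny = true;
>             }
>           }
>           if (tookAny) {
>             splits++;
>           }
>         }
>         return splits;
>       }
> 
>       public static void main(String[] args) {
>         // One order: the subset rack comes first -> 2 splits.
>         Map<String, List<String>> order1 =
>             new LinkedHashMap<String, List<String>>();
>         order1.put("r2", Arrays.asList("b3", "b4"));
>         order1.put("r1", Arrays.asList("b1", "b2", "b3", "b4"));
> 
>         // Reversed: the rack holding every block comes first -> 1 split.
>         Map<String, List<String>> order2 =
>             new LinkedHashMap<String, List<String>>();
>         order2.put("r1", Arrays.asList("b1", "b2", "b3", "b4"));
>         order2.put("r2", Arrays.asList("b3", "b4"));
> 
>         System.out.println("order1: " + countSplits(order1) + " splits");
>         System.out.println("order2: " + countSplits(order2) + " splits");
>       }
>     }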
> 
> The question I have at this point is: should there be a strict
> dependency on the order in which the rackToBlocks HashMap gets
> populated, in order to determine the number of splits that should get
> created in a Hadoop cluster? Is this working as designed?
> 
> Regards,
> Kumar
> 
