On Mar 22, 2012, at 11:45 AM, Amir Sanjar wrote:

> Thanks for the reply Robert,
> However, I believe the main design issue is:
> If there is a rack (listed in the rackToBlocks HashMap) that contains
> all the blocks (stored in the blockToNodes HashMap), regardless of the
> order, the split operation terminates after that rack gets processed.
> That means the remaining racks (listed in the rackToBlocks HashMap)
> will not get processed. For more details, look at file
> CombineFileInputFormat.java, method getMoreSplits(), while loop
> starting at line 344.
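[For illustration only -- this sketch is not part of the original thread
and is not the actual Hadoop source. It is a minimal, self-contained Java
rendering of the order-dependent pattern Amir describes above, with
hypothetical rack and block names. Depending on which rack the HashMap
happens to iterate first, the same input yields either one split or two.]

    import java.util.*;

    public class RackOrderSketch {
        public static void main(String[] args) {
            // Hypothetical data: two racks whose replicas together
            // cover the same three blocks, mirroring the scenario
            // described in the message above.
            Map<String, List<String>> rackToBlocks =
                new HashMap<String, List<String>>();
            rackToBlocks.put("rack1", Arrays.asList("b1", "b2", "b3"));
            rackToBlocks.put("rack2", Arrays.asList("b2", "b3"));

            Set<String> pending =
                new HashSet<String>(Arrays.asList("b1", "b2", "b3"));
            int splits = 0;

            // Order-dependent loop: if the rack holding every pending
            // block happens to come first in HashMap iteration order,
            // one pass consumes all blocks and the loop exits before
            // the other rack is ever looked at.
            for (Map.Entry<String, List<String>> e : rackToBlocks.entrySet()) {
                List<String> claimed = new ArrayList<String>();
                for (String b : e.getValue()) {
                    if (pending.remove(b)) {
                        claimed.add(b);
                    }
                }
                if (!claimed.isEmpty()) {
                    splits++;
                    System.out.println("split from " + e.getKey() + ": " + claimed);
                }
                if (pending.isEmpty()) {
                    break;  // remaining racks are skipped entirely
                }
            }
            System.out.println("total splits: " + splits);
        }
    }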
I haven't looked at the code much yet, but trying to understand your
question -- what issue are you trying to bring out? Is it overloading one
task with too much input (there is a min/max limit on that, though)?

> Best Regards
> Amir Sanjar
>
> Linux System Management Architect and Lead
> IBM Senior Software Engineer
> Phone# 512-286-8393
> Fax# 512-838-8858
>
>
> From: Robert Evans <ev...@yahoo-inc.com>
> To: "common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org>
> Date: 03/22/2012 11:57 AM
> Subject: Re: Question about Hadoop-8192 and rackToBlocks ordering
>
>
> If it really is the ordering of the hash map, I would say no, it should
> not, and the code should be updated. If ordering matters, we need to
> use a map that guarantees a given order, and HashMap is not one of
> them.
>
> --Bobby Evans
>
> On 3/22/12 7:24 AM, "Kumar Ravi" <gokumarr...@gmail.com> wrote:
>
> Hello,
>
> We have been looking at IBM JDK JUnit failures on Hadoop 1.0.1
> independently and have run into the same failures as reported in this
> JIRA. I have a question based upon what I have observed below.
>
> We started debugging the problems in the test case
> org.apache.hadoop.mapred.lib.TestCombineFileInputFormat.
> The test case fails because the number of splits returned from
> CombineFileInputFormat.getSplits() is 1 when using the IBM JDK,
> whereas the expected return value is 2.
>
> So far, we have found that the reason for this difference in the
> number of splits is that the order in which elements in the
> rackToBlocks HashMap get created is the reverse of the order the Sun
> JDK creates.
>
> The question I have at this point is: Should there be a strict
> dependency on the order in which the rackToBlocks HashMap gets
> populated to determine the number of splits that should get created
> in a Hadoop cluster? Is this working as designed?
>
> Regards,
> Kumar
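[Side note on Bobby's point, for illustration only -- not from the
thread. java.util.HashMap documents no iteration order at all, so two
conforming JDKs (e.g. Sun and IBM) may legitimately walk the same
entries in different orders, while java.util.LinkedHashMap guarantees
insertion order on any JVM. A minimal demo with hypothetical rack
names:]

    import java.util.*;

    public class MapOrderDemo {
        public static void main(String[] args) {
            // HashMap: iteration order is unspecified and may vary
            // between JDK implementations, so code must not depend
            // on it.
            Map<String, Integer> hash = new HashMap<String, Integer>();
            // LinkedHashMap: iteration follows insertion order on
            // every JVM.
            Map<String, Integer> linked =
                new LinkedHashMap<String, Integer>();
            String[] racks = {"rack1", "rack2", "rack3"};
            for (int i = 0; i < racks.length; i++) {
                hash.put(racks[i], i);
                linked.put(racks[i], i);
            }
            System.out.println("HashMap order:       " + hash.keySet());
            System.out.println("LinkedHashMap order: " + linked.keySet());
        }
    }

[Swapping the rackToBlocks HashMap for a LinkedHashMap (or otherwise
removing the order dependence) would be one way to make the split count
deterministic across JDKs, along the lines Bobby suggests.]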