MAPREDUCE-4940 has been logged. Thanks
On Mon, Jan 14, 2013 at 4:34 AM, Steve Loughran <[email protected]> wrote:

> It certainly looks possible - can you file a JIRA issue on the problem?
>
> On 13 January 2013 16:39, Ted Yu <[email protected]> wrote:
>
>> I found this error again, see
>> https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/345/testReport/org.apache.hadoop.hbase.mapreduce/TestImportExport/testSimpleCase/
>>
>> 2013-01-12 11:53:52,809 WARN [AsyncDispatcher event handler]
>> resourcemanager.RMAuditLogger(255): USER=jenkins OPERATION=Application
>> Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App
>> failed with state: FAILED PERMISSIONS=Application
>> application_1357991604658_0002 failed 1 times due to AM Container for
>> appattempt_1357991604658_0002_000001 exited with exitCode: -1000 due
>> to: java.lang.ArithmeticException: / by zero
>>     at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:368)
>>     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
>>     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
>>     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
>>     at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:279)
>>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:851)
>>
>> .Failing this attempt.. Failing the application.
>> APPID=application_1357991604658_0002
>>
>> Here is the related code:
>>
>>     // Keep rolling the wheel till we get a valid path
>>     Random r = new java.util.Random();
>>     while (numDirsSearched < numDirs && returnPath == null) {
>>       long randomPosition = Math.abs(r.nextLong()) % totalAvailable;
>>
>> My guess is that totalAvailable was 0, meaning dirDF was empty.
>>
>> Please advise whether that scenario is possible.
>>
>> Cheers
>>
>> On Tue, Oct 30, 2012 at 9:33 AM, Ted Yu <[email protected]> wrote:
>>
>>> Thanks for the investigation, Kihwal.
>>>
>>> I will keep an eye on future test failures in TestRowCounter.
>>>
>>> On Tue, Oct 30, 2012 at 9:29 AM, Kihwal Lee <[email protected]> wrote:
>>>
>>>> Ted,
>>>>
>>>> I couldn't reproduce it by just running the test case. When you
>>>> reproduce it, look at the stderr/stdout file somewhere under
>>>> target/org.apache.hadoop.mapred.MiniMRCluster. Look for the one under
>>>> the directory whose name contains the app id.
>>>>
>>>> I did run into a similar problem and the stderr said:
>>>>     /bin/bash: /bin/java: No such file or directory
>>>>
>>>> It was because JAVA_HOME was not set. But in that case the exit code
>>>> was 127 (the shell not being able to locate the command to exec). In
>>>> the Hudson job, the exit code was 1, so I think it's something else.
>>>>
>>>> Kihwal
>>>>
>>>> On 10/29/12 11:56 PM, "Ted Yu" <[email protected]> wrote:
>>>>
>>>>> TestRowCounter still fails:
>>>>> https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/244/testReport/junit/org.apache.hadoop.hbase.mapreduce/TestRowCounter/testRowCounterNoColumn/
>>>>>
>>>>> but there was no 'divide by zero' exception.
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Thu, Oct 25, 2012 at 8:04 AM, Ted Yu <[email protected]> wrote:
>>>>>
>>>>>> I will try the 2.0.2-alpha release.
>>>>>> Cheers
>>>>>>
>>>>>> On Thu, Oct 25, 2012 at 7:54 AM, Ted Yu <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks for the quick response, Robert.
>>>>>>> Here is the hadoop version being used:
>>>>>>>     <hadoop-two.version>2.0.1-alpha</hadoop-two.version>
>>>>>>>
>>>>>>> If there is a newer release, I am willing to try that before filing
>>>>>>> a JIRA.
>>>>>>>
>>>>>>> On Thu, Oct 25, 2012 at 7:07 AM, Robert Evans <[email protected]> wrote:
>>>>>>>
>>>>>>>> It looks like you are running with an older version of 2.0, though
>>>>>>>> it does not really make much of a difference in this case. The
>>>>>>>> issue shows up when getLocalPathForWrite thinks there is no space
>>>>>>>> to write to on any of the disks it has configured. This could be
>>>>>>>> because you do not have any directories configured. I really don't
>>>>>>>> know for sure exactly what is happening. It might be disk
>>>>>>>> fail-in-place removing disks for you because of other issues.
>>>>>>>> Either way, we should file a JIRA against Hadoop to make it so we
>>>>>>>> never get the '/ by zero' error and provide a better way to handle
>>>>>>>> the possible causes.
>>>>>>>>
>>>>>>>> --Bobby Evans
>>>>>>>>
>>>>>>>> On 10/24/12 11:54 PM, "Ted Yu" <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>> HBase has a Jenkins build against hadoop 2.0.
>>>>>>>>> I was checking why TestRowCounter sometimes failed:
>>>>>>>>> https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/231/testReport/org.apache.hadoop.hbase.mapreduce/TestRowCounter/testRowCounterExclusiveColumn/
>>>>>>>>>
>>>>>>>>> I think the following could be the cause:
>>>>>>>>>
>>>>>>>>> 2012-10-22 23:46:32,571 WARN [AsyncDispatcher event handler]
>>>>>>>>> resourcemanager.RMAuditLogger(255): USER=jenkins OPERATION=Application
>>>>>>>>> Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App
>>>>>>>>> failed with state: FAILED PERMISSIONS=Application
>>>>>>>>> application_1350949562159_0002 failed 1 times due to AM Container for
>>>>>>>>> appattempt_1350949562159_0002_000001 exited with exitCode: -1000 due
>>>>>>>>> to: java.lang.ArithmeticException: / by zero
>>>>>>>>>     at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:355)
>>>>>>>>>     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
>>>>>>>>>     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
>>>>>>>>>     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
>>>>>>>>>     at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:257)
>>>>>>>>>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:849)
>>>>>>>>>
>>>>>>>>> However, I don't seem to find where in getLocalPathForWrite() the
>>>>>>>>> division by zero could have arisen.
>>>>>>>>>
>>>>>>>>> Comments / hints are welcome.
>>>>>>>>>
>>>>>>>>> Thanks
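[Editor's note] The scenario Ted guesses at in the thread (totalAvailable == 0 because dirDF was empty, or every configured dir reported no free space) is easy to demonstrate in isolation. The sketch below is not Hadoop's actual code: `pickDirIndex` and the capacity array are hypothetical stand-ins for the quoted selection loop, with an up-front guard of the kind Robert suggests, so the failure surfaces as a clear message instead of an opaque '/ by zero'.

```java
import java.util.Random;

public class DivByZeroSketch {
    // Hypothetical stand-in for the quoted "keep rolling the wheel" loop:
    // capacities[i] is the free space reported for local dir i; we pick a
    // random byte position weighted by available space, as the quoted
    // snippet's Math.abs(r.nextLong()) % totalAvailable does.
    static int pickDirIndex(long[] capacities, Random r) {
        long totalAvailable = 0;
        for (long c : capacities) {
            totalAvailable += c;
        }
        // Without this guard, totalAvailable == 0 makes the modulo below
        // throw java.lang.ArithmeticException: / by zero.
        if (totalAvailable <= 0) {
            throw new IllegalStateException(
                "No space available in any of the local directories");
        }
        long randomPosition = Math.abs(r.nextLong()) % totalAvailable;
        // Walk the dirs until the random position falls inside one.
        int dir = 0;
        while (randomPosition >= capacities[dir]) {
            randomPosition -= capacities[dir];
            dir++;
        }
        return dir;
    }

    public static void main(String[] args) {
        Random r = new Random(42);
        // Normal case: some dirs have space, so a dir with capacity wins.
        System.out.println(pickDirIndex(new long[] {0, 1024, 2048}, r));
        // Failure case: no dirs, or all dirs full -> totalAvailable == 0.
        try {
            pickDirIndex(new long[] {0, 0}, r);
        } catch (IllegalStateException e) {
            // prints "caught: No space available in any of the local directories"
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

In other words, the question in the thread reduces to whether the real allocator can ever reach the modulo with an empty or all-zero dir list; the guard above is one way a fix could report that condition explicitly.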
