It certainly looks possible. Can you file a JIRA issue on the problem?

On 13 January 2013 16:39, Ted Yu <[email protected]> wrote:
> I found this error again; see
> https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/345/testReport/org.apache.hadoop.hbase.mapreduce/TestImportExport/testSimpleCase/
>
> 2013-01-12 11:53:52,809 WARN [AsyncDispatcher event handler] resourcemanager.RMAuditLogger(255): USER=jenkins OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1357991604658_0002 failed 1 times due to AM Container for appattempt_1357991604658_0002_000001 exited with exitCode: -1000 due to:
> java.lang.ArithmeticException: / by zero
>     at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:368)
>     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
>     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
>     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
>     at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:279)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:851)
> .Failing this attempt.. Failing the application. APPID=application_1357991604658_0002
>
> Here is the related code:
>
>     // Keep rolling the wheel till we get a valid path
>     Random r = new java.util.Random();
>     while (numDirsSearched < numDirs && returnPath == null) {
>       long randomPosition = Math.abs(r.nextLong()) % totalAvailable;
>
> My guess is that totalAvailable was 0, meaning dirDF was empty.
>
> Please advise whether that scenario is possible.
>
> Cheers
>
> On Tue, Oct 30, 2012 at 9:33 AM, Ted Yu <[email protected]> wrote:
>
>> Thanks for the investigation, Kihwal.
>>
>> I will keep an eye on future test failures in TestRowCounter.
>>
>> On Tue, Oct 30, 2012 at 9:29 AM, Kihwal Lee <[email protected]> wrote:
>>
>>> Ted,
>>>
>>> I couldn't reproduce it by just running the test case. When you reproduce it, look at the stderr/stdout file somewhere under target/org.apache.hadoop.mapred.MiniMRCluster. Look for the one under the directory whose name contains the app id.
>>>
>>> I did run into a similar problem, and the stderr said:
>>>
>>>     /bin/bash: /bin/java: No such file or directory
>>>
>>> It was because JAVA_HOME was not set. But in that case the exit code was 127 (the shell not being able to locate the command to exec). In the Hudson job the exit code was 1, so I think it's something else.
>>>
>>> Kihwal
>>>
>>> On 10/29/12 11:56 PM, "Ted Yu" <[email protected]> wrote:
>>>
>>>> TestRowCounter still fails:
>>>>
>>>> https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/244/testReport/junit/org.apache.hadoop.hbase.mapreduce/TestRowCounter/testRowCounterNoColumn/
>>>>
>>>> but there was no 'divide by zero' exception.
>>>>
>>>> Cheers
>>>>
>>>> On Thu, Oct 25, 2012 at 8:04 AM, Ted Yu <[email protected]> wrote:
>>>>
>>>>> I will try the 2.0.2-alpha release.
>>>>>
>>>>> Cheers
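For context on the guess above: Java's remainder operator throws ArithmeticException ("/ by zero") when its right-hand operand is zero, for long operands just as for int. So if totalAvailable is 0 (dirDF empty), the quoted line fails exactly as in the trace. A minimal, throwaway sketch of that failure mode (the class name is mine; totalAvailable is forced to 0 purely for illustration):

    import java.util.Random;

    public class ModuloByZeroDemo {
        public static void main(String[] args) {
            Random r = new Random();
            long totalAvailable = 0; // stand-in for an empty dirDF
            // Throws java.lang.ArithmeticException: / by zero,
            // matching the stack trace quoted above.
            long randomPosition = Math.abs(r.nextLong()) % totalAvailable;
            System.out.println(randomPosition);
        }
    }

As an aside, Math.abs(r.nextLong()) is itself slightly unsafe: Math.abs(Long.MIN_VALUE) returns Long.MIN_VALUE, so randomPosition can, very rarely, be negative even when totalAvailable is positive.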
>>>>> On Thu, Oct 25, 2012 at 7:54 AM, Ted Yu <[email protected]> wrote:
>>>>>
>>>>>> Thanks for the quick response, Robert.
>>>>>>
>>>>>> Here is the hadoop version being used:
>>>>>>
>>>>>>     <hadoop-two.version>2.0.1-alpha</hadoop-two.version>
>>>>>>
>>>>>> If there is a newer release, I am willing to try that before filing a JIRA.
>>>>>>
>>>>>> On Thu, Oct 25, 2012 at 7:07 AM, Robert Evans <[email protected]> wrote:
>>>>>>
>>>>>>> It looks like you are running with an older version of 2.0, though that does not make much of a difference in this case. The issue shows up when getLocalPathForWrite thinks there is no space to write to on any of the disks it has configured. This could be because you do not have any directories configured. I really don't know for sure exactly what is happening; it might be disk fail-in-place removing disks for you because of other issues. Either way, we should file a JIRA against Hadoop to make sure we never get the "/ by zero" error and to provide a better way to handle the possible causes.
>>>>>>>
>>>>>>> --Bobby Evans
>>>>>>>
>>>>>>> On 10/24/12 11:54 PM, "Ted Yu" <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> HBase has a Jenkins build against hadoop 2.0.
>>>>>>>> I was checking why TestRowCounter sometimes failed:
>>>>>>>>
>>>>>>>> https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/231/testReport/org.apache.hadoop.hbase.mapreduce/TestRowCounter/testRowCounterExclusiveColumn/
>>>>>>>>
>>>>>>>> I think the following could be the cause:
>>>>>>>>
>>>>>>>> 2012-10-22 23:46:32,571 WARN [AsyncDispatcher event handler] resourcemanager.RMAuditLogger(255): USER=jenkins OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1350949562159_0002 failed 1 times due to AM Container for appattempt_1350949562159_0002_000001 exited with exitCode: -1000 due to:
>>>>>>>> java.lang.ArithmeticException: / by zero
>>>>>>>>     at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:355)
>>>>>>>>     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
>>>>>>>>     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
>>>>>>>>     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
>>>>>>>>     at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:257)
>>>>>>>>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:849)
>>>>>>>>
>>>>>>>> However, I don't seem to find where in getLocalPathForWrite() the division by zero could have arisen.
>>>>>>>>
>>>>>>>> Comment / hint is welcome.
>>>>>>>>
>>>>>>>> Thanks
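Following up on Bobby's suggestion: the guard a JIRA fix might add is to check totalAvailable before rolling the wheel, and to fail with a descriptive disk error instead of a bare ArithmeticException. The sketch below is hypothetical, not the actual Hadoop change: the class and method names are mine, the nested DiskErrorException stands in for org.apache.hadoop.util.DiskChecker.DiskErrorException, and the real code would live in LocalDirAllocator$AllocatorPerContext. It only shows the shape of the fix:

    import java.util.Random;

    public class LocalDirGuardSketch {

        // Stand-in for org.apache.hadoop.util.DiskChecker.DiskErrorException.
        static class DiskErrorException extends Exception {
            DiskErrorException(String msg) { super(msg); }
        }

        /**
         * Picks a local dir weighted by free space, like the quoted code,
         * but refuses to roll the wheel when there is no capacity at all.
         * Assumes localDirs and availableBytes are parallel arrays.
         */
        static String chooseLocalDir(String[] localDirs, long[] availableBytes)
                throws DiskErrorException {
            long totalAvailable = 0;
            for (long a : availableBytes) {
                totalAvailable += a;
            }
            // The guard: "% totalAvailable" below would throw "/ by zero" otherwise.
            if (localDirs.length == 0 || totalAvailable == 0) {
                throw new DiskErrorException(
                    "No space available in any of the configured local directories");
            }
            // Keep rolling the wheel till we get a valid path, as in the quoted
            // snippet; ">>> 1" sidesteps the Math.abs(Long.MIN_VALUE) pitfall.
            long randomPosition = (new Random().nextLong() >>> 1) % totalAvailable;
            int dir = 0;
            while (randomPosition > availableBytes[dir]) {
                randomPosition -= availableBytes[dir];
                dir++;
            }
            return localDirs[dir];
        }
    }

With a guard like this, a misconfigured or fully failed set of local dirs would surface in the NodeManager log as a clear "no space available" error rather than the opaque "/ by zero" seen in the Jenkins builds above.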
