Thanks for the analysis, Nicholas. Would it be reasonable to change allProcessInfo to Map<String, ProcessInfo>, so that we keep the pid/ppid as strings and avoid parsing such large integers altogether?
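
Something along these lines is what I have in mind (just a rough sketch under that assumption, not the actual ProcfsBasedProcessTree code; the class and field names here are illustrative):

    import java.util.HashMap;
    import java.util.Map;

    // Sketch only: keep pid/ppid exactly as read from /proc/<pid>/stat, so a
    // value such as 18446743988060683582 (> Long.MAX_VALUE) never goes through
    // Long.parseLong.
    class ProcessInfo {
      final String pid;   // process id, as a string
      String ppid;        // parent process id, as a string

      ProcessInfo(String pid) {
        this.pid = pid;
      }
    }

    class ProcessTreeSketch {
      // keyed by the string form of the pid instead of a parsed long
      private final Map<String, ProcessInfo> allProcessInfo =
          new HashMap<String, ProcessInfo>();

      void addProcess(String pid, String ppid) {
        ProcessInfo info = allProcessInfo.get(pid);
        if (info == null) {
          info = new ProcessInfo(pid);
          allProcessInfo.put(pid, info);
        }
        info.ppid = ppid;
      }

      ProcessInfo getParent(ProcessInfo child) {
        // walking the tree only needs string equality, so no NumberFormatException
        return allProcessInfo.get(child.ppid);
      }
    }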
On Mon, Dec 19, 2011 at 9:59 PM, Tsz Wo Sze <[email protected]> wrote:

> Hi,
>
> It looks like the ppid is a 64-bit positive integer, but Java long is
> signed and so only works with 63-bit positive integers. In your case,
>
> 2^64 > 18446743988060683582 > 2^63.
>
> Therefore, there is an NFE. I think it is a bug in ProcfsBasedProcessTree.
>
> Regards,
> Nicholas Sze
>
> ________________________________
> From: Ted Yu <[email protected]>
> To: [email protected]
> Cc: giridharan kesavan <[email protected]>
> Sent: Monday, December 19, 2011 8:24 PM
> Subject: mysterious NumberFormatException
>
> Hi,
> HBase PreCommit builds frequently give us a mysterious NumberFormatException.
>
> From
> https://builds.apache.org/job/PreCommit-HBASE-Build/553//testReport/org.apache.hadoop.hbase.mapreduce/TestHFileOutputFormat/testMRIncrementalLoad/:
>
> 2011-12-20 01:44:01,180 WARN [main] mapred.JobClient(784): No job jar
> file set. User classes may not be found. See JobConf(Class) or
> JobConf#setJar(String).
> java.lang.NumberFormatException: For input string: "18446743988060683582"
>   at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>   at java.lang.Long.parseLong(Long.java:422)
>   at java.lang.Long.parseLong(Long.java:468)
>   at org.apache.hadoop.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:413)
>   at org.apache.hadoop.util.ProcfsBasedProcessTree.getProcessTree(ProcfsBasedProcessTree.java:148)
>   at org.apache.hadoop.util.LinuxResourceCalculatorPlugin.getProcResourceValues(LinuxResourceCalculatorPlugin.java:401)
>   at org.apache.hadoop.mapred.Task.initialize(Task.java:536)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:353)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
>
> From the Hadoop 0.20.205 source code, it looks like ppid was
> 18446743988060683582, causing the NFE:
>
>   // Set (name) (ppid) (pgrpId) (session) (utime) (stime) (vsize) (rss)
>   pinfo.updateProcessInfo(m.group(2), Integer.parseInt(m.group(3)),
>
> You can find information on the OS at the beginning of
> https://builds.apache.org/job/PreCommit-HBASE-Build/553/console:
>
> asf011.sp2.ygridcore.net
> Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 17:42:25 UTC 2011 x86_64 GNU/Linux
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 20
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 16382
> max locked memory       (kbytes, -l) 64
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 60000
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 8192
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 2048
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
> 60000
> Running in Jenkins mode
>
> Your insight is welcome.
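
For completeness, here is a minimal standalone reproduction of the overflow Nicholas describes (my own demo class, not Hadoop code):

    // The value from /proc fits in an unsigned 64-bit integer but not in
    // Java's signed long, so Long.parseLong rejects it.
    public class ParseLongOverflowDemo {
      public static void main(String[] args) {
        String ppid = "18446743988060683582";   // between 2^63 and 2^64
        System.out.println(Long.MAX_VALUE);     // 9223372036854775807, i.e. 2^63 - 1
        try {
          Long.parseLong(ppid);                 // the call that fails in constructProcessInfo
        } catch (NumberFormatException e) {
          System.out.println("NFE: " + e.getMessage());
        }
        // If the numeric value were ever needed, java.math.BigInteger parses it fine.
        System.out.println(new java.math.BigInteger(ppid));
      }
    }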
