[ https://issues.apache.org/jira/browse/MAPREDUCE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197610#comment-13197610 ]
Evan Pollan commented on MAPREDUCE-3583:
----------------------------------------

Turns out, the bug is in CDH3U3's version of ProcfsBasedProcessTree. Once I
updated my cluster creation automation to specifically use CDH3U2 (rather than
the latest update to CDH3), the problem went away.

CDH3U3's version of ProcfsBasedProcessTree is much closer to the trunk's
version than CDH3U2 (as you would expect). So, it could be that more recent
versions of this class have introduced incompatibilities with 64 bit Ubuntu
(and possibly other distros).

> ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
> -----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3583
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3583
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.205.0
>         Environment: 64-bit Linux:
> asf011.sp2.ygridcore.net
> Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 17:42:25 UTC 2011 x86_64 GNU/Linux
>            Reporter: Zhihong Yu
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>         Attachments: mapreduce-3583.txt
>
>
> HBase PreCommit builds frequently gave us NumberFormatException.
> From https://builds.apache.org/job/PreCommit-HBASE-Build/553//testReport/org.apache.hadoop.hbase.mapreduce/TestHFileOutputFormat/testMRIncrementalLoad/:
> {code}
> 2011-12-20 01:44:01,180 WARN  [main] mapred.JobClient(784): No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
> java.lang.NumberFormatException: For input string: "18446743988060683582"
> 	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
> 	at java.lang.Long.parseLong(Long.java:422)
> 	at java.lang.Long.parseLong(Long.java:468)
> 	at org.apache.hadoop.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:413)
> 	at org.apache.hadoop.util.ProcfsBasedProcessTree.getProcessTree(ProcfsBasedProcessTree.java:148)
> 	at org.apache.hadoop.util.LinuxResourceCalculatorPlugin.getProcResourceValues(LinuxResourceCalculatorPlugin.java:401)
> 	at org.apache.hadoop.mapred.Task.initialize(Task.java:536)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:353)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:249)
> {code}
> From hadoop 0.20.205 source code, looks like ppid was 18446743988060683582, causing NFE:
> {code}
>         // Set (name) (ppid) (pgrpId) (session) (utime) (stime) (vsize) (rss)
>         pinfo.updateProcessInfo(m.group(2), Integer.parseInt(m.group(3)),
> {code}
> You can find information on the OS at the beginning of https://builds.apache.org/job/PreCommit-HBASE-Build/553/console:
> {code}
> asf011.sp2.ygridcore.net
> Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 17:42:25 UTC 2011 x86_64 GNU/Linux
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 20
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 16382
> max locked memory       (kbytes, -l) 64
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 60000
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 8192
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 2048
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
> 60000
> Running in Jenkins mode
> {code}
> From Nicolas Sze:
> {noformat}
> It looks like that the ppid is a 64-bit positive integer but Java long is signed and so only works with 63-bit positive integers.  In your case,
>   2^64 > 18446743988060683582 > 2^63.
> Therefore, there is a NFE.
> {noformat}
> I propose changing allProcessInfo to Map<String, ProcessInfo> so that we don't encounter this problem by avoiding parsing large integer.
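
For anyone who wants to reproduce the arithmetic from the report, here is a minimal Java sketch. It only checks the quoted value against the signed long range and shows the general idea of keeping ids as strings, as the issue proposes. The class `PpidParseSketch`, the helpers `fitsInSignedLong` and `inUnsignedUpperHalf`, and the pid "1234" are hypothetical names and values made up for illustration; none of this is the actual ProcfsBasedProcessTree code or the attached patch.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Hadoop code: shows why the ppid from the
// stack trace cannot be parsed as a signed long, and how string ids
// sidestep the parse entirely.
class PpidParseSketch {
    // Value copied from the NumberFormatException message in the report.
    static final String REPORTED_PPID = "18446743988060683582";

    // True if the decimal text parses as a signed 64-bit long.
    static boolean fitsInSignedLong(String text) {
        try {
            Long.parseLong(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // True if 2^63 <= value < 2^64, i.e. the value needs the 64th bit.
    static boolean inUnsignedUpperHalf(String text) {
        BigInteger value = new BigInteger(text);
        BigInteger two63 = BigInteger.ONE.shiftLeft(63);
        BigInteger two64 = BigInteger.ONE.shiftLeft(64);
        return value.compareTo(two63) >= 0 && value.compareTo(two64) < 0;
    }

    public static void main(String[] args) {
        System.out.println("fits in signed long: "
                + fitsInSignedLong(REPORTED_PPID));
        System.out.println("between 2^63 and 2^64: "
                + inUnsignedUpperHalf(REPORTED_PPID));

        // Assumed illustration of the proposal: key the map by the raw
        // string from /proc and store the ppid as a string too, so no
        // numeric parse is needed to link a child to its parent.
        Map<String, String> parentOf = new HashMap<>();
        parentOf.put("1234", REPORTED_PPID); // "1234" is a made-up pid
        System.out.println("ppid of 1234: " + parentOf.get("1234"));
    }
}
```

The design point is that the process tree only needs ids for equality lookups, so string keys are enough and the range of the numeric type stops mattering.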