ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
-----------------------------------------------------------------------------
Key: MAPREDUCE-3583
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3583
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.205.0
Environment: 64-bit Linux:
asf011.sp2.ygridcore.net
Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20
17:42:25 UTC 2011 x86_64 GNU/Linux
Reporter: Zhihong Yu
HBase PreCommit builds frequently gave us NumberFormatException.
From
https://builds.apache.org/job/PreCommit-HBASE-Build/553//testReport/org.apache.hadoop.hbase.mapreduce/TestHFileOutputFormat/testMRIncrementalLoad/:
{code}
2011-12-20 01:44:01,180 WARN [main] mapred.JobClient(784): No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
java.lang.NumberFormatException: For input string: "18446743988060683582"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Long.parseLong(Long.java:422)
    at java.lang.Long.parseLong(Long.java:468)
    at org.apache.hadoop.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:413)
    at org.apache.hadoop.util.ProcfsBasedProcessTree.getProcessTree(ProcfsBasedProcessTree.java:148)
    at org.apache.hadoop.util.LinuxResourceCalculatorPlugin.getProcResourceValues(LinuxResourceCalculatorPlugin.java:401)
    at org.apache.hadoop.mapred.Task.initialize(Task.java:536)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:353)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
{code}
From the Hadoop 0.20.205 source code, it looks like ppid was
18446743988060683582, causing the NFE:
{code}
// Set (name) (ppid) (pgrpId) (session) (utime) (stime) (vsize) (rss)
pinfo.updateProcessInfo(m.group(2), Integer.parseInt(m.group(3)),
{code}
You can find information on the OS at the beginning of
https://builds.apache.org/job/PreCommit-HBASE-Build/553/console:
{code}
asf011.sp2.ygridcore.net
Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20
17:42:25 UTC 2011 x86_64 GNU/Linux
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 20
file size (blocks, -f) unlimited
pending signals (-i) 16382
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 60000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 2048
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
60000
Running in Jenkins mode
{code}
From Nicolas Sze:
{noformat}
It looks like that the ppid is a 64-bit positive integer but Java long is
signed and so only works with 63-bit positive integers. In your case,
2^64 > 18446743988060683582 > 2^63.
Therefore, there is a NFE.
{noformat}
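The range argument above can be checked with a small standalone sketch. It is not taken from the Hadoop source: the class and method names (UnsignedStatField, fitsInSignedLong, parseUnsigned) are made up for illustration, and BigInteger is only one possible way to hold a value above Long.MAX_VALUE.

```java
import java.math.BigInteger;

public class UnsignedStatField {
    /** Returns true if Long.parseLong accepts the field, false if it throws NFE. */
    static boolean fitsInSignedLong(String field) {
        try {
            Long.parseLong(field);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** Parses a decimal field of arbitrary size (hypothetical helper). */
    static BigInteger parseUnsigned(String field) {
        return new BigInteger(field);
    }

    public static void main(String[] args) {
        // Value copied from the exception message in the stack trace.
        String field = "18446743988060683582";

        // Long.parseLong rejects it because it exceeds Long.MAX_VALUE (2^63 - 1).
        System.out.println("fits in signed long: " + fitsInSignedLong(field));

        // BigInteger accepts it; the value needs 64 bits, so 2^63 < value < 2^64.
        BigInteger value = parseUnsigned(field);
        System.out.println("bit length: " + value.bitLength());
    }
}
```

On Java 8 and later, Long.parseUnsignedLong would also accept this value, but that method was not available in the JDK shown in the stack trace.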
I propose changing allProcessInfo to Map<String, ProcessInfo> so that we avoid
parsing large integers and no longer hit this problem.
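A minimal sketch of the proposal, assuming process IDs are only used as lookup keys and never as numbers. The ProcessInfo class here is a simplified, hypothetical stand-in and not the real ProcfsBasedProcessTree.ProcessInfo; the index and parentOf helpers are also invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class StringKeyedProcessTable {
    /** Simplified, hypothetical stand-in for the real ProcessInfo class. */
    static final class ProcessInfo {
        final String pid;
        final String ppid;

        ProcessInfo(String pid, String ppid) {
            this.pid = pid;
            this.ppid = ppid;
        }
    }

    /** Builds the pid -> ProcessInfo map without parsing any id as a number. */
    static Map<String, ProcessInfo> index(String[][] pidPpidPairs) {
        Map<String, ProcessInfo> all = new HashMap<String, ProcessInfo>();
        for (String[] pair : pidPpidPairs) {
            all.put(pair[0], new ProcessInfo(pair[0], pair[1]));
        }
        return all;
    }

    /** Looks up the parent by its string id; returns null if it is unknown. */
    static ProcessInfo parentOf(Map<String, ProcessInfo> all, ProcessInfo p) {
        return all.get(p.ppid);
    }

    public static void main(String[] args) {
        Map<String, ProcessInfo> all = index(new String[][] {
            {"1", "0"},
            {"100", "1"},
            // An out-of-range id is stored as-is instead of causing an NFE.
            {"200", "18446743988060683582"},
        });
        System.out.println(parentOf(all, all.get("100")).pid);
        System.out.println(parentOf(all, all.get("200")) == null);
    }
}
```

With string keys, an unexpectedly large id simply fails to match any entry in the map rather than aborting the whole process-tree construction.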