[ 
https://issues.apache.org/jira/browse/HADOOP-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15210928#comment-15210928
 ] 

Jason Lowe commented on HADOOP-12958:
-------------------------------------

Thanks for chiming in, [~sjlee0].  I ran the job again with the requested JVM 
options, and here's the last few lines in the GC log.  The task ran with JDK 
1.8.0_60-b27 on RHEL 6.5:
{noformat}
2016-03-24T20:10:36.476+0000: 107.002: [GC (Allocation Failure) 
2016-03-24T20:10:36.481+0000: 107.007: [SoftReference, 0 refs, 0.0000264 
secs]2016-03-24T20:10:36.481+0000: 107.007: [WeakReference, 0 refs, 0.0000130 
secs]2016-03-24T20:10:36.481+0000: 107.007: [FinalReference, 0 refs, 0.0000127 
secs]2016-03-24T20:10:36.481+0000: 107.007: [PhantomReference, 0 refs, 0 refs, 
0.0000140 secs]2016-03-24T20:10:36.481+0000: 107.007: [JNI Weak Reference, 
0.0000170 secs][PSYoungGen: 115296K->96K(115712K)] 603566K->488366K(734720K), 
0.0061637 secs] [Times: user=0.02 sys=0.00, real=0.00 secs] 
2016-03-24T20:10:36.960+0000: 107.486: [GC (Allocation Failure) 
2016-03-24T20:10:36.965+0000: 107.491: [SoftReference, 0 refs, 0.0000472 
secs]2016-03-24T20:10:36.965+0000: 107.491: [WeakReference, 0 refs, 0.0000363 
secs]2016-03-24T20:10:36.965+0000: 107.491: [FinalReference, 0 refs, 0.0000108 
secs]2016-03-24T20:10:36.965+0000: 107.491: [PhantomReference, 0 refs, 0 refs, 
0.0000083 secs]2016-03-24T20:10:36.965+0000: 107.491: [JNI Weak Reference, 
0.0000091 secs][PSYoungGen: 115296K->96K(115712K)] 603566K->488366K(734720K), 
0.0056482 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
2016-03-24T20:10:38.447+0000: 108.973: [GC (Allocation Failure) 
2016-03-24T20:10:38.452+0000: 108.978: [SoftReference, 0 refs, 0.0000338 
secs]2016-03-24T20:10:38.452+0000: 108.978: [WeakReference, 0 refs, 0.0000123 
secs]2016-03-24T20:10:38.452+0000: 108.978: [FinalReference, 3 refs, 0.0000158 
secs]2016-03-24T20:10:38.452+0000: 108.978: [PhantomReference, 0 refs, 0 refs, 
0.0000113 secs]2016-03-24T20:10:38.452+0000: 108.978: [JNI Weak Reference, 
0.0000148 secs][PSYoungGen: 115296K->256K(115712K)] 603566K->488526K(734720K), 
0.0058352 secs] [Times: user=0.02 sys=0.00, real=0.00 secs] 
2016-03-24T20:10:40.917+0000: 111.443: [GC (Allocation Failure) 
2016-03-24T20:10:40.922+0000: 111.448: [SoftReference, 0 refs, 0.0000209 
secs]2016-03-24T20:10:40.922+0000: 111.448: [WeakReference, 159 refs, 0.0000212 
secs]2016-03-24T20:10:40.922+0000: 111.448: [FinalReference, 269 refs, 
0.0003309 secs]2016-03-24T20:10:40.923+0000: 111.449: [PhantomReference, 0 
refs, 6 refs, 0.0000108 secs]2016-03-24T20:10:40.923+0000: 111.449: [JNI Weak 
Reference, 0.0000108 secs][PSYoungGen: 115456K->480K(115712K)] 
603726K->489305K(734720K), 0.0061493 secs] [Times: user=0.02 sys=0.00, 
real=0.01 secs] 
2016-03-24T20:10:42.797+0000: 113.323: [GC (Allocation Failure) 
2016-03-24T20:10:42.802+0000: 113.328: [SoftReference, 0 refs, 0.0000230 
secs]2016-03-24T20:10:42.802+0000: 113.328: [WeakReference, 231 refs, 0.0000225 
secs]2016-03-24T20:10:42.802+0000: 113.328: [FinalReference, 350 refs, 
0.0004041 secs]2016-03-24T20:10:42.803+0000: 113.329: [PhantomReference, 0 
refs, 6 refs, 0.0000097 secs]2016-03-24T20:10:42.803+0000: 113.329: [JNI Weak 
Reference, 0.0000106 secs][PSYoungGen: 115680K->480K(114688K)] 
604505K->489537K(733696K), 0.0062425 secs] [Times: user=0.02 sys=0.00, 
real=0.00 secs] 
2016-03-24T20:10:44.626+0000: 115.151: [GC (Allocation Failure) 
2016-03-24T20:10:44.630+0000: 115.156: [SoftReference, 0 refs, 0.0000247 
secs]2016-03-24T20:10:44.630+0000: 115.156: [WeakReference, 230 refs, 0.0000234 
secs]2016-03-24T20:10:44.630+0000: 115.156: [FinalReference, 350 refs, 
0.0003741 secs]2016-03-24T20:10:44.631+0000: 115.157: [PhantomReference, 0 
refs, 6 refs, 0.0000095 secs]2016-03-24T20:10:44.631+0000: 115.157: [JNI Weak 
Reference, 0.0000104 secs][PSYoungGen: 114656K->896K(115200K)] 
603713K->489961K(734208K), 0.0059075 secs] [Times: user=0.02 sys=0.00, 
real=0.00 secs] 
2016-03-24T20:10:46.449+0000: 116.975: [GC (Allocation Failure) 
2016-03-24T20:10:46.455+0000: 116.981: [SoftReference, 0 refs, 0.0000285 
secs]2016-03-24T20:10:46.455+0000: 116.981: [WeakReference, 235 refs, 0.0000293 
secs]2016-03-24T20:10:46.455+0000: 116.981: [FinalReference, 356 refs, 
0.0004249 secs]2016-03-24T20:10:46.455+0000: 116.981: [PhantomReference, 0 
refs, 7 refs, 0.0000106 secs]2016-03-24T20:10:46.455+0000: 116.981: [JNI Weak 
Reference, 0.0000105 secs][PSYoungGen: 115072K->864K(114176K)] 
604137K->489929K(733184K), 0.0069392 secs] [Times: user=0.02 sys=0.00, 
real=0.01 secs] 
2016-03-24T20:10:48.741+0000: 119.267: [GC (Allocation Failure) 
2016-03-24T20:10:48.749+0000: 119.275: [SoftReference, 0 refs, 0.0000592 
secs]2016-03-24T20:10:48.749+0000: 119.275: [WeakReference, 639 refs, 0.0000830 
secs]2016-03-24T20:10:48.749+0000: 119.275: [FinalReference, 177 refs, 
0.0002006 secs]2016-03-24T20:10:48.749+0000: 119.275: [PhantomReference, 0 
refs, 12 refs, 0.0000099 secs]2016-03-24T20:10:48.749+0000: 119.275: [JNI Weak 
Reference, 0.0000109 secs][PSYoungGen: 114016K->1514K(114688K)] 
603081K->497627K(733696K), 0.0088279 secs] [Times: user=0.04 sys=0.00, 
real=0.01 secs] 
2016-03-24T20:10:48.837+0000: 119.362: [GC (Allocation Failure) 
2016-03-24T20:10:48.844+0000: 119.369: [SoftReference, 0 refs, 0.0000405 
secs]2016-03-24T20:10:48.844+0000: 119.370: [WeakReference, 281 refs, 0.0000290 
secs]2016-03-24T20:10:48.844+0000: 119.370: [FinalReference, 3 refs, 0.0000159 
secs]2016-03-24T20:10:48.844+0000: 119.370: [PhantomReference, 0 refs, 1 refs, 
0.0000076 secs]2016-03-24T20:10:48.844+0000: 119.370: [JNI Weak Reference, 
0.0000102 secs][PSYoungGen: 10455K->1530K(109568K)] 506569K->497707K(1041920K), 
0.0077403 secs] [Times: user=0.04 sys=0.00, real=0.00 secs] 
2016-03-24T20:10:48.844+0000: 119.370: [GC (Allocation Failure) 
2016-03-24T20:10:48.849+0000: 119.375: [SoftReference, 0 refs, 0.0000377 
secs]2016-03-24T20:10:48.849+0000: 119.375: [WeakReference, 279 refs, 0.0000288 
secs]2016-03-24T20:10:48.849+0000: 119.375: [FinalReference, 0 refs, 0.0000209 
secs]2016-03-24T20:10:48.849+0000: 119.375: [PhantomReference, 0 refs, 1 refs, 
0.0000113 secs]2016-03-24T20:10:48.849+0000: 119.375: [JNI Weak Reference, 
0.0000132 secs][PSYoungGen: 1530K->1562K(112128K)] 497707K->497739K(1044480K), 
0.0056076 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
2016-03-24T20:10:48.850+0000: 119.376: [Full GC (Allocation Failure) 
2016-03-24T20:10:48.857+0000: 119.383: [SoftReference, 0 refs, 0.0000476 
secs]2016-03-24T20:10:48.857+0000: 119.383: [WeakReference, 1345 refs, 
0.0000814 secs]2016-03-24T20:10:48.857+0000: 119.383: [FinalReference, 1006 
refs, 0.0002261 secs]2016-03-24T20:10:48.857+0000: 119.383: [PhantomReference, 
2 refs, 20 refs, 0.0000691 secs]2016-03-24T20:10:48.858+0000: 119.383: [JNI 
Weak Reference, 0.0000115 secs][PSYoungGen: 1562K->0K(112128K)] [ParOldGen: 
496177K->490120K(627200K)] 497739K->490120K(739328K), [Metaspace: 
29185K->29185K(1075200K)], 0.1824223 secs] [Times: user=0.64 sys=0.00, 
real=0.18 secs] 
2016-03-24T20:10:49.033+0000: 119.559: [GC (Allocation Failure) 
2016-03-24T20:10:49.038+0000: 119.564: [SoftReference, 0 refs, 0.0000332 
secs]2016-03-24T20:10:49.038+0000: 119.564: [WeakReference, 0 refs, 0.0000123 
secs]2016-03-24T20:10:49.038+0000: 119.564: [FinalReference, 0 refs, 0.0000103 
secs]2016-03-24T20:10:49.038+0000: 119.564: [PhantomReference, 0 refs, 0 refs, 
0.0000110 secs]2016-03-24T20:10:49.038+0000: 119.564: [JNI Weak Reference, 
0.0000152 secs][PSYoungGen: 0K->0K(112128K)] 490120K->490120K(1044480K), 
0.0057650 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
2016-03-24T20:10:49.039+0000: 119.565: [Full GC (Allocation Failure) 
2016-03-24T20:10:49.046+0000: 119.572: [SoftReference, 591 refs, 0.0001611 
secs]2016-03-24T20:10:49.046+0000: 119.572: [WeakReference, 1218 refs, 
0.0000570 secs]2016-03-24T20:10:49.046+0000: 119.572: [FinalReference, 691 
refs, 0.0000371 secs]2016-03-24T20:10:49.046+0000: 119.572: [PhantomReference, 
0 refs, 15 refs, 0.0000083 secs]2016-03-24T20:10:49.046+0000: 119.572: [JNI 
Weak Reference, 0.0000102 secs][PSYoungGen: 0K->0K(112128K)] [ParOldGen: 
490120K->489358K(651776K)] 490120K->489358K(763904K), [Metaspace: 
29185K->29022K(1075200K)], 0.1861130 secs] [Times: user=0.64 sys=0.01, 
real=0.18 secs] 
Heap
 PSYoungGen      total 112128K, used 8888K [0x00000000f8e80000, 
0x0000000100000000, 0x0000000100000000)
  eden space 108032K, 8% used 
[0x00000000f8e80000,0x00000000f972e390,0x00000000ff800000)
  from space 4096K, 0% used 
[0x00000000ffc00000,0x00000000ffc00000,0x0000000100000000)
  to   space 4096K, 0% used 
[0x00000000ff800000,0x00000000ff800000,0x00000000ffc00000)
 ParOldGen       total 932352K, used 489358K [0x00000000c0000000, 
0x00000000f8e80000, 0x00000000f8e80000)
  object space 932352K, 52% used 
[0x00000000c0000000,0x00000000ddde3a80,0x00000000f8e80000)
 Metaspace       used 29118K, capacity 29346K, committed 29872K, reserved 
1075200K
  class space    used 3320K, capacity 3374K, committed 3504K, reserved 1048576K
{noformat}

[~cmccabe]'s suggestion would probably fix the issue, since it hopefully 
triggers a GC cycle before the big buffer comes along to force one.  Then we 
won't be in the middle of trying to satisfy a huge allocation when we notice 
the phantom references are available for collection for the first time and 
still need to do app-level processing to free it.  My main concern with relying 
on that approach is tracking down other frameworks or apps that will need to do 
similar things, since these kinds of jobs ran fine before.

> PhantomReference for filesystem statistics can trigger OOM
> ----------------------------------------------------------
>
>                 Key: HADOOP-12958
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12958
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.7.3, 2.6.4
>            Reporter: Jason Lowe
>             Fix For: 2.7.3, 2.6.5
>
>
> I saw an OOM that appears to have been caused by the phantom references 
> introduced for file system statistics management.  I'll post details in a 
> followup comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to