[
https://issues.apache.org/jira/browse/SOLR-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704507#comment-13704507
]
Dawid Weiss commented on SOLR-5022:
-----------------------------------
It's been both fun and a learning experience debugging this. I have good news
and bad news:
- the good news is: it's not a memory leak,
- the bad news is: it's not a memory leak :)
h2. the debugging process
Clearly permgen is one of the most wicked JVM features - it's damn hard to
figure out what its
content really is (I didn't find a way to dump it from within a running process
without invoking
the debugging interface, which in turn starts its own threads, etc.).
The way I approached the problem (which may be useful for future reference) is
as follows:
- I wrote a short aspect that injects itself before any String.intern is called:
{code}
pointcut targetMethod(): call(String java.lang.String.intern());

before(): targetMethod() {
  final JoinPoint jp = thisJoinPoint;
  System.out.println("String#intern() from: "
      + jp.getSourceLocation().getWithinType() + " => "
      + jp.getTarget());
}
{code}
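For the load-time weaving setup implied by the -javaagent flag further down, the weaver also needs a META-INF/aop.xml registering the aspect. A minimal sketch (the aspect class name "InternTracer" is mine, purely illustrative):
{code}
<!-- META-INF/aop.xml -->
<aspectj>
  <aspects>
    <aspect name="InternTracer"/>
  </aspects>
  <weaver options="-verbose">
    <!-- weave everything; could be narrowed to packages of interest -->
    <include within="*"/>
  </weaver>
</aspectj>
{code}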
- then I added a Before and After hook (executed before/after each test) that
dumped memory pools:
{code}
// needs java.lang.management.{ManagementFactory, MemoryPoolMXBean, MemoryUsage}
System.out.println("Memdump#from: " + this.getClass().getName() + " => ");
for (MemoryPoolMXBean bean : ManagementFactory.getMemoryPoolMXBeans()) {
  MemoryUsage usage = bean.getUsage();
  System.out.println(String.format(Locale.ENGLISH,
      "%20s - I:%7.1f U:%7.1f M:%7.1f",
      bean.getName(),
      usage.getInit() / (1024 * 1024.0d),
      usage.getUsed() / (1024 * 1024.0d),
      usage.getMax() / (1024 * 1024.0d)));
}
{code}
- then I ran the Solr tests in a single JVM, with the following parameters:
{code}
ant -Dtests.seed=143E6CCF7E42064B \
    -Dtests.leaveTemporary=true \
    -Dtests.jvms=1 \
    -Dargs="-javaagent:aspectjweaver.jar -XX:+UseCompressedOops -XX:+UseParallelGC -XX:+TraceClassLoading" \
    test-core
{code}
I had to modify common-build.xml to include the aspectj classpath entries (and the aspect itself) because I couldn't get it to work by passing -cp via the args parameter (I didn't look into it too deeply since it's a hack anyway).
- I again modified common-build.xml and added:
{code}
sysouts="true" jvmoutputaction="pipe,ignore"
{code}
to junit4:junit4 task's attributes so that all output is emitted to temporary
files under a build folder.
h2. the results
From the dumped output streams we get the following weave info indicating which methods call String.intern:
{code}
$ grep "String.intern(" junit4-J0-20130710_122632_726.syserr
in Type 'com.ctc.wstx.util.SymbolTable'
in Type 'com.ctc.wstx.util.SymbolTable'
in Type 'com.ctc.wstx.util.InternCache'
in Type 'org.apache.solr.response.JSONWriter'
in Type 'org.apache.lucene.codecs.lucene3x.TermBuffer'
in Type 'org.apache.lucene.codecs.lucene3x.Lucene3xFields$PreTermsEnum'
in Type 'org.apache.solr.common.luke.FieldFlag'
in Type 'org.apache.solr.search.DocSetPerf'
in Type 'org.joda.time.tz.ZoneInfoProvider'
in Type 'org.joda.time.tz.DateTimeZoneBuilder$PrecalculatedZone'
in Type 'org.joda.time.tz.DateTimeZoneBuilder$Recurrence'
in Type 'org.joda.time.chrono.GJLocaleSymbols'
in Type 'org.apache.solr.request.TestWriterPerf'
{code}
These sites do intern a lot of strings, but they're typically the same strings, so they don't account for the growth of permgen. Permgen usage, in turn, grows steadily over the runtime of the test JVM:
{code}
$ egrep -o -e "PS Perm Gen[^%]+" junit4-J0-20130710_122632_726.sysout
PS Perm Gen - I: 20.8 U: 15.9 M: 82.0
PS Perm Gen - I: 20.8 U: 16.1 M: 82.0
PS Perm Gen - I: 20.8 U: 34.6 M: 82.0
PS Perm Gen - I: 20.8 U: 37.7 M: 82.0
PS Perm Gen - I: 20.8 U: 37.7 M: 82.0
PS Perm Gen - I: 20.8 U: 37.9 M: 82.0
PS Perm Gen - I: 20.8 U: 37.9 M: 82.0
PS Perm Gen - I: 20.8 U: 38.0 M: 82.0
...
PS Perm Gen - I: 20.8 U: 77.3 M: 82.0
PS Perm Gen - I: 20.8 U: 77.4 M: 82.0
PS Perm Gen - I: 20.8 U: 77.4 M: 82.0
PS Perm Gen - I: 20.8 U: 77.4 M: 82.0
PS Perm Gen - I: 20.8 U: 77.4 M: 82.0
{code}
I stands for "initial", U for "used", M for "maximum". So you can see that permgen is nearly exhausted in this run (it didn't OOM, though). Out of curiosity I also checked the class loading markers -- classes are loaded throughout the whole run, because each test exercises different fragments of the code. So even at the end of the run you get things like:
{code}
Memdump#from: org.apache.solr.update.processor.ParsingFieldUpdateProcessorsTest
=>
Code Cache - I: 2.4 U: 27.0 M: 48.0
PS Eden Space - I: 62.9 U: 68.7 M: 167.9
PS Survivor Space - I: 10.4 U: 0.8 M: 0.8
PS Old Gen - I: 167.5 U: 97.8 M: 341.4
PS Perm Gen - I: 20.8 U: 72.7 M: 82.0
[Loaded org.joda.time.ReadWritableInstant from
file:/C:/Work/lucene-solr-svn/branch_4x/solr/core/lib/joda-time-2.2.jar]
[Loaded org.joda.time.ReadWritableDateTime from
file:/C:/Work/lucene-solr-svn/branch_4x/solr/core/lib/joda-time-2.2.jar]
[Loaded org.joda.time.MutableDateTime from
file:/C:/Work/lucene-solr-svn/branch_4x/solr/core/lib/joda-time-2.2.jar]
[Loaded org.joda.time.field.AbstractReadableInstantFieldProperty from
file:/C:/Work/lucene-solr-svn/branch_4x/solr/core/lib/joda-time-2.2.jar]
[Loaded org.joda.time.MutableDateTime$Property from
file:/C:/Work/lucene-solr-svn/branch_4x/solr/core/lib/joda-time-2.2.jar]
[Loaded org.joda.time.chrono.GJLocaleSymbols from
file:/C:/Work/lucene-solr-svn/branch_4x/solr/core/lib/joda-time-2.2.jar]
Memdump#from: org.apache.solr.update.processor.ParsingFieldUpdateProcessorsTest
=>
{code}
It seems the problem leading to permgen exhaustion is simply the huge number of classes being loaded under a single class loader (and these classes cannot be unloaded, because they're either cross-referenced or something else is holding on to them).
h2. verifying the class-count hypothesis
An interesting question to answer was: how much permgen space would it take to load all these classes *without running tests*? I wrote a small utility that parses the output log for class loading entries:
{code}
...
[Loaded org.apache.lucene.index.DocTermOrds from
file:/C:/Work/lucene-solr-svn/branch_4x/lucene/build/core/classes/java/]
[Loaded org.apache.lucene.search.FieldCacheImpl$DocTermOrdsCache from
file:/C:/Work/lucene-solr-svn/branch_4x/lucene/build/core/classes/java/]
[Loaded org.apache.lucene.search.FieldCacheImpl$DocsWithFieldCache from
file:/C:/Work/lucene-solr-svn/branch_4x/lucene/build/core/classes/java/]
[Loaded org.apache.lucene.search.FieldCache$2 from
file:/C:/Work/lucene-solr-svn/branch_4x/lucene/build/core/classes/java/]
...
{code}
and feeds the URLs that appear in those entries to a custom URLClassLoader. The tool then attempts to load every referenced class (running its static initializers) but does nothing else with it. It also dumps the permgen state every 100 classes. The results are as follows:
{code}
# 10240 classes from 61 sources.
0 - Code Cache - I: 2.4 U: 0.6 M: 48.0
0 - PS Eden Space - I: 62.9 U: 54.1 M: 1319.1
0 - PS Survivor Space - I: 10.4 U: 2.6 M: 10.4
0 - PS Old Gen - I: 167.5 U: 0.0 M: 2680.0
0 - PS Perm Gen - I: 20.8 U: 4.1 M: 82.0
...
1400 - Code Cache - I: 2.4 U: 0.7 M: 48.0
1400 - PS Eden Space - I: 62.9 U: 18.8 M: 1319.1
1400 - PS Survivor Space - I: 10.4 U: 3.5 M: 10.4
1400 - PS Old Gen - I: 167.5 U: 0.0 M: 2680.0
1400 - PS Perm Gen - I: 20.8 U: 12.0 M: 82.0
...
6200 - Code Cache - I: 2.4 U: 1.3 M: 48.0
6200 - PS Eden Space - I: 62.9 U: 33.3 M: 1319.1
6200 - PS Survivor Space - I: 10.4 U: 10.4 M: 10.4
6200 - PS Old Gen - I: 167.5 U: 10.7 M: 2680.0
6200 - PS Perm Gen - I: 20.8 U: 45.6 M: 82.0
...
10239 - Code Cache - I: 2.4 U: 1.5 M: 48.0
10239 - PS Eden Space - I: 62.9 U: 4.8 M: 1319.1
10239 - PS Survivor Space - I: 10.4 U: 10.4 M: 10.4
10239 - PS Old Gen - I: 167.5 U: 21.7 M: 2680.0
10239 - PS Perm Gen - I: 20.8 U: 71.5 M: 82.0
{code}
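The replaying utility itself isn't attached here; a rough self-contained sketch of the approach (the class name ClassLoadReplay and all details are mine, not the original tool) could look like:
{code}
import java.io.BufferedReader;
import java.io.FileReader;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClassLoadReplay {
  // Matches -XX:+TraceClassLoading output: [Loaded <class> from <url>]
  static final Pattern LOADED = Pattern.compile("\\[Loaded (\\S+) from (\\S+)\\]");

  public static void main(String[] args) throws Exception {
    Set<String> classes = new LinkedHashSet<String>();
    Set<URL> sources = new LinkedHashSet<URL>();
    BufferedReader in = new BufferedReader(new FileReader(args[0]));
    try {
      String line;
      while ((line = in.readLine()) != null) {
        Matcher m = LOADED.matcher(line);
        if (m.find()) {
          classes.add(m.group(1));
          sources.add(new URL(m.group(2)));
        }
      }
    } finally {
      in.close();
    }
    System.out.println("# " + classes.size() + " classes from "
        + sources.size() + " sources.");

    // Parent is null so only the bootstrap loader sits above us.
    URLClassLoader loader = new URLClassLoader(sources.toArray(new URL[0]), null);
    int i = 0;
    for (String name : classes) {
      try {
        Class.forName(name, true, loader); // true = run static initializers
      } catch (Throwable t) {
        // some classes won't resolve in isolation; skip them
      }
      if (i % 100 == 0) dumpPools(i);
      i++;
    }
    dumpPools(i - 1);
  }

  static void dumpPools(int count) {
    for (MemoryPoolMXBean bean : ManagementFactory.getMemoryPoolMXBeans()) {
      MemoryUsage u = bean.getUsage();
      System.out.println(String.format(Locale.ENGLISH,
          "%d - %20s - I:%7.1f U:%7.1f M:%7.1f",
          count, bean.getName(),
          u.getInit() / (1024 * 1024.0d),
          u.getUsed() / (1024 * 1024.0d),
          u.getMax() / (1024 * 1024.0d)));
    }
  }
}
{code}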
which, if you forgot already, matches the result acquired from the real test run (classes plus interned strings) very nicely:
{code}
Memdump#from: org.apache.solr.util.FileUtilsTest =>
Code Cache - I: 2.4 U: 24.3 M: 48.0
PS Eden Space - I: 62.9 U: 39.6 M: 166.5
PS Survivor Space - I: 10.4 U: 1.1 M: 2.1
PS Old Gen - I: 167.5 U: 173.9 M: 341.4
PS Perm Gen - I: 20.8 U: 77.4 M: 82.0
{code}
I repeated the above with JDK 1.7 (64-bit); the required permgen space turns out to be smaller:
{code}
10239 - Code Cache - I: 2.4 U: 1.3 M: 48.0
10239 - PS Eden Space - I: 62.9 U: 97.5 M: 1319.1
10239 - PS Survivor Space - I: 10.4 U: 10.4 M: 10.4
10239 - PS Old Gen - I: 167.5 U: 27.5 M: 2680.0
10239 - PS Perm Gen - I: 20.8 U: 59.9 M: 82.0
{code}
which may be a hint as to why we're seeing the problem only on 1.6 -- we're running very close to the limit and 1.6 is less space-conservative.
I also ran it with JRockit (for fun):
{code}
10239 - Nursery - I: -0.0 U: 13.0 M: 2918.4
10239 - Old Space - I: 64.0 U: 62.0 M: 3072.0
10239 - Class Memory - I: 0.5 U: 68.7 M: -0.0
10239 - ClassBlock Memory - I: 0.5 U: 4.0 M: -0.0
{code}
and with J9:
{code}
10239 - class storage - I: 0.0 U: 41.3 M: -0.0
10239 - JIT code cache - I: 0.0 U: 8.0 M: -0.0
10239 - JIT data cache - I: 0.0 U: 0.3 M: -0.0
10239 - miscellaneous non-heap storage - I: 0.0 U: 0.0 M: -0.0
10239 - Java heap - I: 4.0 U: 38.3 M: 512.0
{code}
h2. conclusions
So it's the sheer number of loaded classes that is the core of the problem. The workarounds, in order of difficulty:
- increase max permgen for HotSpot (other JVMs appear to size it dynamically),
- split solr core tests into multiple ant sub-calls so that they don't all run in a single JVM,
- change the runner to support running tests in isolation (for example max-N tests per JVM, then relaunch),
- probably a lot more options here, depending on your current creativity levels :)
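For the first workaround, bumping the permgen ceiling is a one-liner in the test args (the 196m value is just an example, not a measured requirement):
{code}
ant -Dargs="-XX:MaxPermSize=196m" test-core
{code}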
> PermGen exhausted test failures on Jenkins.
> -------------------------------------------
>
> Key: SOLR-5022
> URL: https://issues.apache.org/jira/browse/SOLR-5022
> Project: Solr
> Issue Type: Test
> Components: Tests
> Reporter: Mark Miller
> Fix For: 5.0, 4.4
>
> Attachments: intern-count-win.txt, SOLR-5022.patch
>
>