[
https://issues.apache.org/jira/browse/LUCENE-7963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16158588#comment-16158588
]
Uwe Schindler edited comment on LUCENE-7963 at 9/8/17 12:59 PM:
----------------------------------------------------------------
bq. If it writes to memory it's not dead code?
The write here is not the write to the termAtt local variable. This one is
already removed (it ignores the return value). The write comes from something
inside getAttribute, as it needs to cast the attribute so it does a lot of
work. In fact it is also not dead code - although the return value is ignroed,
because the HashMap#get or the Class#cast may have side effects (like throwing
exception,...). It might have no side effects, but that's hard to figure out by
the optimizer. As far as I remeber: Hotspot was never able to remove hashmap
gets easily!
was (Author: thetaphi):
bq. If it writes to memory it's not dead code?
The write here is not the write to the termAtt local variable. This one is
already removed (it ignores the return value). The write comes from something
inside getAttribute, as it needs to cast the attribute so it does a lot of
work. In fact it is also not dead code - although the return value is ignroed,
because the HashMap#get may have side effects (like throwing exception,...). It
might have no side effects, but that's hard to figure out by the optimizer. As
far as I remeber: Hotspot was never able to remove hashmap gets easily!
> Useless getAttribute() in DefaultIndexingChain causes performance drop
> ----------------------------------------------------------------------
>
> Key: LUCENE-7963
> URL: https://issues.apache.org/jira/browse/LUCENE-7963
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Affects Versions: 7.0
> Environment: *Hardware*
> i3.16xlarge (Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz, 64 cores, two sockets)
> *Software*
> * Ubuntu 16.04 (kernel 4.4.0-1022-aws)
> * Java 1.8.0_144
> Reporter: Daniel Mitterdorfer
> Assignee: Uwe Schindler
> Priority: Blocker
> Fix For: 7.0
>
> Attachments: 0001-Remove-unused-variable-in-DefaultIndexingChain.patch
>
>
> Some background: We have spotted a significant drop in indexing throughput on
> machines with a large number of cores (see
> https://github.com/elastic/elasticsearch/issues/26339 for details).
> We finally managed to isolate the behavior in a JMH "microbenchmark". Here is
> the output of running that benchmark with 32 threads with JMH's perfasm
> profiler (also Oracle Development Studio finds the same hotspot):
> {code}
> ....[Hottest Region
> 1]..............................................................................
> C2, level 4, org.apache.lucene.index.DefaultIndexingChain$PerField::invert,
> version 1231 (792 bytes)
> 0x00007f52d56b026a: xor %r10d,%r10d
> ;*invokevirtual isAssignableFrom
> ; -
> org.apache.lucene.util.AttributeSource::addAttribute@28 (line 207)
> ; -
> org.apache.lucene.document.Field$BinaryTokenStream::<init>@8 (line 512)
> ; -
> org.apache.lucene.document.Field::tokenStream@82 (line 491)
> ; -
> org.apache.lucene.index.DefaultIndexingChain$PerField::invert@99 (line 729)
> 0x00007f52d56b026d: mov $0x8,%r11d
> ;*invokeinterface iterator
> ; -
> org.apache.lucene.util.AttributeSource::getCurrentState@46 (line 254)
> ; -
> org.apache.lucene.util.AttributeSource::clearAttributes@1 (line 269)
> ; -
> org.apache.lucene.document.Field$BinaryTokenStream::incrementToken@10 (line
> 532)
> ; -
> org.apache.lucene.index.DefaultIndexingChain$PerField::invert@153 (line 736)
> 0.00% 0.02% ↗ 0x00007f52d56b0273: test %r10,%r10
> ╭ │ 0x00007f52d56b0276: je 0x00007f52d56b0292
> ;*getfield fieldsData
> │ │ ; -
> org.apache.lucene.document.Field::binaryValue@1 (line 441)
> │ │ ; -
> org.apache.lucene.document.Field::tokenStream@65 (line 487)
> │ │ ; -
> org.apache.lucene.index.DefaultIndexingChain$PerField::invert@99 (line 729)
> 0.00% 0.00% │ │ 0x00007f52d56b0278: mov (%r11),%rsi
> ;*getfield next
> │ │ ; -
> java.util.HashMap::getNode@137 (line 580)
> │ │ ; -
> java.util.LinkedHashMap::get@6 (line 440)
> │ │ ; -
> org.apache.lucene.util.AttributeSource::getAttribute@6 (line 245)
> │ │ ; -
> org.apache.lucene.index.DefaultIndexingChain$PerField::invert@143 (line 734)
> 0.09% 0.51% │ │ 0x00007f52d56b027b: mov 0x18(%rsi),%r8
> 23.70% 3.54% │ │ 0x00007f52d56b027f: mov $0x7f4926a81d88,%rcx ;
>
> {metadata('org/apache/lucene/analysis/tokenattributes/CharTermAttribute')}
> 0.00% 0.00% │ │ 0x00007f52d56b0289: cmp %rcx,%r8
> │ │ 0x00007f52d56b028c: jne 0x00007f52d56b0949
> ;*instanceof
> │ │ ; -
> org.apache.lucene.document.Field::binaryValue@4 (line 441)
> │ │ ; -
> org.apache.lucene.document.Field::tokenStream@65 (line 487)
> │ │ ; -
> org.apache.lucene.index.DefaultIndexingChain$PerField::invert@99 (line 729)
> 0.01% 0.00% ↘ │ 0x00007f52d56b0292: test %r10,%r10
> │ 0x00007f52d56b0295: je 0x00007f52d56b1e0d
> ;*invokevirtual addAttributeImpl
> │ ; -
> org.apache.lucene.util.AttributeSource::addAttribute@80 (line 213)
> │ ; -
> org.apache.lucene.document.Field$StringTokenStream::<init>@8 (line 550)
> │ ; -
> org.apache.lucene.document.Field::tokenStream@47 (line 483)
> │ ; -
> org.apache.lucene.index.DefaultIndexingChain$PerField::invert@99 (line 729)
> 0.01% 0.00% │ 0x00007f52d56b029b: mov (%r11),%rsi
> ;*invokevirtual hashCode
> │ ; -
> java.util.HashMap::hash@9 (line 338)
> │ ; -
> java.util.LinkedHashMap::get@2 (line 440)
> │ ; -
> org.apache.lucene.util.AttributeSource::getAttribute@6 (line 245)
> │ ; -
> org.apache.lucene.index.DefaultIndexingChain$PerField::invert@143 (line 734)
> {code}
> You can see that a significant amount of time is spent in {{mov
> $0x7f4926a81d88,%rcx}}. It turns out that this maps to the following line in
> Java ({{DefaultIndexingChain$PerField::invert}}):
> {code:java}
> CharTermAttribute termAtt = tokenStream.getAttribute(CharTermAttribute.class);
> {code}
> Which is - luckily - unused (and removed by the attached patch).
> We have verified the impact of the change with an Elasticsearch
> macrobenchmark which indexes 165,346,692 documents (~ 74GB) with 24 clients
> concurrently. The reported numbers are median indexing throughput during that
> benchmark against a single Elasticsearch node:
> * Without the patch: 125418 documents/s
> * With the patch: 221237 documents/s
> Details about the benchmark setup etc. are mentioned in the [Elasticsearch
> issue #26339|https://github.com/elastic/elasticsearch/issues/26339] if you're
> interested.
> Unfortunately, it is beyond me why writing this register to main memory takes
> such a long time and why C2 did not eliminate this line as dead code to begin
> with.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]