[
https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865912#comment-15865912
]
Andrzej Bialecki edited comment on SOLR-10130 at 2/14/17 3:09 PM:
-------------------------------------------------------------------
I haven't been able to reproduce such a drastic slowdown using simple benchmarks.
Here are example results from indexing with the {{post}} tool, fairly
representative of several runs on each branch:
{code}
* branch_6_3
real 4m14.804s
user 0m0.883s
sys 0m2.279s
* branch_6_4
real 5m0.987s
user 0m0.910s
sys 0m2.276s
* jira/solr-10130
real 4m38.097s
user 0m0.881s
sys 0m2.287s
{code}
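To put the {{real}} (wall-clock) times above in perspective, here is a small sketch that converts them to seconds and computes each branch's slowdown relative to branch_6_3. The class name {{PostTimings}} is only for illustration; the input strings are copied from the results above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Converts the `real` wall-clock times printed by bash's `time` into seconds
// and expresses each branch's time relative to a baseline branch.
final class PostTimings {
    // Matches values such as "4m14.804s" (whole minutes, then fractional seconds).
    private static final Pattern REAL = Pattern.compile("(\\d+)m(\\d+(?:\\.\\d+)?)s");

    private PostTimings() {
    }

    // Parses "XmY.YYYs" into a number of seconds.
    static double seconds(String real) {
        Matcher m = REAL.matcher(real.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected time format: " + real);
        }
        return Integer.parseInt(m.group(1)) * 60 + Double.parseDouble(m.group(2));
    }

    // Percentage by which `candidate` is slower (positive) or faster (negative) than `baseline`.
    static double slowdownPercent(String baseline, String candidate) {
        return (seconds(candidate) / seconds(baseline) - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        String baseline = "4m14.804s"; // branch_6_3
        System.out.printf("branch_6_4:      %+.1f%%%n", slowdownPercent(baseline, "5m0.987s"));
        System.out.printf("jira/solr-10130: %+.1f%%%n", slowdownPercent(baseline, "4m38.097s"));
    }
}
```

On these numbers, branch_6_4 comes out roughly 18% slower than branch_6_3 and jira/solr-10130 roughly 9% slower: noticeable, but far from an order of magnitude (which would be about +900%).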
The profiler does show that one of the hotspots on branch_6_4 is the
{{Meter.mark}} call in
{{org.apache.solr.core.MetricsDirectoryFactory$MetricsInput.readByte}}. In my
test the profiler attributed ~3% of CPU time to it, which is something we
should avoid, so this metric should be turned off by default.
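A minimal sketch of the idea of gating per-read metering behind a flag that defaults to off. This is not the actual {{MetricsInput}} code or the attached patch: {{ByteSource}}, {{ArraySource}} and {{MeteredSource}} are hypothetical names, and a {{LongAdder}} stands in for the Codahale {{Meter}} so the example runs with only the JDK. It also records a whole {{readBytes}} call as a single update, as one possible way to keep the cost per call rather than per byte.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical two-method stand-in for Lucene's IndexInput (illustration only).
interface ByteSource {
    byte readByte();

    void readBytes(byte[] dst, int offset, int len);
}

// In-memory source so the sketch runs without Lucene on the classpath.
final class ArraySource implements ByteSource {
    private final byte[] data;
    private int pos;

    ArraySource(byte[] data) {
        this.data = data;
    }

    @Override
    public byte readByte() {
        return data[pos++];
    }

    @Override
    public void readBytes(byte[] dst, int offset, int len) {
        System.arraycopy(data, pos, dst, offset, len);
        pos += len;
    }
}

// Hypothetical metered wrapper, NOT Solr's MetricsInput.
final class MeteredSource implements ByteSource {
    private final ByteSource delegate;
    private final boolean detailedMetrics;          // per-read I/O metering; should default to false
    final LongAdder bytesRead = new LongAdder();    // total bytes recorded (Meter stand-in)
    final LongAdder meterUpdates = new LongAdder(); // how many times the "meter" was touched

    MeteredSource(ByteSource delegate, boolean detailedMetrics) {
        this.delegate = delegate;
        this.detailedMetrics = detailedMetrics;
    }

    @Override
    public byte readByte() {
        byte b = delegate.readByte();
        if (detailedMetrics) {
            record(1); // one update per byte: the hot path when enabled
        }
        return b;
    }

    @Override
    public void readBytes(byte[] dst, int offset, int len) {
        delegate.readBytes(dst, offset, len);
        if (detailedMetrics) {
            record(len); // a single update for the whole bulk read
        }
    }

    private void record(long n) {
        bytesRead.add(n);
        meterUpdates.increment();
    }
}

final class MeteredSourceDemo {
    public static void main(String[] args) {
        MeteredSource in = new MeteredSource(new ArraySource(new byte[16]), true);
        for (int i = 0; i < 4; i++) {
            in.readByte();
        }
        in.readBytes(new byte[8], 0, 8);
        System.out.println(in.bytesRead.sum() + " bytes in " + in.meterUpdates.sum() + " updates");
    }
}
```

With the flag on, four {{readByte}} calls plus one 8-byte {{readBytes}} call record 12 bytes in 5 updates; with it off, the wrapper is a plain delegate and never touches the counters.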
However, this still doesn't explain the order-of-magnitude slowdown reported
above.
[~emaijala] and [~wunder], please apply the above patch in your environment
and see what the impact is. It makes sense to make this change anyway, so I'm
going to apply this or a similar version to all affected branches, but maybe
there's more we can do here.
> Serious performance degradation in Solr 6.4.1 due to the new metrics
> collection
> -------------------------------------------------------------------------------
>
> Key: SOLR-10130
> URL: https://issues.apache.org/jira/browse/SOLR-10130
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: metrics
> Affects Versions: 6.4.1
> Environment: Centos 7, OpenJDK 1.8.0 update 111
> Reporter: Ere Maijala
> Assignee: Andrzej Bialecki
> Priority: Blocker
> Labels: perfomance
> Attachments: SOLR-10130.patch, solr-8983-console-f1.log
>
>
> We've stumbled on serious performance issues after upgrading to Solr 6.4.1.
> It looks like the new metrics collection system in MetricsDirectoryFactory is
> causing a major slowdown. This happens with an index configuration that, as
> far as I can see, has no metrics-specific configuration and uses
> luceneMatchVersion 5.5.0. In practice, a moderate load will completely bog
> down the server, with Solr threads constantly using all available CPU
> capacity (600% on a 6-core machine) under a load at which we normally see an
> average load of < 50%.
> I took stack traces (I'll attach them) and noticed that the threads are
> spending time in com.codahale.metrics.Meter.mark. I tested a build of Solr
> 6.4.1 with metrics collection disabled in the MetricsDirectoryFactory readByte
> and readBytes methods and was unable to reproduce the issue.
> As far as I can see there are several issues:
> 1. Collecting metrics on every single byte read is slow.
> 2. Having it enabled by default is not a good idea.
> 3. The comment "enable coarse-grained metrics by default" at
> https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104
> implies that only coarse-grained metrics should be enabled by default, which
> contradicts collecting metrics on every single byte read.
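A sketch of the defaults argued for in points 2 and 3 (coarse-grained metrics on, per-read I/O metering off unless explicitly requested), assuming a hypothetical {{IndexMetricsSettings}} holder. The field names are illustrative and are not Solr's actual {{SolrIndexConfig}} options.

```java
// Hypothetical holder for index metrics settings. The names are illustrative
// and do not correspond to actual SolrIndexConfig options.
final class IndexMetricsSettings {
    final boolean coarseGrained;    // cheap aggregate metrics
    final boolean perReadIoMetrics; // metering inside readByte/readBytes

    IndexMetricsSettings(boolean coarseGrained, boolean perReadIoMetrics) {
        this.coarseGrained = coarseGrained;
        this.perReadIoMetrics = perReadIoMetrics;
    }

    // Defaults in the spirit of "enable coarse-grained metrics by default":
    // aggregate metrics on, per-read I/O metering off.
    static IndexMetricsSettings defaults() {
        return new IndexMetricsSettings(true, false);
    }

    // Returns a copy with per-read I/O metering switched explicitly,
    // leaving the coarse-grained setting unchanged.
    IndexMetricsSettings withPerReadIoMetrics(boolean enabled) {
        return new IndexMetricsSettings(coarseGrained, enabled);
    }

    public static void main(String[] args) {
        IndexMetricsSettings s = defaults();
        System.out.println("coarse=" + s.coarseGrained + ", perReadIo=" + s.perReadIoMetrics);
    }
}
```

The design choice is simply that the expensive, fine-grained metric is opt-in, so an index configuration with no metrics-specific settings pays only for the coarse-grained ones.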
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)