[
https://issues.apache.org/jira/browse/SOLR-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057047#comment-17057047
]
Aaron Sun edited comment on SOLR-14314 at 3/11/20, 2:52 PM:
------------------------------------------------------------
[~ichattopadhyaya] [~erickerickson] Thanks for the quick response and
appreciate for the help!
The slow response state is not always happens, it seems under some heavy load
it start to happen more often, such as client send burst of queries and add
commands at the same time.
After enable more logging, I notice some times the logging just pause some
seconds, it looks like something is blocking(or too busy to process? CPU/IO?)
{noformat}
<AsyncLogger name="org.apache.solr.update" level="trace"/>
<AsyncLogger name="org.apache.solr.servlet" level="debug"/>
{noformat}
Here are some examples:
{noformat}
2020-03-11 15:00:18.101 DEBUG (qtp820914198-32) [ x:agglogtrackitem]
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE
commit\{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
\{{params(commit=true),defaults(wt=json)}}
2020-03-11 15:00:29.111 TRACE (qtp820914198-121) [ x:agglogtrackitem]
o.a.s.u.UpdateLog TLOG: added id 2101702210439403924 to
tlog\{file=/ardome/solr/agglogtrackitem/data/tlog/tlog.0000000000000040378
refcount=1} LogPtr(62560436) map=27492440
2020-03-11 15:02:01.019 DEBUG (qtp820914198-101) [ x:agglogtrackitem]
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE
commit\{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
\{{params(commit=true),defaults(wt=json)}}
2020-03-11 15:02:06.583 TRACE (qtp820914198-28) [ x:agglogtrackitem]
o.a.s.u.UpdateLog TLOG: added id 2101802040421290224 to
tlog\{file=/ardome/solr/agglogtrackitem/data/tlog/tlog.0000000000000040552
refcount=1} LogPtr(5925467) map=2138772739
2020-03-11 15:02:06.584 DEBUG (qtp820914198-28) [ x:agglogtrackitem]
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE add\{,id=2101802040421395124}
\{{params(commit=true),defaults(wt=json)}}
2020-03-11 15:02:42.897 TRACE (qtp820914198-128) [ x:agglogtrackitem]
o.a.s.u.UpdateLog TLOG: added id 2101702240520633224 to
tlog\{file=/ardome/solr/agglogtrackitem/data/tlog/tlog.0000000000000040606
refcount=1} LogPtr(82150345) map=522043138
2020-03-11 15:02:42.897 DEBUG (qtp820914198-128) [ x:agglogtrackitem]
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE
commit\{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
\{{params(commit=true),defaults(wt=json)}}
2020-03-11 15:03:10.117 TRACE (qtp820914198-124) [ x:agglogtrackitem]
o.a.s.u.UpdateLog TLOG: added id 2101702170101660124 to
tlog\{file=/ardome/solr/agglogtrackitem/data/tlog/tlog.0000000000000040606
refcount=1} LogPtr(82154733) map=522043138
{noformat}
Client side start to timeout around this time, then time gap in the log
happens more frequently. while the client was sending "add" documents command
continuously, I guess solr shouldn't pause the processing, isn't it?
{noformat}
2020-03-11 15:07:56.699 DEBUG (qtp820914198-1287) [ x:agglogtrackitem]
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE
commit\{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
{{params(commit=true),defaults(wt=
json)}}
2020-03-11 15:08:09.886 TRACE (qtp820914198-1409) [ x:agglogtrackitem]
o.a.s.u.UpdateLog TLOG: added id 2101703060206960524 to
tlog\{file=/ardome/solr/agglogtrackitem/data/tlog/tlog.0000000000000040713
refcount=1} LogPtr(50692808) map=35667164
{noformat}
Look at the time stamp gap when follow id qtp820914198-1573
{noformat}
2020-03-11 15:19:15.386 DEBUG (qtp820914198-1573) [ x:agglogtrackitem]
o.a.s.u.DirectUpdateHandler2
updateDocument(add\{_version_=1660877647781036033,id=2101907210548759024})
2020-03-11 15:19:35.983 TRACE (qtp820914198-1573) [ x:agglogtrackitem]
o.a.s.u.UpdateLog TLOG: added id 2101907210548759024 to
tlog\{file=/ardome/solr/agglogtrackitem/data/tlog/tlog.0000000000000040731
refcount=1} LogPtr(31050261) map=96441216
2020-03-11 15:19:35.984 DEBUG (qtp820914198-1573) [ x:agglogtrackitem]
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE add\{,id=2101907250565323024}
\{{params(commit=true),defaults(wt=json)}}
2020-03-11 15:19:36.228 DEBUG (qtp820914198-1573) [ x:agglogtrackitem]
o.a.s.u.DirectUpdateHandler2
updateDocument(add\{_version_=1660877669383798784,id=2101907250565323024})
2020-03-11 15:19:41.343 TRACE (qtp820914198-1573) [ x:agglogtrackitem]
o.a.s.u.UpdateLog TLOG: added id 2101907250565323024 to
tlog\{file=/ardome/solr/agglogtrackitem/data/tlog/tlog.0000000000000040732
refcount=1} LogPtr(10205943) map=103629868
2020-03-11 15:19:41.343 DEBUG (qtp820914198-1573) [ x:agglogtrackitem]
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE add\{,id=2101907210548498324}
\{{params(commit=true),defaults(wt=json)}}
2020-03-11 15:19:41.346 DEBUG (qtp820914198-1573) [ x:agglogtrackitem]
o.a.s.u.DirectUpdateHandler2
updateDocument(add\{_version_=1660877675003117572,id=2101907210548498324})
{noformat}
Running solr with this command
{noformat}
java -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime -Xloggc:/var/ardendo/log/solr_gc.log
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M
-Xmx200G -DSTOP.PORT=8078 -DSTOP.KEY=ardsolrstop -Dsolr.solr.home=/ardome/solr
-Djetty.port=8983 -Dsolr.log.dir=/var/ardendo/log -jar start.jar --module=http
{noformat}
Here is the solr log and GC log, now sure how to check the GC pause you
mentioned above.
[^solrlog.tar.gz]
Any guidance to enable more log or some way forward would be very useful.
Thanks for the help!
was (Author: aaronsuns):
[~ichattopadhyaya] [~erickerickson] Thanks for the quick response and
appreciate for the help!
The slow response state is not always happens, it seems under some heavy load
it start to happen more often, such as client send burst of queries and add
commands at the same time.
After enable more logging, I notice some times the logging just pause some
seconds, it looks like something is blocking(or too busy to process? CPU/IO?)
{noformat}
<AsyncLogger name="org.apache.solr.update" level="trace"/>
<AsyncLogger name="org.apache.solr.servlet" level="debug"/>
{noformat}
Here are some examples:
{noformat}
2020-03-11 15:00:18.101 DEBUG (qtp820914198-32) [ x:agglogtrackitem]
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE
commit\{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
\{{params(commit=true),defaults(wt=json)}}
2020-03-11 15:00:29.111 TRACE (qtp820914198-121) [ x:agglogtrackitem]
o.a.s.u.UpdateLog TLOG: added id 2101702210439403924 to
tlog\{file=/ardome/solr/agglogtrackitem/data/tlog/tlog.0000000000000040378
refcount=1} LogPtr(62560436) map=27492440
2020-03-11 15:02:01.019 DEBUG (qtp820914198-101) [ x:agglogtrackitem]
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE
commit\{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
\{{params(commit=true),defaults(wt=json)}}
2020-03-11 15:02:06.583 TRACE (qtp820914198-28) [ x:agglogtrackitem]
o.a.s.u.UpdateLog TLOG: added id 2101802040421290224 to
tlog\{file=/ardome/solr/agglogtrackitem/data/tlog/tlog.0000000000000040552
refcount=1} LogPtr(5925467) map=2138772739
2020-03-11 15:02:06.584 DEBUG (qtp820914198-28) [ x:agglogtrackitem]
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE add\{,id=2101802040421395124}
\{{params(commit=true),defaults(wt=json)}}
2020-03-11 15:02:42.897 TRACE (qtp820914198-128) [ x:agglogtrackitem]
o.a.s.u.UpdateLog TLOG: added id 2101702240520633224 to
tlog\{file=/ardome/solr/agglogtrackitem/data/tlog/tlog.0000000000000040606
refcount=1} LogPtr(82150345) map=522043138
2020-03-11 15:02:42.897 DEBUG (qtp820914198-128) [ x:agglogtrackitem]
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE
commit\{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
\{{params(commit=true),defaults(wt=json)}}
2020-03-11 15:03:10.117 TRACE (qtp820914198-124) [ x:agglogtrackitem]
o.a.s.u.UpdateLog TLOG: added id 2101702170101660124 to
tlog\{file=/ardome/solr/agglogtrackitem/data/tlog/tlog.0000000000000040606
refcount=1} LogPtr(82154733) map=522043138
{noformat}
Client side start to timeout around this time, then time gap in the log
happens more frequently. while the client was sending "add" documents command
continuously, I guess solr shouldn't pause the processing, isn't it?
{noformat}
2020-03-11 15:07:56.699 DEBUG (qtp820914198-1287) [ x:agglogtrackitem]
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE
commit\{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
{{params(commit=true),defaults(wt=
json)}}
2020-03-11 15:08:09.886 TRACE (qtp820914198-1409) [ x:agglogtrackitem]
o.a.s.u.UpdateLog TLOG: added id 2101703060206960524 to
tlog\{file=/ardome/solr/agglogtrackitem/data/tlog/tlog.0000000000000040713
refcount=1} LogPtr(50692808) map=35667164
{noformat}
Running solr with this command
{noformat}
java -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime -Xloggc:/var/ardendo/log/solr_gc.log
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M
-Xmx200G -DSTOP.PORT=8078 -DSTOP.KEY=ardsolrstop -Dsolr.solr.home=/ardome/solr
-Djetty.port=8983 -Dsolr.log.dir=/var/ardendo/log -jar start.jar --module=http
{noformat}
Here is the solr log and GC log, now sure how to check the GC pause you
mentioned above.
[^solrlog.tar.gz]
Any guidance to enable more log or some way forward would be very useful.
Thanks for the help!
> Solr does not response most of the update request some times
> ------------------------------------------------------------
>
> Key: SOLR-14314
> URL: https://issues.apache.org/jira/browse/SOLR-14314
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Aaron Sun
> Priority: Critical
> Attachments: jstack_bad_state.log, solrlog.tar.gz
>
>
> Solr version:
> {noformat}
> solr-spec
> 8.4.1
> solr-impl
> 8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan - 2020-01-10 13:40:28
> lucene-spec
> 8.4.1
> lucene-impl
> 8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan - 2020-01-10 13:35:00
> {noformat}
>
> Java process:
> {noformat}
> java -Xms100G -Xmx200G -DSTOP.PORT=8078 -DSTOP.KEY=ardsolrstop
> -Dsolr.solr.home=/ardome/solr -Djetty.port=8983
> -Dsolr.log.dir=/var/ardendo/log -jar start.jar --module=http
> {noformat}
> Run on a powerful server with 32 cores, 265GB RAM.
> The problem is that time to time it start to get very slow to update solr
> documents, for example time out after 30 minutes.
> document size is around 20k~50K each, each http request send to /update is
> around 4MB~10MB.
> /update request is done by multi processes.
> Some of the update get response, but the differences between "QTime" and
> http response time is big, one example, qtime = 66s, http response time is
> 2304s.
> According to jstack for the thread state, lots of BLOCKED state.
> thread dump log is attached.
> Any hint would be appreciate, thanks!
>
>
>
>
>
>
>
>
>
>
>
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]