[ 
https://issues.apache.org/jira/browse/SOLR-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057047#comment-17057047
 ] 

Aaron Sun edited comment on SOLR-14314 at 3/11/20, 2:52 PM:
------------------------------------------------------------

[~ichattopadhyaya] [~erickerickson]  Thanks for the quick response and 
appreciate for the help!

The slow response state is not always happens, it seems under some heavy load 
it start to happen more often, such as client send burst of queries and add 
commands at the same time.

After enable more logging, I notice some times the logging just pause some 
seconds, it looks like something is blocking(or too busy to process? CPU/IO?)
{noformat}
<AsyncLogger name="org.apache.solr.update" level="trace"/>

<AsyncLogger name="org.apache.solr.servlet" level="debug"/>

{noformat}
 

Here are some examples:
{noformat}
2020-03-11 15:00:18.101 DEBUG (qtp820914198-32) [ x:agglogtrackitem] 
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE 
commit\{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
 \{{params(commit=true),defaults(wt=json)}}
2020-03-11 15:00:29.111 TRACE (qtp820914198-121) [ x:agglogtrackitem] 
o.a.s.u.UpdateLog TLOG: added id 2101702210439403924 to 
tlog\{file=/ardome/solr/agglogtrackitem/data/tlog/tlog.0000000000000040378 
refcount=1} LogPtr(62560436) map=27492440

 

2020-03-11 15:02:01.019 DEBUG (qtp820914198-101) [ x:agglogtrackitem] 
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE 
commit\{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
 \{{params(commit=true),defaults(wt=json)}}
2020-03-11 15:02:06.583 TRACE (qtp820914198-28) [ x:agglogtrackitem] 
o.a.s.u.UpdateLog TLOG: added id 2101802040421290224 to 
tlog\{file=/ardome/solr/agglogtrackitem/data/tlog/tlog.0000000000000040552 
refcount=1} LogPtr(5925467) map=2138772739
2020-03-11 15:02:06.584 DEBUG (qtp820914198-28) [ x:agglogtrackitem] 
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE add\{,id=2101802040421395124} 
\{{params(commit=true),defaults(wt=json)}}


2020-03-11 15:02:42.897 TRACE (qtp820914198-128) [ x:agglogtrackitem] 
o.a.s.u.UpdateLog TLOG: added id 2101702240520633224 to 
tlog\{file=/ardome/solr/agglogtrackitem/data/tlog/tlog.0000000000000040606 
refcount=1} LogPtr(82150345) map=522043138
2020-03-11 15:02:42.897 DEBUG (qtp820914198-128) [ x:agglogtrackitem] 
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE 
commit\{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
 \{{params(commit=true),defaults(wt=json)}}
2020-03-11 15:03:10.117 TRACE (qtp820914198-124) [ x:agglogtrackitem] 
o.a.s.u.UpdateLog TLOG: added id 2101702170101660124 to 
tlog\{file=/ardome/solr/agglogtrackitem/data/tlog/tlog.0000000000000040606 
refcount=1} LogPtr(82154733) map=522043138

{noformat}
 

Client side start to timeout around this time,  then time gap in the log 
happens more frequently. while the client was sending "add" documents command 
continuously, I guess solr shouldn't pause the processing, isn't it?
{noformat}
2020-03-11 15:07:56.699 DEBUG (qtp820914198-1287) [ x:agglogtrackitem] 
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE 
commit\{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
 {{params(commit=true),defaults(wt=
json)}}
2020-03-11 15:08:09.886 TRACE (qtp820914198-1409) [ x:agglogtrackitem] 
o.a.s.u.UpdateLog TLOG: added id 2101703060206960524 to 
tlog\{file=/ardome/solr/agglogtrackitem/data/tlog/tlog.0000000000000040713 
refcount=1} LogPtr(50692808) map=35667164

{noformat}
 

Look at the time stamp gap when follow id qtp820914198-1573

{noformat}

2020-03-11 15:19:15.386 DEBUG (qtp820914198-1573) [ x:agglogtrackitem] 
o.a.s.u.DirectUpdateHandler2 
updateDocument(add\{_version_=1660877647781036033,id=2101907210548759024})
2020-03-11 15:19:35.983 TRACE (qtp820914198-1573) [ x:agglogtrackitem] 
o.a.s.u.UpdateLog TLOG: added id 2101907210548759024 to 
tlog\{file=/ardome/solr/agglogtrackitem/data/tlog/tlog.0000000000000040731 
refcount=1} LogPtr(31050261) map=96441216
2020-03-11 15:19:35.984 DEBUG (qtp820914198-1573) [ x:agglogtrackitem] 
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE add\{,id=2101907250565323024} 
\{{params(commit=true),defaults(wt=json)}}
2020-03-11 15:19:36.228 DEBUG (qtp820914198-1573) [ x:agglogtrackitem] 
o.a.s.u.DirectUpdateHandler2 
updateDocument(add\{_version_=1660877669383798784,id=2101907250565323024})
2020-03-11 15:19:41.343 TRACE (qtp820914198-1573) [ x:agglogtrackitem] 
o.a.s.u.UpdateLog TLOG: added id 2101907250565323024 to 
tlog\{file=/ardome/solr/agglogtrackitem/data/tlog/tlog.0000000000000040732 
refcount=1} LogPtr(10205943) map=103629868
2020-03-11 15:19:41.343 DEBUG (qtp820914198-1573) [ x:agglogtrackitem] 
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE add\{,id=2101907210548498324} 
\{{params(commit=true),defaults(wt=json)}}
2020-03-11 15:19:41.346 DEBUG (qtp820914198-1573) [ x:agglogtrackitem] 
o.a.s.u.DirectUpdateHandler2 
updateDocument(add\{_version_=1660877675003117572,id=2101907210548498324})

{noformat}

 

Running solr with this command
{noformat}
java -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
-XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution 
-XX:+PrintGCApplicationStoppedTime -Xloggc:/var/ardendo/log/solr_gc.log 
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M 
-Xmx200G -DSTOP.PORT=8078 -DSTOP.KEY=ardsolrstop -Dsolr.solr.home=/ardome/solr 
-Djetty.port=8983 -Dsolr.log.dir=/var/ardendo/log -jar start.jar --module=http

{noformat}
 

Here is the solr log and GC log, now sure how to check the GC pause you 
mentioned above.

[^solrlog.tar.gz]

Any guidance to enable more log or some way forward would be very useful.

Thanks for the help!

 

 


was (Author: aaronsuns):
[~ichattopadhyaya] [~erickerickson]  Thanks for the quick response and 
appreciate for the help!

The slow response state is not always happens, it seems under some heavy load 
it start to happen more often, such as client send burst of queries and add 
commands at the same time.

After enable more logging, I notice some times the logging just pause some 
seconds, it looks like something is blocking(or too busy to process? CPU/IO?)
{noformat}
<AsyncLogger name="org.apache.solr.update" level="trace"/>

<AsyncLogger name="org.apache.solr.servlet" level="debug"/>

{noformat}
 

Here are some examples:
{noformat}
2020-03-11 15:00:18.101 DEBUG (qtp820914198-32) [ x:agglogtrackitem] 
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE 
commit\{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
 \{{params(commit=true),defaults(wt=json)}}
2020-03-11 15:00:29.111 TRACE (qtp820914198-121) [ x:agglogtrackitem] 
o.a.s.u.UpdateLog TLOG: added id 2101702210439403924 to 
tlog\{file=/ardome/solr/agglogtrackitem/data/tlog/tlog.0000000000000040378 
refcount=1} LogPtr(62560436) map=27492440

 

2020-03-11 15:02:01.019 DEBUG (qtp820914198-101) [ x:agglogtrackitem] 
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE 
commit\{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
 \{{params(commit=true),defaults(wt=json)}}
2020-03-11 15:02:06.583 TRACE (qtp820914198-28) [ x:agglogtrackitem] 
o.a.s.u.UpdateLog TLOG: added id 2101802040421290224 to 
tlog\{file=/ardome/solr/agglogtrackitem/data/tlog/tlog.0000000000000040552 
refcount=1} LogPtr(5925467) map=2138772739
2020-03-11 15:02:06.584 DEBUG (qtp820914198-28) [ x:agglogtrackitem] 
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE add\{,id=2101802040421395124} 
\{{params(commit=true),defaults(wt=json)}}


2020-03-11 15:02:42.897 TRACE (qtp820914198-128) [ x:agglogtrackitem] 
o.a.s.u.UpdateLog TLOG: added id 2101702240520633224 to 
tlog\{file=/ardome/solr/agglogtrackitem/data/tlog/tlog.0000000000000040606 
refcount=1} LogPtr(82150345) map=522043138
2020-03-11 15:02:42.897 DEBUG (qtp820914198-128) [ x:agglogtrackitem] 
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE 
commit\{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
 \{{params(commit=true),defaults(wt=json)}}
2020-03-11 15:03:10.117 TRACE (qtp820914198-124) [ x:agglogtrackitem] 
o.a.s.u.UpdateLog TLOG: added id 2101702170101660124 to 
tlog\{file=/ardome/solr/agglogtrackitem/data/tlog/tlog.0000000000000040606 
refcount=1} LogPtr(82154733) map=522043138

{noformat}
 

Client side start to timeout around this time,  then time gap in the log 
happens more frequently. while the client was sending "add" documents command 
continuously, I guess solr shouldn't pause the processing, isn't it?
{noformat}
2020-03-11 15:07:56.699 DEBUG (qtp820914198-1287) [ x:agglogtrackitem] 
o.a.s.u.p.LogUpdateProcessorFactory PRE_UPDATE 
commit\{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
 {{params(commit=true),defaults(wt=
json)}}
2020-03-11 15:08:09.886 TRACE (qtp820914198-1409) [ x:agglogtrackitem] 
o.a.s.u.UpdateLog TLOG: added id 2101703060206960524 to 
tlog\{file=/ardome/solr/agglogtrackitem/data/tlog/tlog.0000000000000040713 
refcount=1} LogPtr(50692808) map=35667164

{noformat}
 

Running solr with this command
{noformat}
java -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
-XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution 
-XX:+PrintGCApplicationStoppedTime -Xloggc:/var/ardendo/log/solr_gc.log 
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M 
-Xmx200G -DSTOP.PORT=8078 -DSTOP.KEY=ardsolrstop -Dsolr.solr.home=/ardome/solr 
-Djetty.port=8983 -Dsolr.log.dir=/var/ardendo/log -jar start.jar --module=http

{noformat}
 

Here is the solr log and GC log, now sure how to check the GC pause you 
mentioned above.

[^solrlog.tar.gz]

Any guidance to enable more log or some way forward would be very useful.

Thanks for the help!

 

 

> Solr does not response most of the update request some times
> ------------------------------------------------------------
>
>                 Key: SOLR-14314
>                 URL: https://issues.apache.org/jira/browse/SOLR-14314
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Aaron Sun
>            Priority: Critical
>         Attachments: jstack_bad_state.log, solrlog.tar.gz
>
>
> Solr version:
> {noformat}
> solr-spec
> 8.4.1
> solr-impl
> 8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan - 2020-01-10 13:40:28
> lucene-spec
> 8.4.1
> lucene-impl
> 8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan - 2020-01-10 13:35:00
> {noformat}
>  
> Java process:
> {noformat}
> java -Xms100G -Xmx200G -DSTOP.PORT=8078 -DSTOP.KEY=ardsolrstop 
> -Dsolr.solr.home=/ardome/solr -Djetty.port=8983 
> -Dsolr.log.dir=/var/ardendo/log -jar start.jar --module=http
> {noformat}
> Run on a powerful server with 32 cores, 265GB RAM.
> The problem is that time to time it start to get very slow to update solr 
> documents, for example time out after 30 minutes.
> document size is around 20k~50K each, each http request send to /update is 
> around 4MB~10MB.
> /update request is done by multi processes.
> Some of the update get response, but the differences between "QTime"  and 
> http response time is big, one example, qtime = 66s, http response time is 
> 2304s.
> According to jstack for the thread state, lots of BLOCKED state.
> thread dump log is attached.
> Any hint would be appreciate, thanks!
>  
>  
>  
>  
>  
>  
>  
>  
>  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to