[ 
https://issues.apache.org/jira/browse/SOLR-16129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17515502#comment-17515502
 ] 

Chris M. Hostetter commented on SOLR-16129:
-------------------------------------------

Isahn: I'm not surprised your benchmark tests still fail – as I said, this 
patch doesn't fix whatever root cause is behind SOLR-16099, because we still 
don't know what's causing the underlying problem: the HTTP2 communication 
between the jetty client code and jetty server code stalling for some requests 
(we don't even know if the problem is on the client side or the server side).

 

What this patch _should_ fix is that if – at the low level HTTP2 layer – the 
connection stalls out, then the "solr client" threads (either in a remote 
application using {{Http2SolrClient}} or in {{httpShardExecutor}} threads spun 
up by {{HttpShardHandlerFactory}}) should not get locked forever with stack 
traces that look like this...
{code:java}
   "stackTrace":["[email protected]/java.lang.Object.wait(Native Method)",
     "[email protected]/java.lang.Object.wait(Unknown Source)",
     "org.eclipse.jetty.client.util.InputStreamResponseListener$Input.read(InputStreamResponseListener.java:318)",
     "org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:90)",
     "org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:99)",
     "org.apache.solr.common.util.FastInputStream.readByte(FastInputStream.java:217)",
     "org.apache.solr.common.util.JavaBinCodec._init(JavaBinCodec.java:211)",
     "org.apache.solr.common.util.JavaBinCodec.initRead(JavaBinCodec.java:202)",
     "org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:195)",
     "org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:51)",
     "org.apache.solr.client.solrj.impl.Http2SolrClient.processErrorsAndResponse(Http2SolrClient.java:696)",
     "org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:412)",
     "org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:761)",
{code}
 

If your benchmark client (using the legacy – apache.commons/http1 based – 
{{HttpSolrClient}}) is getting timeouts then that's "fine-ish" – the real 
question is whether you still see {{httpShardExecutor}} threads stuck forever 
once the test is done.

*WITH THE PATCH AS WRITTEN, THOSE THREADS MIGHT STILL HANG AROUND FOR UP TO ONE 
HOUR* (or ~2X the {{timeAllowed}} param if you are using it), but for faster 
testing purposes it would be trivial to change that default in 
{{InputStreamResponseListener}} (look for {{requestTimeoutRef}}).
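
For anyone following along, the general idea (a bounded wait in place of an unlimited {{wait()}}) can be sketched roughly as below. This is a simplified, hypothetical illustration and NOT the code in SOLR-16129.patch: the class name {{TimedWaitInputStream}}, its {{offer}}/{{finish}} methods and the {{maxWaitMillis}} parameter are all invented for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical sketch only; names and structure are assumptions, not the real patch. */
public class TimedWaitInputStream extends InputStream {
    private final Object lock = new Object();
    private final Queue<byte[]> chunks = new ArrayDeque<>();
    private final long maxWaitMillis; // assumed per-read limit
    private byte[] current;
    private int pos;
    private boolean finished;

    public TimedWaitInputStream(long maxWaitMillis) {
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Producer side: hand over a chunk of response content and wake any reader. */
    public void offer(byte[] data) {
        synchronized (lock) {
            chunks.add(data);
            lock.notifyAll();
        }
    }

    /** Producer side: signal that no more content will arrive. */
    public void finish() {
        synchronized (lock) {
            finished = true;
            lock.notifyAll();
        }
    }

    @Override
    public int read() throws IOException {
        synchronized (lock) {
            long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
            while (true) {
                if (current != null && pos < current.length) {
                    return current[pos++] & 0xFF;
                }
                current = chunks.poll();
                pos = 0;
                if (current != null) {
                    continue; // got a new chunk, try to read from it
                }
                if (finished) {
                    return -1; // normal end of stream
                }
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    // Give up instead of blocking the calling thread forever.
                    throw new IOException("no data after " + maxWaitMillis + "ms");
                }
                try {
                    // wait(0) would mean "wait forever", so always pass at least 1ms.
                    lock.wait(Math.max(1L, remainingNanos / 1_000_000L));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new InterruptedIOException("interrupted while waiting for data");
                }
            }
        }
    }
}
```

The loop re-checks the deadline after every wake-up, so spurious wake-ups do not extend the total wait; a stalled connection costs the reading thread at most roughly {{maxWaitMillis}} per read rather than hanging indefinitely.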
----
{quote}Steps to reproduce:
{quote}
I know, but as I mentioned in SOLR-16099 ...
{quote}(they crashed my machine while attempting to run them)
{quote}

> Solr specific InputStreamResponseListener to prevent client threads from 
> hanging forever
> ----------------------------------------------------------------------------------------
>
>                 Key: SOLR-16129
>                 URL: https://issues.apache.org/jira/browse/SOLR-16129
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Chris M. Hostetter
>            Assignee: Chris M. Hostetter
>            Priority: Major
>         Attachments: SOLR-16129.patch
>
>
> This issue tracks the implementation of the workaround I suggested for 
> SOLR-16099 - it does not _fix_ the underlying bug (which as of this writing 
> doesn't have an identified root cause) but it does ensure that client threads 
> which encounter the bug won't hang forever...
> {quote}One thing we may want to consider (in Solr) is replacing our usage of 
> {{InputStreamResponseListener}} with a variant implementation that uses a 
> "timeout" instead of an unlimited {{wait()}} (along the lines of a [spin-off 
> jetty enhancement issue|https://github.com/eclipse/jetty.project/issues/7259] 
> one of the jetty devs filed). We could probably (with some effort) tweak the 
> impacted Solr APIs to propagate the (remaining) {{timeAllowed}} (if that 
> option was specified) down to this class – and/or have an "extreme" default 
> (ie: 30min) just to prevent threads from sticking around forever.
> {quote}


