[
https://issues.apache.org/jira/browse/SOLR-16676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17698160#comment-17698160
]
Alex Deparvu edited comment on SOLR-16676 at 3/9/23 3:20 AM:
-------------------------------------------------------------
It seems the trouble is that the request callbacks are called on different
threads under some circumstances (I am assuming high load or memory pressure).
Some assumptions I added earlier no longer hold, and the following pattern is
not correct:
{noformat}
MDCCopyHelper mdcCopyHelper = new MDCCopyHelper();
req.onRequestBegin(mdcCopyHelper);
// omitted for brevity
req.onComplete(mdcCopyHelper);
{noformat}
The failure happens as follows:
* new MDCCopyHelper() runs on the caller thread and picks up the correct MDC
context (as expected)
* the onRequestBegin callback runs on thread 3:
'httpShardExecutor-18-thread-3'
* 'response processing started' happens on a different thread,
'httpShardExecutor-18-thread-2', which has no MDC context available to
push forward
It seems that, more often than not, the last two events happen on the same
thread, which lets the MDC context be copied over correctly.
I am still evaluating options for the fix.
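The failure sequence above can be reproduced outside Solr with a small sketch. It assumes a ThreadLocal map as a stand-in for SLF4J's MDC (which is also thread-local) and uses two single-thread executors as stand-ins for 'thread-3' and 'thread-2'. `MdcHandoffDemo` and `CopyHelper` are hypothetical names; this is not the real MDCCopyHelper, only an illustration of why a context pushed in one callback is invisible to a callback on another thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical reproduction, not Solr code: a ThreadLocal map stands in for
// SLF4J's thread-local MDC, and single-thread executors stand in for the
// httpShardExecutor threads named in the comment.
final class MdcHandoffDemo {
  static final ThreadLocal<Map<String, String>> MDC =
      ThreadLocal.withInitial(HashMap::new);

  // Hypothetical stand-in for MDCCopyHelper: snapshots the creator's context.
  static final class CopyHelper {
    final Map<String, String> submitterContext = new HashMap<>(MDC.get());

    // Pushes the snapshot into the MDC of whichever thread runs the callback.
    void onRequestBegin() {
      MDC.get().putAll(submitterContext);
    }
  }

  // Returns the MDC visible to "response processing". When sameThread is
  // false, the begin callback and the processing run on different threads.
  static Map<String, String> run(boolean sameThread) {
    MDC.get().put("core", "gettingstarted_shard2_replica_n2");
    CopyHelper helper = new CopyHelper(); // caller thread: correct context
    ExecutorService beginThread = Executors.newSingleThreadExecutor();
    ExecutorService processingThread =
        sameThread ? beginThread : Executors.newSingleThreadExecutor();
    Callable<Map<String, String>> processResponse = () -> new HashMap<>(MDC.get());
    try {
      beginThread.submit(helper::onRequestBegin).get();
      return processingThread.submit(processResponse).get();
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      beginThread.shutdownNow();
      processingThread.shutdownNow();
      MDC.remove(); // clean up the caller thread
    }
  }
}
```

`run(true)` returns the captured context because both callbacks share one thread; `run(false)` returns an empty map, which matches the case where thread-2 has no MDC context to push forward.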
> Http2SolrClient loss of MDC context
> -----------------------------------
>
> Key: SOLR-16676
> URL: https://issues.apache.org/jira/browse/SOLR-16676
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: SolrJ
> Affects Versions: 9.0, 9.1
> Reporter: Alex Deparvu
> Priority: Minor
> Fix For: 9.2
>
> Time Spent: 4.5h
> Remaining Estimate: 0h
>
> The Http2SolrClient loses MDC context information when running an async
> request in Solr 9.x.
> The issue is that the 'Request#send' [0] call is itself asynchronous: by the
> time the response listener kicks in to hand response processing to the
> executor, the MDC context is already lost, so the executor no longer has
> access to the original MDC to push forward onto the thread that
> processes the response.
>
> This is very difficult to capture on a running system because there are no
> logs during this window. I only saw it because I was specifically looking at
> thread names for a different reason.
> This is how it is reflected in the thread names:
> - how it should be (Solr 8 style, containing all MDC data):
> {quote}{{httpShardExecutor-5-thread-19-processing-gettingstarted_shard2_replica_n2
> core_node5 localhost:8983_solr gettingstarted shard2 localhost-4}}
> {quote}
> - how it is in Solr 9 (due to no MDC context)
> {quote}httpShardExecutor-5-thread-10
> {quote}
> I can't tell whether anything breaks because of this.
> [0]
> [https://github.com/apache/solr/blob/7eee7a8ad3c43db0dc26c663dd16764d1fb3dbf4/solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java#L458]
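The description's diagnosis points to a general remedy: capture the MDC on the submitting thread before 'Request#send' returns, and install it explicitly on whatever thread ends up processing the response, instead of relying on the MDC of whichever thread ran an earlier callback. A minimal sketch of that pattern, assuming a ThreadLocal map as a stand-in for SLF4J's MDC; `MdcPropagation`, `withSubmitterContext`, and the `core`/`replica` keys are hypothetical, and this is not necessarily the fix that landed in 9.2. It also renames the worker thread to imitate the Solr 8 style names quoted above (`<thread>-processing-<MDC values>`); that format is inferred from the example names only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch, not Solr's implementation: a ThreadLocal map stands in
// for SLF4J's MDC. The context is captured when the task is wrapped, i.e. on
// the submitting thread before any async hop.
final class MdcPropagation {
  static final ThreadLocal<Map<String, String>> MDC =
      ThreadLocal.withInitial(LinkedHashMap::new);

  static <T> Callable<T> withSubmitterContext(Callable<T> task) {
    final Map<String, String> submitterContext = new LinkedHashMap<>(MDC.get());
    return () -> {
      Thread current = Thread.currentThread();
      String oldName = current.getName();
      Map<String, String> previous = new LinkedHashMap<>(MDC.get());
      MDC.set(new LinkedHashMap<>(submitterContext));
      if (!submitterContext.isEmpty()) {
        // Imitates the Solr 8 style names: "<thread>-processing-<MDC values>"
        current.setName(
            oldName + "-processing-" + String.join(" ", submitterContext.values()));
      }
      try {
        return task.call();
      } finally {
        // Restore the worker's own state so pooled threads do not leak context.
        MDC.set(previous);
        current.setName(oldName);
      }
    };
  }

  // Runs a probe wrapped, then unwrapped, on the same pooled thread;
  // each result is "threadName|mdc".
  static String[] demo() {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    Callable<String> probe = () -> Thread.currentThread().getName() + "|" + MDC.get();
    try {
      MDC.get().put("core", "gettingstarted_shard2_replica_n2");
      MDC.get().put("replica", "core_node5");
      String wrapped = pool.submit(withSubmitterContext(probe)).get();
      String plain = pool.submit(probe).get();
      return new String[] {wrapped, plain};
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdownNow();
      MDC.remove();
    }
  }
}
```

In `demo()`, the first result shows the renamed thread and the propagated context; the second runs on the same pooled thread without the wrapper and sees the original name and an empty context, because the wrapper restores both in `finally`.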
--
This message was sent by Atlassian Jira
(v8.20.10#820010)