[
https://issues.apache.org/jira/browse/SOLR-16414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629050#comment-17629050
]
Michael Gibney commented on SOLR-16414:
---------------------------------------
Thank you for the in-depth analysis, Patson! And thanks for the thread dump and
for identifying this issue, Ishan and Noble. Would you be able to provide logs
for the shutdown as well?
{quote} We actually do not need any parallelism. The operations are quite
fast{quote}
iiuc everyone's in agreement on that point; but the way this manifested doesn't
look like it's simply related to concurrent load induced by using
{{parallelStream}} instead of serial {{forEach}}. On the one hand this
hopefully reassures [~janhoy] that this fix isn't simply a matter of throttling
load in an arbitrary way -- it's actually a consequence of the behavior of
{{parallelStream}} in a way unrelated to parallelism _per se_. On the other
hand, this may have uncovered a latent issue, perhaps around exception
handling/ordering assumptions in the shutdown code, warranting digging a bit
further to figure out more specifically what's going on, and if there may be
other changes that could guard against this kind of thing happening in the
future.
Patson's analysis definitely seems relevant, but the thread dump Ishan posted
seems to point at something else possibly going on. What I find curious about
the thread dump is that it doesn't actually look like resource contention at
this point; rather, it looks like a bunch of non-daemon threads somehow got
created _after_ the shutdown process considered itself to be finished, and the
non-daemon threads are preventing the JVM from exiting, despite the fact that
the shutdown hook has exited and no more work is actually being done.
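
One way to check that reading against a live JVM is to list which non-daemon threads are still alive once shutdown is thought to be done. A minimal sketch, assuming nothing about Solr internals: the class name, method names and the "demo-idle-worker" thread name are all made up for illustration, and only standard {{java.lang.Thread}} calls are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Illustrative only: not Solr code. Lists threads that would keep the JVM alive.
final class NonDaemonThreadSketch {

    /** Names of live non-daemon threads other than the calling thread. */
    static List<String> liveNonDaemonThreadNames() {
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && !t.isDaemon() && t != Thread.currentThread()) {
                names.add(t.getName());
            }
        }
        return names;
    }

    /** Starts an idle non-daemon thread (hypothetical name) and reports whether it is listed. */
    static boolean idleThreadIsListed() {
        CountDownLatch release = new CountDownLatch(1);
        Thread idle = new Thread(() -> {
            try {
                release.await(); // does no work, just parks, like an idle pool worker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "demo-idle-worker");
        idle.setDaemon(false); // non-daemon: the JVM will not exit while this thread lives
        idle.start();
        try {
            return liveNonDaemonThreadNames().contains("demo-idle-worker");
        } finally {
            release.countDown(); // let the demo thread finish so the JVM can exit
        }
    }

    public static void main(String[] args) {
        System.out.println(idleThreadIsListed());
    }
}
```

Calling something like {{liveNonDaemonThreadNames()}} at the very end of a shutdown hook would show whether idle pool workers are what is holding the process open.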
It's possible I'm misreading the situation, but fwiw that hypothetical
situation could potentially be a consequence of the behavior Patson outlined:
could the tasks executed by {{parallelStream}} somehow re-instantiate
"searcherExecutor" and "parallelCoreAdminExecutor" thread pools _after_ the
point when the shutdown process would consider the need to shut them down?
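
To make the hypothesis concrete, here is a minimal sketch of how a lazily created pool could reappear after shutdown. This is purely illustrative and assumes a "get or create" accessor, which may not match how Solr actually manages "searcherExecutor" or "parallelCoreAdminExecutor"; every name in it is hypothetical. It relies only on the fact that {{Executors.newFixedThreadPool}} uses non-daemon worker threads by default.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: a hypothetical lazily created pool, not Solr's implementation.
final class LateExecutorSketch {
    private static ExecutorService pool;

    /** Returns the pool, creating a fresh one if none exists or the old one was shut down. */
    static synchronized ExecutorService getOrCreatePool() {
        if (pool == null || pool.isShutdown()) {
            pool = Executors.newFixedThreadPool(2); // default factory: non-daemon threads
        }
        return pool;
    }

    /** Returns the current pool without creating one. */
    static synchronized ExecutorService currentPool() {
        return pool;
    }

    /** What the "shutdown process" does: shuts down whatever pool exists right now. */
    static synchronized void shutdownPool() {
        if (pool != null) {
            pool.shutdown();
        }
    }

    /** Returns true if a live pool exists after shutdown believed it was finished. */
    static boolean poolReappearsAfterShutdown() {
        try {
            getOrCreatePool().submit(() -> {}).get(); // normal use before shutdown
            shutdownPool();                            // shutdown considers itself done

            // A straggler task (e.g. one still in flight elsewhere) touches the pool late.
            Thread straggler = new Thread(() -> getOrCreatePool().submit(() -> {}));
            straggler.start();
            straggler.join();

            return !currentPool().isShutdown(); // a new, live pool now exists
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            shutdownPool(); // clean up so this demo itself can exit
        }
    }

    public static void main(String[] args) {
        System.out.println(poolReappearsAfterShutdown());
    }
}
```

Without the cleanup in the {{finally}} block, the worker thread of the second pool would sit idle and keep the JVM from exiting, which is at least consistent with what the thread dump shows.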
> Race condition in PRS state updates
> -----------------------------------
>
> Key: SOLR-16414
> URL: https://issues.apache.org/jira/browse/SOLR-16414
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Noble Paul
> Assignee: Noble Paul
> Priority: Major
> Fix For: 9.1
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> For PRS collections the individual states are potentially updated from
> individual nodes and sometimes from the overseer too. It's possible that
>
> # OP1 is sent to overseer at T1
> # OP2 is executed in the node itself at T2
>
> Because we cannot guarantee that OP1, sent to the overseer, executes before
> OP2, the final state may be the result of OP1, which is incorrect and can
> lead to errors.
> The solution is to never do any PRS writes from overseer.