Why don't you see how it can return null?

I'm looking at an older checkout, but I see JettySolrRunner checking for
null core containers all over, and I see it passing back null explicitly in
at least one case.

When I peek at where that core container might be coming from, I see a
provider and a field that looks like its home (which, I note, doesn't look
protected by any memory barrier, e.g., volatile, lock, sync). And I see
that it could start as null, and it looks like it gets set to null on close
as well.

So I wonder about that lack of a memory barrier. Another thought: there are
probably plenty of cases where some random jobs/threads are still running
past that close. I bet one of them comes in and looks for that core
container late, after it's already clocked out.
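
To make the late-lookup case concrete, here is a minimal sketch. It is not
the JettySolrRunner code: RunnerSketch, LateLookupDemo and their methods are
hypothetical stand-ins I made up for illustration. A latch forces the
background task to read the field only after stop() has run, so this shows
the "still running past close" case deterministically; it does not try to
demonstrate the missing memory barrier.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a runner that hands out a container reference.
class RunnerSketch {
    // Plain field (assumption): null at first, set on start, cleared on stop.
    private Object container;

    void start() { container = new Object(); }
    void stop() { container = null; }
    Object getContainer() { return container; }
}

class LateLookupDemo {
    // Returns true if a task that outlives stop() observed a null container.
    static boolean lateLookupSeesNull() {
        RunnerSketch runner = new RunnerSketch();
        runner.start();
        CountDownLatch stopped = new CountDownLatch(1);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Background job that is still alive after the runner is stopped.
            Future<Boolean> late = pool.submit(() -> {
                stopped.await();                      // wait until stop() has run
                return runner.getContainer() == null; // late lookup
            });
            runner.stop();
            stopped.countDown();
            return late.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("late lookup saw null: " + lateLookupSeesNull());
    }
}
```

Any caller that chains a method call straight onto a lookup like that, the
way the stack trace chains getZkController() onto getCoreContainer(), would
hit the NullPointerException at this point.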

Older checkout, so I don't know what you are looking at, but if it hasn't
changed drastically recently, it seems easy to return a null.

If you want to reproduce a situation that might hit it, try running the
test with 10-20 instances simultaneously, each in a loop.

Or loop one, and hammer your system with some unrelated load for a while.

On Thu, Jun 15, 2023 at 4:49 PM Alex Deparvu <stilla...@apache.org> wrote:

> Hi,
>
> I wanted to take a look at the flaky DeleteReplicaTest test.
>
> Some background first:
> - Past 7 days trend:
> Class: org.apache.solr.cloud.DeleteReplicaTest
> Method: raceConditionOnDeleteAndRegisterReplica
> Failures: 15.56% (63 / 405)
>
> - Test failure is caused by a NullPointerException:
> ERROR (coreZkRegister-772-thread-1-processing-127.0.0.1:40471_solr)
> [n:127.0.0.1:40471_solr c:raceDeleteReplicaCollection s:shard1
> r:core_node4
> x:raceDeleteReplicaCollection_shard1_replica_n2] o.a.s.c.DeleteReplicaTest
> Failed to delete replica
>  => java.lang.NullPointerException: Cannot invoke
> "org.apache.solr.core.CoreContainer.getZkController()" because the return
> value of "org.apache.solr.embedded.JettySolrRunner.getCoreContainer()" is
> null
>
> I am having some trouble reproducing on my local and I don't see how the
> getCoreContainer() method might return null. Could this be a timing issue
> somehow?
> If anyone has an idea on how to approach this, I would be happy to hear it.
>
> best,
> alex
>


-- 
- MRM
