[
https://issues.apache.org/jira/browse/SOLR-17515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17893500#comment-17893500
]
Jason Gerlowski commented on SOLR-17515:
----------------------------------------
Alright, a few updates:
*First, a workaround!* The NPE reported above triggers when using
'PreemptiveBasicAuthClientBuilderFactory' to initialize a nascent
Http2SolrClient. PreemptiveBasicAuthClientBuilderFactory is frequently useful
on the client-side and in "standalone" Solr setups. But it's not strictly
needed on nodes running SolrCloud, which will either use PKI or forward
user-provided credentials on any internode requests.
So SolrCloud users at least can prevent this NPE by ensuring the
'solr.httpclient.builder.factory' system property is not set on their Solr
nodes. (This system property is set by the "bin/solr" startup script itself if
"SOLR_AUTH_TYPE" is specified in "solr.in.sh", so removing the SOLR_AUTH_TYPE
env-var will also help prevent Solr nodes from using
PreemptiveBasicAuthClientBuilderFactory.)
This workaround seems effective in preliminary testing, but the underlying
bug (i.e. that Http2SolrClient doesn't always initialize its
"authenticationStore") still remains on 9.7, so the workaround may not solve
all problems.
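For anyone wanting to check whether a node is exposed to the setting described above, here is a rough sketch. Both function names are hypothetical helpers, not part of Solr. The first assumes the env-var is assigned on an uncommented line of a "solr.in.sh"-style file. The second assumes the sysprop shows up on the JVM command line as a "-Dsolr.httpclient.builder.factory=..." argument.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solr): succeeds if the given
# solr.in.sh-style file contains an uncommented SOLR_AUTH_TYPE assignment.
auth_type_is_set() {
  grep -Eq '^[[:space:]]*(export[[:space:]]+)?SOLR_AUTH_TYPE=' "$1"
}

# Hypothetical helper (not part of Solr): succeeds if the given JVM argument
# string sets the 'solr.httpclient.builder.factory' system property.
sysprop_is_set() {
  case "$1" in
    *-Dsolr.httpclient.builder.factory=*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Possible usage, where the file path and <pid> are placeholders that will differ per install: 'auth_type_is_set /etc/default/solr.in.sh && echo "SOLR_AUTH_TYPE is set"', or 'sysprop_is_set "$(ps -o args= -p <pid>)" && echo "sysprop is set"'. If either check succeeds on a SolrCloud node, the node is probably still using the builder factory and may hit the NPE.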
*Second, a note on testing*
Solr has a number of tests that cover basic-auth and recovery together, so this
bug has been a bit confounding: why didn't our existing tests catch this issue?
The workaround above points towards the answer: the bug relies on a sysprop
that's frequently set by the "bin/solr" scripts, but not set in our Java-based
tests. So this bug is another victim of the "fidelity gap" between how we test
Solr and how it is deployed by users.
*Third, a prospective fix*
I've created a PR [here|https://github.com/apache/solr/pull/2802], which aims
to fix the initialization bug in Http2SolrClient. It also strengthens some
existing tests so that they reproduce the NPE when this fix is not in place.
> Recovery fails in Solr 9.7.0 if basic-auth is enabled
> -----------------------------------------------------
>
> Key: SOLR-17515
> URL: https://issues.apache.org/jira/browse/SOLR-17515
> Project: Solr
> Issue Type: Bug
> Affects Versions: 9.7
> Reporter: Jason Gerlowski
> Assignee: Jason Gerlowski
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Several reporters on the users@ list recently shared a bug they noticed on
> upgrading to Solr 9.7. Replicas would try to recover, but fail with a
> NullPointerException:
> {code}
> 2024-09-18 09:36:31.238 ERROR (recoveryExecutor-12-thread-1-processing-fts06.host.internal:8983_solr dovecot_fts_shard5_replica_n61 dovecot_fts shard5 core_node62) [c:dovecot_fts s:shard5 r:core_node62 x:dovecot_fts_shard5_replica_n61 t:] o.a.s.c.RecoveryStrategy Error while trying to recover. core=dovecot_fts_shard5_replica_n61 => java.lang.NullPointerException: Cannot invoke "org.apache.solr.client.solrj.impl.AuthenticationStoreHolder.updateAuthenticationStore(org.eclipse.jetty.client.api.AuthenticationStore)" because "this.authenticationStore" is null
>     at org.apache.solr.client.solrj.impl.Http2SolrClient.setAuthenticationStore(Http2SolrClient.java:318)
> java.lang.NullPointerException: Cannot invoke "org.apache.solr.client.solrj.impl.AuthenticationStoreHolder.updateAuthenticationStore(org.eclipse.jetty.client.api.AuthenticationStore)" because "this.authenticationStore" is null
>     at org.apache.solr.client.solrj.impl.Http2SolrClient.setAuthenticationStore(Http2SolrClient.java:318) ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20]
>     at org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory.setup(PreemptiveBasicAuthClientBuilderFactory.java:97) ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20]
>     at org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory.setup(PreemptiveBasicAuthClientBuilderFactory.java:85) ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20]
>     at org.apache.solr.client.solrj.impl.Http2SolrClient$Builder.httpClientBuilderSetup(Http2SolrClient.java:1093) ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20]
>     at org.apache.solr.client.solrj.impl.Http2SolrClient$Builder.build(Http2SolrClient.java:1062) ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20]
>     at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:907) ~[solr-core-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20]
>     at org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:633) ~[solr-core-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20]
>     at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:333) ~[solr-core-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20]
>     at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:309) ~[solr-core-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20]
> ...
> {code}
> It turns out that the issue isn't specific to upgrading clusters: *any 9.7.0
> cluster (new or existing/upgrading) that uses basic-auth will hit this NPE
> during replica recovery*. The result is that replicas will fail to recover,
> and sit marked as "recovering" indefinitely.
> The issue can be reproduced locally in a source-checkout using the following
> steps:
> {code}
> git checkout branch_9_7
> ./gradlew clean assemble
> cd solr/packaging/build/solr-9.7.0-SNAPSHOT
> # At prompts, I chose: 4 nodes, "gettingstarted", 1 shard, 2 replicas, "_default" configset
> bin/solr start -e cloud
> bin/solr post -c gettingstarted example/exampledocs/books.json
> # Stop the node containing the non-leader replica
> bin/solr stop -p <port>
> bin/solr post -c gettingstarted example/exampledocs/books.csv
> # Enable auth and trigger recovery by turning the node back on
> bin/solr auth enable -type basicAuth -credentials solr:solrRocks -blockUnknown true
> # This line will need to be tweaked based on which Solr node was previously stopped
> "bin/solr" start --cloud -p <port> -s "example/cloud/<node>/solr" -z 127.0.0.1:9983
> {code}