[
https://issues.apache.org/jira/browse/SOLR-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716031#comment-17716031
]
Alex Deparvu commented on SOLR-7609:
------------------------------------
Updating this issue with the details from the GitHub PR, for future reference,
since the PR is close to being merged.
Changes done:
* added a version check on additions that fails when we are not the leader and
version = 0 (to match the delete flows).
* changed error status from BAD_REQUEST to INVALID_STATE to allow for retries.
I was able to verify retries are happening [0]
* removed a 'cmd' variable. This is just a minor readability refactoring; I
tried to change the code as little as possible.
* updated the ShardSplitTest to keep track of exceptions happening during the
concurrent adds and deletes and fail if needed.
* fixed an incorrect NPE check in
[DistributedZkUpdateProcessor#getCollectionUrls|https://github.com/apache/solr/blob/db4cb66271f615da6a0a3ae6fed5fb2e184fd053/solr/core/src/java/org/apache/solr/update/processor/DistributedZkUpdateProcessor.java#L889]
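As a rough illustration of the first two changes, the sketch below rejects an add that arrives on a non-leader with version 0 and reports it as INVALID_STATE rather than BAD_REQUEST. It is not the code from the PR: VersionCheckSketch, UpdateException, checkVersionOnAdd and the error message are hypothetical names made up for this example, and the enum only mirrors the two status names mentioned above.

```java
// Minimal sketch, NOT the PR code: all names below are hypothetical stand-ins.
class VersionCheckSketch {

  // Mirrors only the two status names discussed in the comment.
  enum ErrorCode { BAD_REQUEST, INVALID_STATE }

  // Hypothetical exception type carrying the status.
  static final class UpdateException extends RuntimeException {
    final ErrorCode code;

    UpdateException(ErrorCode code, String message) {
      super(message);
      this.code = code;
    }
  }

  // Fails when a non-leader receives an add that carries no version (0).
  // INVALID_STATE is used instead of BAD_REQUEST so that the failure can be
  // treated as retryable, as described in the list above.
  static void checkVersionOnAdd(boolean isLeader, long versionOnUpdate) {
    if (!isLeader && versionOnUpdate == 0) {
      throw new UpdateException(
          ErrorCode.INVALID_STATE, "missing version on update (illustrative message)");
    }
  }

  public static void main(String[] args) {
    checkVersionOnAdd(true, 0);   // leader: the check does not apply
    checkVersionOnAdd(false, 42); // non-leader with a version: accepted
    try {
      checkVersionOnAdd(false, 0); // non-leader without a version: rejected
    } catch (UpdateException e) {
      System.out.println(e.code + ": " + e.getMessage());
    }
  }
}
```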
Things to follow up on later:
* there is still one failure happening: `Request says it is coming from parent
shard leader but we are in active state`
* I noticed the setupRequest() method is usually called twice. I think this is
easy to fix with a basic flag; I can add it if it doesn't grow the PR too much,
or it can be done in a follow-up PR.
* all over the class there is a pattern of checking the read-only status to
prevent some operations, which I believe could be broken:
{code:java}
clusterState = zkController.getClusterState();
if (isReadOnly()) {
  throw new SolrException(ErrorCode.FORBIDDEN, "Collection " + collection + " is read-only.");
}
{code}
Refreshing the clusterState is insufficient, because isReadOnly() is based on
the readOnlyCollection flag, which is only initialized at the beginning. If the
intent was to have a fresh check, the readOnlyCollection flag needs to be
updated too, based on the new clusterState.
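To make the concern concrete, here is a small self-contained sketch of the stale-flag pattern and one possible fix. It does not use Solr's classes: FakeClusterState, ReadOnlyCheckSketch, staleCheck and freshCheck are hypothetical names, and only the field names clusterState and readOnlyCollection are borrowed from the description above.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical stand-in for the cluster state; not Solr's ClusterState API.
final class FakeClusterState {
  private final boolean readOnly;

  FakeClusterState(boolean readOnly) {
    this.readOnly = readOnly;
  }

  boolean isCollectionReadOnly() {
    return readOnly;
  }
}

class ReadOnlyCheckSketch {
  private final Supplier<FakeClusterState> stateSource;
  private FakeClusterState clusterState;
  private boolean readOnlyCollection; // cached when the object is created

  ReadOnlyCheckSketch(Supplier<FakeClusterState> stateSource) {
    this.stateSource = stateSource;
    this.clusterState = stateSource.get();
    this.readOnlyCollection = clusterState.isCollectionReadOnly();
  }

  // The pattern described above: the state is refreshed, but the answer
  // still comes from the flag that was computed at construction time.
  boolean staleCheck() {
    clusterState = stateSource.get();
    return readOnlyCollection;
  }

  // One possible fix: recompute the flag from the refreshed state.
  boolean freshCheck() {
    clusterState = stateSource.get();
    readOnlyCollection = clusterState.isCollectionReadOnly();
    return readOnlyCollection;
  }

  public static void main(String[] args) {
    AtomicReference<FakeClusterState> live =
        new AtomicReference<>(new FakeClusterState(false));
    ReadOnlyCheckSketch processor = new ReadOnlyCheckSketch(live::get);

    // The collection becomes read-only after the processor was created.
    live.set(new FakeClusterState(true));

    System.out.println("stale check sees read-only: " + processor.staleCheck());
    System.out.println("fresh check sees read-only: " + processor.freshCheck());
  }
}
```

In this sketch the stale check keeps reporting the value captured at construction, while the fresh check picks up the change because it recomputes the flag from the state it just fetched.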
> ShardSplitTest NPE
> ------------------
>
> Key: SOLR-7609
> URL: https://issues.apache.org/jira/browse/SOLR-7609
> Project: Solr
> Issue Type: Bug
> Reporter: Steven Rowe
> Priority: Minor
> Attachments: ShardSplitTest.NPE.log
>
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
> I'm guessing this is a test bug, but the seed doesn't reproduce for me (tried
> on the same Linux machine it occurred on and on OS X):
> {noformat}
> [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=ShardSplitTest
> -Dtests.method=test -Dtests.seed=9318DDA46578ECF9 -Dtests.slow=true
> -Dtests.locale=is -Dtests.timezone=America/St_Vincent -Dtests.asserts=true
> -Dtests.file.encoding=US-ASCII
> [junit4] ERROR 55.8s J6 | ShardSplitTest.test <<<
> [junit4] > Throwable #1: java.lang.NullPointerException
> [junit4] > at
> __randomizedtesting.SeedInfo.seed([9318DDA46578ECF9:1B4CE27ECB848101]:0)
> [junit4] > at
> org.apache.solr.cloud.ShardSplitTest.logDebugHelp(ShardSplitTest.java:547)
> [junit4] > at
> org.apache.solr.cloud.ShardSplitTest.checkDocCountsAndShardStates(ShardSplitTest.java:438)
> [junit4] > at
> org.apache.solr.cloud.ShardSplitTest.splitByUniqueKeyTest(ShardSplitTest.java:222)
> [junit4] > at
> org.apache.solr.cloud.ShardSplitTest.test(ShardSplitTest.java:84)
> [junit4] > at
> org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:960)
> [junit4] > at
> org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:935)
> [junit4] > at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Line 547 of {{ShardSplitTest.java}} is:
> {code:java}
> idVsVersion.put(document.getFieldValue("id").toString(),
> document.getFieldValue("_version_").toString());
> {code}
> Skimming the code, it's not obvious what could be null.