[
https://issues.apache.org/jira/browse/SOLR-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240321#comment-15240321
]
Steve Rowe commented on SOLR-8662:
----------------------------------
Lots of good changes in your patch, Varun! I ran all Solr+Solrj+contrib tests
and everything passes.
I found a few issues though.
This change is good, but there remains a pre-existing problem:
{code:java|title:SchemaManager.java}
- int timeout = req.getParams().getInt(BaseSolrResource.UPDATE_TIMEOUT_SECS,
-1);
- long startTime = System.nanoTime();
- long endTime = timeout > 0 ? System.nanoTime() + (timeout * 1000 * 1000) :
Long.MAX_VALUE;
+ //The default timeout is 60s when no BaseSolrResource.UPDATE_TIMEOUT_SECS
is specified
+ int timeout = req.getParams().getInt(BaseSolrResource.UPDATE_TIMEOUT_SECS,
60);
+
+ //If BaseSolrResource.UPDATE_TIMEOUT_SECS=0 or -1 then end time is
unlimited.
+ long endTime = timeout > 0 ? TimeUnit.NANOSECONDS.convert(timeout,
TimeUnit.SECONDS) + System.nanoTime() : Long.MAX_VALUE;
{code}
The problem is that {{System.nanoTime()}} can return *anything* in the range
\[{{Long.MIN_VALUE}}, {{Long.MAX_VALUE}}\] and it intentionally overflows, so
making {{endTime}} be {{Long.MAX_VALUE}} is effectively choosing a random time
in the future; this is *not* the same as "unlimited". This same situation is
dealt with in the {{QueryTimeout}} implementations - used by
{{ExitableDirectoryReader}} - by adding a large value to the current value of
{{nanoTime()}} to arrive at an effectively unlimited end time.
----
Your changes in {{ZkSolrResourceLoader.openResource()}} introduce the
possibility of retrying 10 times to fetch a resource from ZK when it's not
there (according to {{zkController.pathExists(file)}) - I think in this case
the retry loop should be immediately exited, so that the classpath search can
start immediately. (I'm assuming that the intent here is *not* to wait around
for another thread/process to finish uploading a resource - if that's the case
then the branch for when the path doesn't exist should have a timeout, which it
doesn't in your patch.)
----
bq. Without the changes to SchemaManager the test would fail.
Indeed, when I revert the SchemaManager changes,
{{TestManagedSchemaAPI.testAddFieldAndDocument()}} fails. But
{{testReloadAndAddSimple()}} always succeeds, regardless of the SchemaManager
changes, so I'm not sure why it's there, since it doesn't have anything to do
with new field visibility - can it be removed?
> SchemaManager doesn't wait correctly for replicas to update
> -----------------------------------------------------------
>
> Key: SOLR-8662
> URL: https://issues.apache.org/jira/browse/SOLR-8662
> Project: Solr
> Issue Type: Bug
> Reporter: Varun Thacker
> Assignee: Varun Thacker
> Attachments: SOLR-8662.patch
>
>
> Currently in SchemaManager, waitForOtherReplicasToUpdate doesn't execute
> since it gets passed the timeout value instead of the end time
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]