HoustonPutman opened a new pull request, #3268: URL: https://github.com/apache/solr/pull/3268
This is a small fix and I really don't think it warrants a JIRA and especially a changelog entry. I can make a JIRA if people thinks it deserves one. The `DistributedApiAsyncTracker` mentioned that there could be a race condition between completing an asynchronous request and checking its status. This is causing very infrequent test failures, such as: `ReindexCollectionTest.testAbort`. The solution is to just check the ZK paths in reverse order from how they are updated. So when completing or canceling tasks, they are updated in the following order: 1. `trackedAsyncTasks.put(asyncId, ...)` or `trackedAsyncTasks.remove(asyncId)` 1. `inFlightAsyncTasks.deleteInFlightTask(asyncId)` Therefore in `getAsyncTaskRequestStatus(asyncId)`, we need to check `inFlightAsyncTasks` before `trackedAsyncTasks`. This means we can get a false-positive "Submitted" or "Running" result (race condition described below). But that will just lead to the client checking again at a later time, and the next time they call, `inFlightAsyncTasks` will have been updated and we will get the actual response from `trackedAsyncTasks`. Before this PR, the race condition would give us a false-negative "Operation failed. Please resubmit" result. (race condition described below). This would tell the client to try again, when in fact the task could have been successful. This false-negative is much worse than the false-positive described above. Race condition before this PR: (false-negative) 1. `getAsyncTaskRequestStatus()` -- `trackedAsyncTasks` is checked -- no response is found 1. `setTaskCompleted()` -- `trackedAsyncTasks` id is updated -- response is put into ZK 1. `setTaskCompleted()` -- `inFlightAsyncTasks` id is deleted -- asyncID is deleted from ZK 1. `getAsyncTaskRequestStatus()` -- `inFlightAsyncTasks ` is checked -- asyncId is not found * Return a failure - Assume node died because `inFlightAsyncTasks ` ephemeral node is gone Race condition after this PR: (false-positive) 1. `setTaskCompleted()` -- `trackedAsyncTasks` id is updated -- response is put into ZK 1. `getAsyncTaskRequestStatus()` -- `inFlightAsyncTasks ` is checked -- asyncId is found * Return that the task is in progress 1. `setTaskCompleted()` -- `inFlightAsyncTasks` id is deleted -- asyncID is deleted from ZK -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org