HoustonPutman opened a new pull request, #3268:
URL: https://github.com/apache/solr/pull/3268

   This is a small fix and I really don't think it warrants a JIRA and 
especially a changelog entry. I can make a JIRA if people thinks it deserves 
one.
   
   The `DistributedApiAsyncTracker` mentioned that there could be a race 
condition between completing an asynchronous request and checking its status. 
This is causing very infrequent test failures, such as: 
`ReindexCollectionTest.testAbort`.
   
   The solution is to just check the ZK paths in reverse order from how they 
are updated.
   
   So when completing or canceling tasks, they are updated in the following 
order:
   
   1. `trackedAsyncTasks.put(asyncId, ...)`  or 
`trackedAsyncTasks.remove(asyncId)`
   1. `inFlightAsyncTasks.deleteInFlightTask(asyncId)`
   
   Therefore in `getAsyncTaskRequestStatus(asyncId)`, we need to check 
`inFlightAsyncTasks` before `trackedAsyncTasks`. This means we can get a 
false-positive "Submitted" or "Running" result (race condition described 
below). But that will just lead to the client checking again at a later time, 
and the next time they call, `inFlightAsyncTasks` will have been updated and we 
will get the actual response from `trackedAsyncTasks`.
   
   Before this PR, the race condition would give us a false-negative "Operation 
failed. Please resubmit" result. (race condition described below). This would 
tell the client to try again, when in fact the task could have been successful. 
This false-negative is much worse than the false-positive described above.
   
   Race condition before this PR: (false-negative)
   1. `getAsyncTaskRequestStatus()` -- `trackedAsyncTasks` is checked -- no 
response is found
   1. `setTaskCompleted()` -- `trackedAsyncTasks` id is updated -- response is 
put into ZK
   1. `setTaskCompleted()` -- `inFlightAsyncTasks` id is deleted -- asyncID is 
deleted from ZK
   1. `getAsyncTaskRequestStatus()` -- `inFlightAsyncTasks ` is checked -- 
asyncId is not found 
      * Return a failure - Assume node died because `inFlightAsyncTasks ` 
ephemeral node is gone
   
   Race condition after this PR: (false-positive)
   1. `setTaskCompleted()` -- `trackedAsyncTasks` id is updated -- response is 
put into ZK
   1. `getAsyncTaskRequestStatus()` -- `inFlightAsyncTasks ` is checked -- 
asyncId is found
      * Return that the task is in progress 
   1. `setTaskCompleted()` -- `inFlightAsyncTasks` id is deleted -- asyncID is 
deleted from ZK


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

Reply via email to