On 05/25/2010 11:30 PM, Pete Zaitcev wrote:
If a chunkserver goes down, tabled sometimes returns a phantom "object
not found". It happens because we keep hitting the same down node until
the retries are exhausted. The existing code calls rand() on every
attempt and hopes for the best, so nothing stops it from picking the
same down node again.

The fix is to randomize only once, before the retry loop, and then
cycle through all available nodes deterministically. The same fix
would apply even if we used a better technique to select an available
chunkserver than just random.
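
For illustration, the idea can be sketched roughly as below. This is not
the code from the patch: struct node, its "up" flag, pick_node() and
pick_node_random() are hypothetical names, and the "up" check stands in
for actually trying to open the object on that chunkserver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node record; tabled's real structures differ. */
struct node {
	int id;
	bool up;	/* stand-in for "the request to this node succeeded" */
};

/*
 * Visit every node at most once, beginning at index `start` and
 * wrapping around. Returns the index of the first node that is up,
 * or -1 if all of them are down (or n is 0).
 */
static int pick_node(const struct node *nodes, size_t n, size_t start)
{
	size_t i;

	for (i = 0; i < n; i++) {
		size_t idx = (start + i) % n;

		if (nodes[idx].up)
			return (int)idx;
	}
	return -1;
}

/* Randomize once, before the loop, then cycle deterministically. */
static int pick_node_random(const struct node *nodes, size_t n)
{
	if (n == 0)
		return -1;
	return pick_node(nodes, n, (size_t)rand() % n);
}
```

With this shape a single down node can cost at most one failed attempt,
and "not found" is reported only after every node has been tried.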

Also, we refactor the code just a little bit, so that the enormous
function object_get_body gets somewhat easier to follow.

Signed-off-by: Pete Zaitcev <[email protected]>

applied

