key->volume hash table is not consistent when a disk is marked as bad or
removed due to failure
-----------------------------------------------------------------------------------------------
Key: TS-949
URL: https://issues.apache.org/jira/browse/TS-949
Project: Traffic Server
Issue Type: Bug
Components: Cache
Affects Versions: 3.1.0
Environment: Multi-volume cache with apparently faulty drives
Reporter: B Wyatt
The method for resolving collisions when distributing hash-table space to
volumes for the object_key->volume hash table creates inconsistency when a disk
is determined to be bad, or when a failed disk is removed from the
volume.config.
Background:
The hash space is distributed by a round-robin draft in which each volume
"drafts" a random index in the hash table until the hash space is exhausted.
The random order in which a given volume drafts hash table slots is consistent
across reboot/crash/disk-failure. However, when a volume attempts to draft a
slot that is already occupied, it skips to its next random pick, repeating
until it finds an open slot. This ensures that the hash is partitioned evenly
between volumes.
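The draft described above can be sketched roughly as follows. This is an
illustrative Python model, not the Traffic Server implementation: the names
pick_sequence and draft, the per-volume integer seeds, and the use of Python's
random module are all assumptions made for the example.

```python
import random

def pick_sequence(volume_seed, table_size):
    """Hypothetical pick stream: depends only on the volume's seed, so it is
    identical after every rebuild."""
    rng = random.Random(volume_seed)
    while True:
        yield rng.randrange(table_size)

def draft(volume_seeds, table_size):
    """Round-robin draft: each volume in turn claims its next pick that is
    still free, until no free slots remain."""
    table = [None] * table_size
    picks = {name: pick_sequence(seed, table_size)
             for name, seed in volume_seeds.items()}
    free = table_size
    while free:
        for name in volume_seeds:
            if free == 0:
                break
            slot = next(picks[name])
            # Contention rule: skip picks that another volume already holds.
            while table[slot] is not None:
                slot = next(picks[name])
            table[slot] = name
            free -= 1
    return table

table = draft({"A": 1, "B": 2, "C": 3}, table_size=12)
```

Because every volume claims exactly one slot per round, the 12 slots in this
sketch are split 4/4/4, which is the even partitioning the skip rule is meant
to guarantee.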
The issue:
Resolving slot contention breaks consistency because the outcome depends on
the order in which the volumes draft. When the hash is rebuilt after a disk
failure, or after a reboot with fewer drives, a surviving volume may secure an
index that was previously occupied by the dead disk. In the old hash, that
volume would have skipped to another random index due to contention. If that
other index is instead taken by a different volume in a later draft round, it
now yields an inconsistent key->volume result. The effect of one inconsistency
then cascades, because whichever volume occupies that index after the dead disk
is removed is now behind on its draft sequence as well.
An Example:
||Disk||Draft Sequence||
|A|1,4,7,5|
|B|4,2,8,1|
|C|3,7,5,2|
Pre-failure hash table (slots 1-8, ? = not yet drafted) after 2 rounds of draft:
|A|B|C|B|C|?|A|?|
Post-failure of drive B, hash table (slots 1-8) after 3 rounds of draft:
|A|C|C|A|{color:red}A{color}|?|{color:red}C{color}|?|
Two slots (5 and 7, shown in red) have become inconsistent, and more will
probably follow. Objects whose keys map to an inconsistent slot are still
stored in their original volume but are lost to the top-level cache for
open/lookup.
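The example can be replayed with a short script. This is a minimal sketch that
uses the draft sequences from the table above as fixed lists; the function
name draft_rounds and the 1-based slot numbering are assumptions made for the
illustration, not names from the Traffic Server code.

```python
def draft_rounds(sequences, table_size, rounds):
    """Run a fixed number of round-robin draft rounds over explicit,
    1-based pick sequences. Free slots are shown as '?'."""
    table = ["?"] * table_size
    position = {name: 0 for name in sequences}
    for _ in range(rounds):
        for name, seq in sequences.items():
            i = position[name]
            # Contention rule: skip picks that are already occupied.
            while i < len(seq) and table[seq[i] - 1] != "?":
                i += 1
            if i < len(seq):
                table[seq[i] - 1] = name
                i += 1
            position[name] = i
    return table

sequences = {"A": [1, 4, 7, 5], "B": [4, 2, 8, 1], "C": [3, 7, 5, 2]}

before = draft_rounds(sequences, 8, rounds=2)

# Drive B fails: rebuild with the survivors only.
survivors = {name: seq for name, seq in sequences.items() if name != "B"}
after = draft_rounds(survivors, 8, rounds=3)

# Slots that belonged to a surviving volume but now map to a different one.
moved = [slot
         for slot, (old, new) in enumerate(zip(before, after), start=1)
         if old not in ("B", "?") and old != new]

print("|".join(before))  # A|B|C|B|C|?|A|?
print("|".join(after))   # A|C|C|A|A|?|C|?
print(moved)             # [5, 7]
```

Slots 2 and 4 changing owner is expected, since they belonged to B. Slots 5
and 7 are the problem: both A and C are still present, yet keys that hash to
those slots now resolve to the other volume.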