key->volume hash table is not consistent when a disk is marked as bad or 
removed due to failure
-----------------------------------------------------------------------------------------------

                 Key: TS-949
                 URL: https://issues.apache.org/jira/browse/TS-949
             Project: Traffic Server
          Issue Type: Bug
          Components: Cache
    Affects Versions: 3.1.0
         Environment: Multi-volume cache with apparently faulty drives
            Reporter: B Wyatt


The method used to resolve collisions when distributing hash-table space to 
volumes for the object_key->volume hash table creates inconsistencies when a 
disk is determined to be bad, or when a failed disk is removed from 
volume.config.

Background:
The hash space is distributed by a round-robin draft in which each volume 
"drafts" a random index in the hash table until the hash space is exhausted.  
The random order in which a given volume drafts hash table slots is consistent 
across reboot/crash/disk-failure.  However, when a volume attempts to draft a 
slot that is already occupied, it skips to its next random pick, and repeats 
this until it finds an open slot.  This ensures that the hash is partitioned 
evenly between volumes.
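
The draft described above can be sketched roughly as follows.  This is only an 
illustration, not the Traffic Server code: the names draft_sequence and 
build_table are hypothetical, and seeding each volume's random order from a 
stable volume identifier, as well as using a full permutation of the slots as 
the pick order, are assumptions made to keep the sketch short.

```python
import random


def draft_sequence(volume_id, table_size):
    """Return the pick order for one volume (hypothetical helper).

    Assumption: the order is derived only from a stable volume identifier,
    so it is the same after every restart.
    """
    rng = random.Random(volume_id)
    order = list(range(table_size))
    rng.shuffle(order)
    return order


def build_table(volume_ids, table_size):
    """Round-robin draft: each volume in turn takes its next free pick."""
    picks = {v: iter(draft_sequence(v, table_size)) for v in volume_ids}
    table = [None] * table_size
    filled = 0
    while filled < table_size:
        for v in volume_ids:
            if filled == table_size:
                break
            # Skip picks that another volume already holds.
            for slot in picks[v]:
                if table[slot] is None:
                    table[slot] = v
                    filled += 1
                    break
    return table


table = build_table(["A", "B", "C"], 12)
print(table)
```

Each volume ends up with the same number of slots (give or take one), but 
which slot a volume gets in a given round depends on what the other volumes 
have already taken.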

The issue:
Resolving slot contention breaks this consistency, because the outcome depends 
on the order in which the volumes draft.  When the hash is rebuilt after a disk 
failure, or after a reboot with fewer drives, a surviving volume may secure an 
index that was previously occupied by the dead disk.  In the old hash, the 
surviving volume would have skipped that index due to contention and selected 
its next random pick instead.  If that next pick is then taken by another 
volume in a later draft round, it represents an inconsistent key->volume 
result.  The effects of one inconsistency then cascade, because whichever 
volume now occupies that index is behind on its draft sequence as well.

An Example:
||Disk||Draft Sequence||
|A|1,4,7,5|
|B|4,2,8,1|
|C|3,7,5,2|
Pre-failure hash table (slots 1 through 8, draft order A, B, C) after 2 rounds 
of draft:
|A|B|C|B|C|?|A|?|

Hash table after the failure of drive B (draft order A, C), after 3 rounds of 
draft:
|A|C|C|A|{color:red}A{color}|?|{color:red}C{color}|?|

Two slots (5 and 7) have become inconsistent, and more will probably follow as 
the draft continues.  Objects whose keys hash to these slots are still stored 
in their original volume, but the top-level cache can no longer find them for 
open/lookup.
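
The example can be replayed with a short script.  This is a sketch for 
checking the tables above, not code from Traffic Server: the function name 
draft is hypothetical, and the draft order A, B, C is an assumption that is 
consistent with the tables shown.

```python
def draft(sequences, order, rounds, table_size):
    """Round-robin draft: each volume takes the first free slot left in its
    sequence.  Slots are numbered from 1, as in the example."""
    table = ["?"] * table_size
    cursor = {v: 0 for v in order}
    for _ in range(rounds):
        for v in order:
            seq = sequences[v]
            while cursor[v] < len(seq):
                slot = seq[cursor[v]]
                cursor[v] += 1
                if table[slot - 1] == "?":
                    table[slot - 1] = v
                    break
    return table


sequences = {"A": [1, 4, 7, 5], "B": [4, 2, 8, 1], "C": [3, 7, 5, 2]}

before = draft(sequences, ["A", "B", "C"], rounds=2, table_size=8)
after = draft(sequences, ["A", "C"], rounds=3, table_size=8)

# Slots that a surviving volume held before the failure but lost afterwards.
moved = [
    i + 1
    for i, (old, new) in enumerate(zip(before, after))
    if old not in ("B", "?") and new != old
]

print(before)  # ['A', 'B', 'C', 'B', 'C', '?', 'A', '?']
print(after)   # ['A', 'C', 'C', 'A', 'A', '?', 'C', '?']
print(moved)   # [5, 7]
```

Slots 2 and 4 belonged to B, so they have to be reassigned in any case.  Slots 
5 and 7 changed owner even though both A and C are still healthy.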


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
