[ https://issues.apache.org/jira/browse/TS-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164579#comment-13164579 ]
John Plevyak commented on TS-949:
---------------------------------
I admit I haven't looked at this code in a long time, but isn't the vol->len
the length of the volume? What we want is a hash function H(h) which
distributes the keys proportionally to the size of the volumes. Let's say we
have disk A: 1TB, disk B: 200GB and disk C: 500GB. With your changes they would
all get the same number of keys, so that disk B would quickly fill up and start
to lose documents while disk A was still mostly empty.
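To put rough numbers on that, here is a small sketch using the three disks
above. It assumes 1TB = 1000GB and that keys map to objects of roughly equal
size; the variable names are mine, not anything from the cache code.

```python
# Disk sizes from the example, in GB (assuming 1TB = 1000GB).
sizes_gb = {"A": 1000, "B": 200, "C": 500}
total_gb = sum(sizes_gb.values())

# Share of the key space each disk should get if keys follow capacity.
proportional_share = {disk: size / total_gb for disk, size in sizes_gb.items()}

# With equal shares every disk receives the same amount of data, so the
# smallest disk fills first. Fill level of each disk at that moment:
written_gb = min(sizes_gb.values())
fill_level = {disk: written_gb / size for disk, size in sizes_gb.items()}

for disk in sizes_gb:
    print(f"{disk}: proportional share {proportional_share[disk]:.1%}, "
          f"fill level when B is full {fill_level[disk]:.0%}")
```

Under these assumptions disk A is only 20% full and disk C 40% full at the
point where disk B has to start evicting.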
In order to handle this, the random numbers need to be scaled down so that they
allocate the right proportions. This will not cause the earlier problem
because it preserves the pairwise order between any two disks, that is, if B
drops but A and C are still present and A won the first time, it will win again
(because the random values will be scaled by the same proportion, and if x > y
then x * C > y * C for all C > 0).
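The ordering property can be checked with a small sketch. This is not the
code from the patch: the hash, the "highest scaled value wins" rule and the
multipliers below are all assumptions made for illustration, and the
multipliers are placeholders rather than the ones that give the right
proportions. The point is only that each disk's scaled value does not depend
on which other disks are present.

```python
import hashlib

# Hypothetical per-disk multipliers (placeholders, not the patch's values).
MULTIPLIERS = {"A": 1.0, "B": 0.2, "C": 0.5}

def unit_score(disk, obj_key):
    """Deterministic pseudo-random value in [0, 1) for a (disk, key) pair."""
    digest = hashlib.sha256(f"{disk}:{obj_key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def winner(obj_key, multipliers):
    """Disk with the highest scaled score for this key."""
    return max(multipliers, key=lambda d: unit_score(d, obj_key) * multipliers[d])

def moved_keys(keys, multipliers, removed):
    """Keys whose winner changes after `removed` drops, excluding keys it owned."""
    survivors = {d: m for d, m in multipliers.items() if d != removed}
    return [k for k in keys
            if winner(k, multipliers) != removed
            and winner(k, survivors) != winner(k, multipliers)]

keys = [f"object-{i}" for i in range(1000)]
print(moved_keys(keys, MULTIPLIERS, "B"))  # prints [] (no surviving key moves)
```

Only the keys that B owned get a new home; every key that A or C won before
is still won by the same disk after B is gone.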
That said, the proportionality multiplier I was using wasn't right. I'll send
out a new patch with the right multiplier. Size needs to be
proportional to (total*total)/size rather than just size.
Thanks for the feedback; please check out the new patch and let's work out some
examples to make sure it does what we want.
The new multiplier should allocate proportional to the size of each vol and not
have any inconsistencies.
> key->volume hash table is not consistent when a disk is marked as bad or
> removed due to failure
> -----------------------------------------------------------------------------------------------
>
> Key: TS-949
> URL: https://issues.apache.org/jira/browse/TS-949
> Project: Traffic Server
> Issue Type: Bug
> Components: Cache
> Affects Versions: 3.1.0
> Environment: Multi-volume cache with apparently faulty drives
> Reporter: B Wyatt
> Assignee: John Plevyak
> Fix For: 3.1.2
>
> Attachments: TS-949-jp-1.patch, TS949-BW-p1.patch
>
>
> The method for resolving collisions when distributing hash-table space to
> volumes for the object_key->volume hash table creates inconsistency when a
> disk is determined to be bad, or when a failed disk is removed from the
> volume.config.
> Background:
> The hash space is distributed by round robin draft where each volume "drafts"
> a random index in the hash table until the hash space is exhausted. The
> random order in which a given volume drafts hash table slots is consistent
> across reboot/crash/disk-failure, however when a volume attempts to draft a
> slot which has already been occupied, it skips to its next random pick and
> attempts to draft that slot until it finds an open slot. This ensures that
> the hash is partitioned evenly between volumes.
> The issue:
> Resolving slot contention breaks the consistency as it is dependent on the
> order that the volumes draft. When rebuilding the hash after disk failure or
> reboot with fewer drives, a volume may secure an index that was previously
> occupied by the dead-disk. In the old hash, the surviving volume would have
> selected another random index due to contention. If this index is taken, by
> the next draft round it will represent an inconsistent key->volume result.
> The effects of one inconsistency will then cascade as whichever volume
> occupies that index after removing a dead disk is now behind on its draft
> sequence as well.
> An Example:
> ||Disk||Draft Sequence||
> |A|1,4,7,5|
> |B|4,2,8,1|
> |C|3,7,5,2|
> Pre-failure Hash Table after 2 rounds of draft:
> |A|B|C|B|C|?|A|?|
> Post-failure of drive B Hash Table after 3 rounds of draft:
> |A|C|C|A|{color:red}A{color}|?|{color:red}C{color}|?|
> Two slots have become inconsistent and more will probably follow. These
> inconsistencies become objects stored in a volume but lost to the top level
> cache for open/lookup.
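The example in the issue description can be replayed with a short simulation.
This is a sketch of the draft as described above, not the Traffic Server
implementation; the function name `draft` and the 1-based slot numbering are
assumptions taken from the example tables.

```python
def draft(sequences, table_size, rounds):
    """Round-robin draft: each round, every volume takes its next free pick."""
    table = ["?"] * table_size
    position = {vol: 0 for vol in sequences}  # progress through each sequence
    for _ in range(rounds):
        for vol, picks in sequences.items():
            i = position[vol]
            # Skip picks that are already occupied by some volume.
            while i < len(picks) and table[picks[i] - 1] != "?":
                i += 1
            if i < len(picks):
                table[picks[i] - 1] = vol
                i += 1
            position[vol] = i
    return table

sequences = {"A": [1, 4, 7, 5], "B": [4, 2, 8, 1], "C": [3, 7, 5, 2]}

before = draft(sequences, 8, rounds=2)
survivors = {vol: seq for vol, seq in sequences.items() if vol != "B"}
after = draft(survivors, 8, rounds=3)

# Slots owned by a surviving volume before the failure that changed owner.
changed = [slot for slot in range(1, 9)
           if before[slot - 1] not in ("?", "B")
           and after[slot - 1] != before[slot - 1]]

print("|" + "|".join(before) + "|")  # |A|B|C|B|C|?|A|?|
print("|" + "|".join(after) + "|")   # |A|C|C|A|A|?|C|?|
print(changed)                       # [5, 7]
```

Slots 5 and 7 are the two marked in red above: both belonged to a surviving
volume before the failure and end up with a different owner afterwards.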