GitHub user gtenev commented on the issue:
    @jpeach, I appreciate your feedback!
    It seemed to me that "disk being offline" (which might be an operator's 
decision) and "disk being bad" (the number of I/O errors reached a threshold) 
are better kept separate in general.
    IMHO, using ``CacheDisk::num_errors`` to mark the disk offline could be 
error-prone; here is an example.
    Let us say ``proxy.config.cache.max_disk_errors=5`` and a disk keeps 
failing, causing ``handle_disk_failure()`` to be called repeatedly until 
``CacheDisk::num_errors`` reaches ``5``, which causes 
``mark_storage_offline()`` to be called.
    At this point, since ``CacheDisk::num_errors`` is ``5``, ``DISK_BAD(d)`` 
already evaluates to ``true``.
    It seems that if I did ``if(!DISK_BAD(d)) {...}`` (as suggested above), the 
code in ``mark_storage_offline()`` would not execute at all; for instance, 
``proxy.process.cache.bytes_total_stat`` would not get updated as it should.
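    To make the concern concrete, here is a minimal, self-contained sketch of 
the scenario. It is a simplified model and not the actual Traffic Server code: 
the ``online`` flag, the ``bytes`` field, ``CacheStats``, ``disk_bad()`` (standing 
in for ``DISK_BAD(d)``), the two ``mark_offline_*`` variants and 
``bytes_total_after()`` are all hypothetical names introduced only for 
illustration, and the signatures do not match the real functions.

```cpp
#include <cassert>
#include <cstdint>

// Simplified, hypothetical model of the scenario; not the real Traffic Server code.
constexpr int kMaxDiskErrors = 5; // mirrors proxy.config.cache.max_disk_errors=5

struct CacheDisk {
  int num_errors = 0;     // I/O error counter, like CacheDisk::num_errors
  bool online = true;     // hypothetical flag, kept separate from the error count
  std::int64_t bytes = 0; // hypothetical: capacity this disk contributes
};

struct CacheStats {
  std::int64_t bytes_total = 0; // stands in for proxy.process.cache.bytes_total_stat
};

// Modeled on the DISK_BAD(d) check: true once the error threshold is reached.
inline bool disk_bad(const CacheDisk *d) { return d->num_errors >= kMaxDiskErrors; }

// Variant A: guard the offline logic with the error counter.
// By the time this is called from handle_disk_failure(), the disk is already
// "bad", so the body never runs and the stat is never adjusted.
inline void mark_offline_guarded_by_errors(CacheStats &s, CacheDisk *d) {
  if (!disk_bad(d)) {
    s.bytes_total -= d->bytes;
  }
}

// Variant B: guard with a separate "online" flag.
// The body runs exactly once, no matter how many errors have accumulated.
inline void mark_offline_guarded_by_flag(CacheStats &s, CacheDisk *d) {
  if (d->online) {
    d->online = false;
    s.bytes_total -= d->bytes;
  }
}

using MarkOffline = void (*)(CacheStats &, CacheDisk *);

// Counts one failure and takes the disk offline once the threshold is reached.
inline void handle_disk_failure(CacheStats &s, CacheDisk *d, MarkOffline mark_offline) {
  ++d->num_errors;
  if (disk_bad(d)) {
    mark_offline(s, d);
  }
}

// Simulates `failures` failures on a single 100-byte disk and returns the
// bytes_total value that remains afterwards.
inline std::int64_t bytes_total_after(int failures, MarkOffline mark_offline) {
  CacheDisk d;
  d.bytes = 100;
  CacheStats s;
  s.bytes_total = 100;
  for (int i = 0; i < failures; ++i) {
    handle_disk_failure(s, &d, mark_offline);
  }
  return s.bytes_total;
}
```

    In this model, ``bytes_total_after(5, mark_offline_guarded_by_errors)`` 
leaves the total at 100 because the guard is already false when the threshold 
is hit, while ``bytes_total_after(5, mark_offline_guarded_by_flag)`` drops it 
to 0 and stays there on further failures.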
    This is one of my first adventures in the "cache" component, so I hope I am 
not missing something. Please let me know what you think, and I will gladly 
look/test/change as necessary.
