Github user gtenev commented on the issue:
https://github.com/apache/trafficserver/pull/1028
@jpeach, appreciate your feedback!
It seemed to me that "disk being offline" (which might be an operator's
decision) and "disk being bad" (the number of I/O errors reached a threshold)
are better kept separate in general.
IMHO using ``CacheDisk::num_errors`` to mark the disk offline could be
error-prone; here is an example.
Let us say ``proxy.config.cache.max_disk_errors=5`` and a disk keeps
failing, so ``handle_disk_failure()`` is called repeatedly. At some point
``CacheDisk::num_errors`` reaches ``5``, which causes
``mark_storage_offline()`` to be called.
At this point, since ``CacheDisk::num_errors`` is ``5``, ``DISK_BAD(d)`` is
already ``true``.
So if I guarded the body with ``if (!DISK_BAD(d)) {...}`` (as suggested
above), it seems the code in ``mark_storage_offline()`` would not execute at
all; for instance, ``proxy.process.cache.bytes_total_stat`` would not get
updated as it should.
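To make the scenario concrete, here is a minimal, self-contained sketch. It is
not the actual Traffic Server code: the struct, the ``online`` flag, the
``disk_bad()`` helper (standing in for ``DISK_BAD``, assumed to be true once
the error count reaches the threshold), the two ``mark_offline_*`` variants
and the function-pointer parameter are all hypothetical simplifications used
only to illustrate the ordering problem.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-ins; NOT the real Traffic Server types.
struct CacheDisk {
  int num_errors = 0;
  bool online = true;          // assumed separate "offline" state
  std::int64_t size_bytes = 0;
};

// Stands in for proxy.config.cache.max_disk_errors.
static int max_disk_errors = 5;
// Stands in for proxy.process.cache.bytes_total_stat.
static std::int64_t bytes_total_stat = 0;

// Assumed shape of DISK_BAD(d): true once the error threshold is reached.
static bool disk_bad(const CacheDisk *d) {
  return d->num_errors >= max_disk_errors;
}

// Variant A: guard on the error count. By the time this is called from
// handle_disk_failure(), disk_bad(d) is already true, so the body is skipped.
static void mark_offline_guarded_by_bad(CacheDisk *d) {
  if (!disk_bad(d)) {
    bytes_total_stat -= d->size_bytes;
  }
}

// Variant B: guard on a separate flag, independent of the error count.
static void mark_offline_guarded_by_flag(CacheDisk *d) {
  if (d->online) {
    d->online = false;
    bytes_total_stat -= d->size_bytes;
  }
}

// Counts one I/O error and takes the disk offline once the threshold is hit.
static void handle_disk_failure(CacheDisk *d, void (*mark_offline)(CacheDisk *)) {
  ++d->num_errors;
  if (disk_bad(d)) {
    mark_offline(d);
  }
}
```

In this sketch, five failures with variant A leave ``bytes_total_stat``
untouched, while variant B subtracts the disk size exactly once.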
This is one of my first adventures in the "cache" component, so I hope I am
not missing something. Please let me know what you think and I will gladly
look/test/change as necessary.