Hello,

In a distributed system, it is very important to know which node is healthier than the others when deciding where to send a request, and of course to determine when a node should be treated as dead.

For example, Cassandra relies on the phi accrual failure detector [1] to detect that a node is down. Every second, a node gossips with up to three other nodes and exchanges state information with them. The response time is used as an input for failure detection. A badness score is also computed from this information and is used to choose the healthiest node among the replicas. However, I have seen many situations where this did not work as expected, especially when choosing a healthier node.

On the other hand, I know that most service providers send some kind of health check request to detect whether a service is available. It may be just a simple ping or a HEAD request.

So I wondered: would failure detection based on such simple health check requests be a good use case for HTM?

For example, the input would look like this:

    time, node, avg response time(ms)
    10:00:00, node1, 10
    10:00:00, node2, 9
    ...
    10:00:30, node1, 15
    10:00:30, node2, 10
    ...

[1] http://www.jaist.ac.jp/~defago/files/pdf/IS_RR_2004_010.pdf

Thanks,
Takenori
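
P.S. For context, the phi accrual idea from [1] can be sketched roughly as below. This is only an illustration, assuming heartbeat inter-arrival times are approximately normally distributed. The class name, the window size, and the `min_std` floor are placeholders of my own, and this is not Cassandra's actual implementation.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Illustrative phi accrual detector over heartbeat arrival times (seconds)."""

    def __init__(self, window_size=100, min_std=0.1):
        # Sliding window of recent inter-arrival times (assumed size).
        self.intervals = deque(maxlen=window_size)
        self.last_arrival = None
        # Floor on the standard deviation so a perfectly regular
        # history does not cause a division by zero (assumed value).
        self.min_std = min_std

    def heartbeat(self, now):
        """Record a heartbeat that arrived at time `now`."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def phi(self, now):
        """Suspicion level: -log10 of the probability that a heartbeat
        would still arrive later than the time elapsed so far."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        var = sum((x - mean) ** 2 for x in self.intervals) / len(self.intervals)
        std = max(math.sqrt(var), self.min_std)
        elapsed = now - self.last_arrival
        # Upper tail of the normal distribution: P(interval > elapsed).
        p_later = 0.5 * math.erfc((elapsed - mean) / (std * math.sqrt(2)))
        p_later = max(p_later, 1e-300)  # avoid log10(0) on underflow
        return -math.log10(p_later)


detector = PhiAccrualDetector()
for t in range(11):          # heartbeats at t = 0, 1, ..., 10 seconds
    detector.heartbeat(float(t))

print(detector.phi(10.5))    # shortly after the last heartbeat: low suspicion
print(detector.phi(15.0))    # several intervals missed: high suspicion
```

A node would then be suspected once phi exceeds some threshold chosen by the operator. The output is a continuous suspicion level rather than a yes/no answer, which is the part that made me think of anomaly scores in HTM.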
