[ 
https://issues.apache.org/jira/browse/KUDU-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088250#comment-16088250
 ] 

Andrew Wong commented on KUDU-1960:
-----------------------------------

I spent some time experimenting with CORDS. The authors approached this by 
mounting each system under test on a filesystem that can inject various errors 
and corruptions into a single block. While this may be useful, I don't think 
we need it for testing disk failures, for a few reasons:
* The disk failure injection in some of my (in-review) tests is already quite 
similar to the injection here, but at the file level instead of the block 
level.
* Their tests amount to writing up a set of test scenarios and examining the 
outputs (via grep, sed, etc.) when running with faults injected. We should be 
able to achieve this ourselves with integration tests.
* The end goal of running with CORDS is to see what happens when various disk 
blocks fail I/O; we should already have (and continue to have) a solid 
understanding of what happens when these failures occur, and we should be 
testing for them ourselves.
* This may be nitpicky, but their injection code is poorly commented and may be 
a hassle to maintain if we were to, say, run this as a recurring job. It's not 
particularly hard to read through, but debugging errors on their end might be 
annoying.
* Another thing I found a bit odd is that their code is not available for all 
of the systems they tested (e.g. they only posted their code for ZooKeeper and 
not Kafka, MongoDB, Redis, etc.).
That isn't to say there is no value in running CORDS or something similar. It 
would be a nice check to make sure we're doing the right thing. However, my 
main takeaway is that we can do a lot of this testing ourselves (and perhaps 
even improve our testing/debugging infrastructure for more end-to-end tests).
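
To make the file-level injection idea concrete, here is a rough sketch in 
Python of what "inject a failure on a file, then assert on the outcome in a 
test" could look like. This is not Kudu code and not how CORDS works 
internally; FaultyFileSystem, read_with_fallback, and run_scenario are all 
hypothetical names invented for illustration, and the "replica" files are just 
stand-ins for data on separate disks.

```python
import errno
import os
import tempfile


class FaultyFileSystem:
    """Hypothetical file-level fault injector: fails I/O on chosen paths."""

    def __init__(self):
        self.failing_paths = set()

    def inject_failure(self, path):
        # Mark a whole file as failed (file-level, not block-level).
        self.failing_paths.add(os.path.abspath(path))

    def _check(self, path):
        if os.path.abspath(path) in self.failing_paths:
            raise OSError(errno.EIO, "injected disk failure", path)

    def write(self, path, data):
        self._check(path)
        with open(path, "wb") as f:
            f.write(data)

    def read(self, path):
        self._check(path)
        with open(path, "rb") as f:
            return f.read()


def read_with_fallback(fs, replicas):
    """Return data from the first readable replica; raise if none can be read."""
    last_error = None
    for path in replicas:
        try:
            return fs.read(path)
        except OSError as e:
            last_error = e
    raise last_error if last_error else OSError(errno.ENOENT, "no replicas given")


def run_scenario():
    """Write two copies, fail the first one, and read through the fallback."""
    fs = FaultyFileSystem()
    with tempfile.TemporaryDirectory() as d:
        primary = os.path.join(d, "replica0")
        backup = os.path.join(d, "replica1")
        fs.write(primary, b"row-data")
        fs.write(backup, b"row-data")
        fs.inject_failure(primary)
        return read_with_fallback(fs, [primary, backup])
```

The point of the sketch is that the expected behavior is asserted directly in 
the test rather than inferred afterwards by grepping through output, which is 
the main difference from the CORDS workflow described above.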

> Run CORDS or similar tests on Kudu
> ----------------------------------
>
>                 Key: KUDU-1960
>                 URL: https://issues.apache.org/jira/browse/KUDU-1960
>             Project: Kudu
>          Issue Type: Task
>          Components: test
>            Reporter: Grant Henke
>            Assignee: Andrew Wong
>
> "CORDS is a fault-injection system consisting of errfs, a FUSE file system, 
> and errbench, a set of workloads and a behaviour inference script for each 
> system under test."
> * Overview & link to source code:  
> http://research.cs.wisc.edu/adsl/Software/cords/
> * Whitepaper and presentation: 
> https://www.usenix.org/conference/fast17/technical-sessions/presentation/ganesan
> * Blog: 
> https://blog.acolyer.org/2017/03/08/redundancy-does-not-imply-fault-tolerance-analysis-of-distributed-storage-reactions-to-single-errors-and-corruptions/

