[ https://issues.apache.org/jira/browse/KUDU-531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Will Berkeley resolved KUDU-531.
--------------------------------
       Resolution: Won't Do
    Fix Version/s: n/a

I spent a good bit of time working with ALICE and Kudu, trying to get ALICE to develop a logical characterization of Kudu's write path that it could use to test our durability semantics. Here are the significant issues I ran into:

1. Some syscalls aren't supported. For example, ALICE doesn't support analyzing the ioctl(2) call Kudu uses on XFS to do hole punching.
2. Some syscalls are buggy. fallocate(2) isn't handled correctly. It's easy to work around, but it's unclear whether the obvious fixes are sufficient for the behavior to be correct.
3. It doesn't handle memory-mapped files. This is a small issue since, AFAIK, the only memory mapping we do is with log index files, and they don't have durability requirements.
4. It doesn't track sync operations on directories, and it's not clear why. These are a critical part of how Kudu makes things durable, but ALICE by default basically just skips them. Hacking them in is easy enough, but it's not clear that ALICE will handle them correctly.
5. The project is just generally not mature or maintained. There have only been a few commits ever, and the project appears to have been mostly inactive since the paper was published. Given that it's a complex mix of long Python scripts without any tests, it's hard to modify while remaining confident that it's working as desired.
6. I don't think it can handle the extra complications of using multiple devices. This is particularly troublesome for Kudu since we recommend putting the WAL on a separate device from the data directories.

With enough patching and testing, I think ALICE would be a good tool for checking Kudu's fs update protocols (e.g. able to detect problems like KUDU-2260), but the tool is not nearly there, and I judge it's not worth the effort to get it there at this time.
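
To illustrate why issue 4 matters: on POSIX filesystems, creating or renaming a file is only guaranteed to survive a crash once the parent directory has itself been synced. The sketch below shows the generic write, fsync, rename, fsync-directory pattern in Python. It is not Kudu's actual code; the function name `durable_replace` and the `.tmp` suffix are hypothetical and chosen only for illustration.

```python
import os


def durable_replace(path: str, data: bytes) -> None:
    """Replace `path` with `data` so the update should survive a crash.

    Generic POSIX pattern, shown for illustration only (not Kudu's code).
    """
    dirname = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"  # hypothetical temp-file naming

    # 1. Write the new contents to a temporary file and fsync the file,
    #    so its data is on disk before it becomes visible under `path`.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)
    finally:
        os.close(fd)

    # 2. Atomically rename the temporary file over the target.
    os.replace(tmp_path, path)

    # 3. fsync the parent directory so the rename itself is durable.
    #    This is the kind of directory sync that issue 4 refers to.
    dir_fd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A checker that ignores step 3 cannot tell a protocol that syncs the directory from one that forgets to, which is why skipping directory syncs undermines the analysis.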
We could take this up in the future, and we should be on the lookout for other frameworks that do something similar and that may be better tested and maintained.

> Run ALICE on Kudu
> -----------------
>
>                 Key: KUDU-531
>                 URL: https://issues.apache.org/jira/browse/KUDU-531
>             Project: Kudu
>          Issue Type: Task
>          Components: test
>    Affects Versions: Backlog
>            Reporter: Todd Lipcon
>            Assignee: Will Berkeley
>            Priority: Major
>             Fix For: n/a
>
>
> http://research.cs.wisc.edu/adsl/Software/alice/ is a cool tool which can
> test correctness of recovery protocols under various disk faults. We should
> run this to verify our ordering of operations doesn't result in any possible
> data loss scenarios.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)