On Mon, Aug 31, 2009 at 04:50:02PM -0400, Paul Nowoczynski wrote:
> Yes this is the case on server failure but I think the true similarity
> between lustre and a locally mounted filesystem lies in the failure of a
> client holding dirty pages.  Please correct me if I'm wrong but data
> loss will occur should the client fail after close() but prior to the
> set of dirty pages being committed on the OST.

The client will have DLM locks outstanding if it has dirty data, so the
client's death can be used to detect that its open, dirty files are now
potentially corrupted.

Client death with dirty data is not all that different from process
death with dirty data in user-land.  Think of an application that does
write(2), write(2), close(2), _exit(2), but dies between writes.
Compare that to a client that dies after flushing the first of those
writes but before flushing the second, though after the application
calls close(2).  Nothing special is usually done in the first case,
even though, if the process did have byte range locks outstanding, the
OS could flag the affected file as potentially corrupted.

I don't think Lustre actually does anything to mark files that it could
detect as potentially corrupted.

Some applications can recover automatically -- think of databases, such
as SQLite3, or think of plain log files.  Other applications might well
be affected.  Since corruption detection in this case is heuristic, and
since the impact will vary by application, I don't think there's an easy
answer as to what Lustre ought to do about it.  Ideally we could track
the "potentially corrupt" status as an advisory meta-data item that
could be fetched with a stat(2)-like system call, and have applications
reset it when they recover.

Nico
--
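
To make the write(2), write(2), close(2) sequence discussed above
concrete, here is a minimal sketch in C.  It is an illustration only,
not code from Lustre or from this thread: write_all() and
write_two_records() are hypothetical helper names.  The sketch adds an
fsync(2) before close(2), on the assumption that the filesystem honors
fsync(2) by not returning until the dirty pages have been committed on
the server side.  Under that assumption it narrows the window between
close(2) and commit; it does nothing for a process or client that dies
between the two writes.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write the whole buffer, retrying on short
 * writes and on EINTR.  Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: two writes, then fsync(2) before close(2).
 * Dying between the two writes still leaves only the first record in
 * the file; fsync(2) only addresses loss after close(2).
 * Returns 0 on success, -1 on error. */
int write_two_records(const char *path, const char *first,
                      const char *second)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    if (write_all(fd, first, strlen(first)) < 0 ||
        write_all(fd, second, strlen(second)) < 0 ||
        fsync(fd) < 0) {
        close(fd);
        return -1;
    }

    /* close(2) can itself report a deferred write error. */
    return close(fd);
}
```

An application that cannot afford the fsync(2) cost on every file would
instead need its own recovery logic, as the databases and log files
mentioned above have.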
