I've been avoiding this issue for a while, but it came up while fixing the problem Vinicius hit yesterday...
When a single client is accessing a file for read/write, or many clients are accessing it read-only, the clients can safely cache file size and contents. When there are multiple writers, or a mix of readers and writers, that doesn't work. The basic strategy is for the client to bypass its cache and make all reads and writes synchronous, so that the object store effectively serializes I/O.

File contents are striped over objects, and each write simply translates into a write to the proper extent of the proper object. Reads work the same way. Sparse files are trivially supported: if the object for the given extent doesn't exist, it is defined to be a hole (zeros).

The problem is dealing with the end of the file. Because multiple clients can be reading and writing to the file in parallel, nobody knows what the real file size is. If we try to read an extent and get a short read or -ENOENT back from the object store, it can mean either that we've hit a hole in the file (and the size is larger), or that we've reached the end of the file. In the former case, we need to zero-fill the result buffer, and in the latter case, we need to return a short read. And most importantly, a read at EOF needs to return 0.

So... how do we know we hit EOF? The client gets notification of truncation, so if the read is entirely below our last known size, we know it's not EOF. But the rest of the time, we need to get an updated size from the MDS, which in turn will ask each client for its size. Slow, but correct.

Alternatively, we could define a new cap that is needed to extend EOF, so that only the client holding it needs to be asked (and, similarly, if you are that client, you know your size is correct). Better, but it will mean more locking logic in the MDS to dish out the cap appropriately (and logic in the client to request it if we try to write past EOF).

I think I'm going with the simple non-optimal solution for now. Unless anyone has any other ideas?
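To make the hole-versus-EOF decision concrete, here is a toy Python sketch of the simple (slow but correct) read path described above. It is only an illustration under stated assumptions, not the actual client code: `ObjectStore`, `OBJECT_SIZE`, `sync_read` and `ask_mds_for_size` are made-up names, striping is reduced to fixed-size objects, and the MDS round trip is just a callback.

```python
OBJECT_SIZE = 4  # hypothetical, tiny on purpose so the example is easy to trace


class ObjectStore:
    """Toy stand-in for the object store: object number -> bytearray."""

    def __init__(self):
        self.objects = {}

    def write(self, off, data):
        # Map each file offset to (object number, offset within object).
        for i, byte in enumerate(data):
            objno, objoff = divmod(off + i, OBJECT_SIZE)
            buf = self.objects.setdefault(objno, bytearray())
            if len(buf) <= objoff:
                buf.extend(b"\0" * (objoff + 1 - len(buf)))
            buf[objoff] = byte

    def read_extent(self, objno, objoff, length):
        # A missing object (the -ENOENT case) or a short object yields
        # fewer bytes than requested.
        buf = self.objects.get(objno)
        if buf is None:
            return b""
        return bytes(buf[objoff:objoff + length])


def sync_read(store, off, length, known_size, ask_mds_for_size):
    """Synchronous read that tells holes apart from EOF.

    known_size is the last file size this client knows about; since
    truncations are notified, a read entirely below it cannot hit EOF.
    ask_mds_for_size is a hypothetical callback for the slow path.
    """
    chunks = []
    short = False
    pos, end = off, off + length
    while pos < end:
        objno, objoff = divmod(pos, OBJECT_SIZE)
        n = min(OBJECT_SIZE - objoff, end - pos)
        data = store.read_extent(objno, objoff, n)
        short = short or len(data) < n
        # Provisionally treat anything missing as a hole (zeros).
        chunks.append(data + b"\0" * (n - len(data)))
        pos += n
    buf = b"".join(chunks)

    if not short or end <= known_size:
        return buf  # full read, or holes strictly inside the known size

    # Short read past our last known size: hole or EOF? Ask the MDS.
    size = ask_mds_for_size()
    return buf[:max(0, size - off)]  # empty result means a read at EOF
```

With the cap-based alternative, the only part that would change in this sketch is the slow path: the client holding the (hypothetical) EOF-extending cap could trust `known_size` and skip the callback.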
:)

sage

_______________________________________________
Ceph-devel mailing list
Ceph-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ceph-devel