I've been avoiding this issue for a while, but it came up while fixing 
the problem Vinicius hit yesterday...

When a single client is accessing a file for read/write, or many clients 
are accessing it read-only, the clients can safely cache the file size and 
contents.  When there are multiple writers, or a mix of readers and 
writers, that doesn't work.

The basic strategy is for the client to bypass its cache and make all 
reads and writes synchronous, so that the object store effectively 
serializes I/O.  File contents are striped over objects, and each write 
simply translates into a write to the proper extent of the proper object.  
Reads work the same way.  Sparse files are trivially supported: if the 
object for the given extent doesn't exist, it is defined to be a hole 
(zeros).
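
To make the mapping concrete, here is a rough sketch.  It assumes the 
simplest possible layout (fixed-size objects laid end to end) and uses a 
made-up in-memory ObjectStore; none of these names come from the actual 
client code, and the 4-byte object size is only there to keep it readable.

```python
OBJECT_SIZE = 4  # assumed bytes per object; tiny so the example is easy to trace


def map_extent(offset, length, object_size=OBJECT_SIZE):
    """Split a file extent into (object_index, object_offset, length) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        obj = offset // object_size
        obj_off = offset % object_size
        n = min(object_size - obj_off, end - offset)
        pieces.append((obj, obj_off, n))
        offset += n
    return pieces


class ObjectStore:
    """Hypothetical in-memory stand-in for the object store."""

    def __init__(self):
        self.objects = {}  # object index -> bytearray

    def write(self, obj, obj_off, data):
        buf = self.objects.setdefault(obj, bytearray())
        shortfall = obj_off + len(data) - len(buf)
        if shortfall > 0:
            buf.extend(bytes(shortfall))  # pad with zeros up to the write
        buf[obj_off:obj_off + len(data)] = data

    def read(self, obj, obj_off, length):
        """Return the stored bytes (possibly short), or None if no such object."""
        buf = self.objects.get(obj)
        if buf is None:
            return None
        return bytes(buf[obj_off:obj_off + length])


def file_write(store, offset, data):
    """Write each piece of the extent to the object that owns it."""
    pos = 0
    for obj, obj_off, n in map_extent(offset, len(data)):
        store.write(obj, obj_off, data[pos:pos + n])
        pos += n


def file_read(store, offset, length):
    """Read an extent known to lie below EOF; holes come back as zeros."""
    out = bytearray()
    for obj, obj_off, n in map_extent(offset, length):
        data = store.read(obj, obj_off, n) or b""
        out += data + bytes(n - len(data))  # zero-fill missing or short pieces
    return bytes(out)
```

Note that file_read only works because the caller promises the extent is 
below EOF; it has no way to tell a hole from the end of the file.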

The problem is dealing with the end of the file.  Because multiple 
clients can be reading and writing to the file in parallel, nobody knows 
what the real file size is.  If we try to read an extent and get a short 
read or -ENOENT back from the object store, it can either mean we've hit a 
hole in the file (and the size is larger), or we've reached the end of the 
file.  In the former case, we need to zero-fill the result buffer, and in 
the latter case, we need to return a short read.  And most importantly, a 
read at EOF needs to return 0.
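
Put differently, once the real size is known the read semantics are easy.  
A minimal sketch, assuming a hypothetical fetch(offset, length) callback 
that returns whatever bytes the object store has for the extent (possibly 
short, or None for -ENOENT):

```python
def read_at(fetch, offset, length, file_size):
    """Apply EOF and hole semantics to a raw object-store read.

    fetch is a hypothetical callback, not a real client API.
    """
    if offset >= file_size:
        return b""  # read at or past EOF returns 0 bytes
    # Clamp to EOF so the caller sees a short read rather than padding.
    length = min(length, file_size - offset)
    data = fetch(offset, length) or b""
    # Anything missing below EOF is a hole, so zero-fill it.
    return data + bytes(length - len(data))
```

The hard part is the file_size argument, which is exactly what nobody 
knows for sure when there are multiple writers.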

So.. how do we know we hit EOF?  The client gets notification of 
truncation, so if a read is entirely below our last known size, we know 
it's not EOF.  The rest of the time, we need to get an updated size from 
the MDS, which in turn will ask each client for its size.  Slow, but 
correct.
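
Roughly, the simple scheme looks like the sketch below.  The Mds and 
Client classes and their methods are invented for illustration (this is 
not the real protocol), and truncation notification is left out.

```python
class Mds:
    """Hypothetical MDS that polls every client for its view of the size."""

    def __init__(self):
        self.clients = []

    def query_size(self):
        # The real size is the largest size any client has written to.
        return max((c.known_size for c in self.clients), default=0)


class Client:
    def __init__(self, mds):
        self.mds = mds
        self.known_size = 0
        self.mds_queries = 0  # counted only to show when the slow path runs
        mds.clients.append(self)

    def note_write(self, offset, length):
        # Our own synchronous writes can only push our known size up.
        self.known_size = max(self.known_size, offset + length)

    def size_for_read(self, offset, length):
        if offset + length <= self.known_size:
            # Entirely below our last known size: cannot be EOF.
            return self.known_size
        # Otherwise ask the MDS, which asks everyone.  Slow, but correct.
        self.mds_queries += 1
        self.known_size = self.mds.query_size()
        return self.known_size
```

The fast path covers reads well inside the file; any read that touches or 
passes the last known size pays for a round trip to the MDS and to every 
other client.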

Alternatively, we could define a new cap that is needed to extend EOF, so 
that only that client needs to be asked (and, similarly, if you are that 
client, you know your size is correct).  Better, but it will mean more 
locking logic in the MDS to dish out the cap appropriately (and logic in 
the client to request it if we try to write past EOF).
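
A very rough sketch of how such a cap might behave, assuming a single 
holder at a time and synchronous revocation.  All of the names here 
(request_extend_cap, release_extend_cap, and so on) are hypothetical; 
nothing like this exists yet.

```python
class Mds:
    """Hypothetical MDS handing out a single 'extend EOF' cap."""

    def __init__(self):
        self.extend_holder = None
        self.size = 0  # size reported when the cap was last released

    def request_extend_cap(self, client):
        if self.extend_holder is not None and self.extend_holder is not client:
            # Revoke from the old holder, which reports its final size.
            self.size = self.extend_holder.release_extend_cap()
        self.extend_holder = client
        return self.size

    def query_size(self):
        # Only the cap holder can have extended EOF, so only it is asked.
        if self.extend_holder is not None:
            return self.extend_holder.size
        return self.size


class Client:
    def __init__(self, mds):
        self.mds = mds
        self.size = 0
        self.has_extend_cap = False

    def write(self, offset, length):
        end = offset + length
        if end > self.size and not self.has_extend_cap:
            # Might be extending EOF: get the cap and the authoritative size.
            self.size = self.mds.request_extend_cap(self)
            self.has_extend_cap = True
        self.size = max(self.size, end)

    def release_extend_cap(self):
        self.has_extend_cap = False
        return self.size

    def read_size(self):
        if not self.has_extend_cap:
            self.size = self.mds.query_size()
        return self.size  # the cap holder's own size is already correct
```

This ignores truncation and what happens if the holder goes away, which 
is where most of the extra locking logic in the MDS would come from.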

I think I'm going with the simple non-optimal solution for now.  Unless 
anyone has any other ideas?  :)

sage

_______________________________________________
Ceph-devel mailing list
Ceph-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ceph-devel
