Stefan Sperling wrote on Mon, Aug 29, 2011 at 18:01:38 +0200:
> On Sat, Aug 27, 2011 at 02:15:31AM +0300, Daniel Shahaf wrote:
> > > - cache can be regenerated on demand
> >
> > Regenerated *offline*, right?  ie, if the cache is lost it can be
> > derived from the revision files (which remain authoritative), but
> > this process need not withstand concurrent readers.  (The problem
> > being the need to decrement the cached-rev file)
>
> Well, decrementing the cached-rev file would have to be done offline.
> But the cache can be rebuilt while the repository is active.
>
> We could have a flag that says whether the cache is active.
> E.g. after an upgrade from an older repository format, the cache
> would be flagged inactive, until it has been populated.
>
> When rebuilding the cache we'd take the repository offline,
> mark the cache inactive, remove the cached-rev file, go online,
> and rebuild the cache.  Readers will get information for older
> revisions only until the cache has been rebuilt.
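To make that sequence concrete, here is a rough C model of the rebuild
procedure quoted above (all names are invented for illustration; none
of this is actual FSFS code):

```c
#include <stdint.h>

/* Minimal model of the rebuild sequence: the cache is flagged
 * inactive while the repository is offline, the cached-rev watermark
 * is reset, and readers fall back to replay for revisions above the
 * watermark until population catches up. */
enum cache_state { CACHE_INACTIVE = 0, CACHE_ACTIVE = 1 };

struct succ_cache {
  enum cache_state state;
  int64_t cached_rev;   /* highest revision with cached successor data */
};

/* A reader may consult the cache only while it is active, and only
 * for revisions at or below the watermark. */
static int
cache_answers_for(const struct succ_cache *c, int64_t rev)
{
  return c->state == CACHE_ACTIVE && rev <= c->cached_rev;
}

/* Rebuild: mark inactive and reset the watermark offline, then
 * repopulate revision by revision once the repository is back
 * online, bumping the watermark as each revision is cached. */
static void
rebuild_cache(struct succ_cache *c, int64_t youngest_rev)
{
  c->state = CACHE_INACTIVE;   /* done while the repository is offline */
  c->cached_rev = -1;          /* "no revisions cached" */
  c->state = CACHE_ACTIVE;     /* repository goes back online */
  for (int64_t r = 0; r <= youngest_rev; r++)
    c->cached_rev = r;         /* populate r, then advance watermark */
}
```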
We mentioned a format number for the cache files; this "active flag"
almost smells like a format number for the entire cache... (with
format 0 being "inactive")

> > > The index size and offsets are 64 bit values.
> >
> > Stored as?  ASCII decimal?  (If so: how is it delimited or padded?)
> > Big endian binary?
>
> Some binary format.  Fixed size.

Fair enough.

> > Q: How do you update the 'offset of free space' in a manner that is
> > consistent to readers?
> >
> > ie, if the offset was ASCII ' 800' and the new offset is '1600', you
> > must avoid a reader accidentally seeing '1800'.
>
> The offsets appear at the start of a disk block.
> They should be a fixed-sized binary value -- ASCII has problems
> as you've pointed out.
>
> In any case, I am assuming that flushing data to disk flushes one
> disk block, i.e. that we can update one block atomically during
> flush.  Maybe this assumption isn't valid.

ack

> > (I seem to remember some ideas exchanged on the revprop-packing
> > design discussion --- including one that involved having two offsets
> > written down in the file, and one of them *always* being valid (but
> > not always the same one) --- perhaps Peter will remember...)
>
> Do you have a link to the archives?

Not exactly, sorry.  Most discussions happened in this (huge) thread:

From: kmra...@rockwellcollins.com
To: dev@subversion.apache.org
Subject: Does fsfs revprop packing no longer allow usage of traditional backup software?
Message-ID: <off5dc1071.793efd7c-on862578bf.005304da-862578bf.00552...@rockwellcollins.com>

The crux of the idea, though, is to have two byte locations for the
offset --- say, store one integer at bytes 4-7 and another at bytes
8-11 --- and have some logic to choose which one of those to read and
treat as the offset.  In the meantime the other integer can be updated
arbitrarily.
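A rough C sketch of that two-slot scheme, under the assumption that
each slot is paired with a generation counter and that each store can
be flushed independently (names and layout invented for illustration):

```c
#include <stdint.h>

/* Each slot holds a (generation, offset) pair.  A writer always
 * overwrites the OLDER slot, flushes the offset, and only then bumps
 * that slot's generation.  A reader picks the slot with the higher
 * generation, so it never observes a half-updated offset. */
struct offset_slot {
  uint64_t generation;
  uint64_t offset;      /* offset of free space in the cache file */
};

struct cache_header {
  struct offset_slot slot[2];
};

/* Reader: return the offset from the slot with the newer generation. */
static uint64_t
read_offset(const struct cache_header *hdr)
{
  int newest = hdr->slot[1].generation > hdr->slot[0].generation;
  return hdr->slot[newest].offset;
}

/* Writer: update the older slot, then publish it by giving it the
 * highest generation.  On disk, each step would be followed by a
 * flush; here we only model the ordering. */
static void
write_offset(struct cache_header *hdr, uint64_t new_offset)
{
  int newest = hdr->slot[1].generation > hdr->slot[0].generation;
  int oldest = !newest;
  hdr->slot[oldest].offset = new_offset;                       /* flush */
  hdr->slot[oldest].generation = hdr->slot[newest].generation + 1;
                                                               /* flush */
}
```

A crashed writer leaves at worst a stale-but-valid slot winning, which
is exactly the "one of them *always* being valid" property.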
> > So, invariant: given a node-rev, all its s-blocks, except perhaps
> > the last (in order of appearance in the file), have at most <size of
> > the longest possible node-rev-id, minus one> free (zero) bytes at
> > the end.
>
> Did you mean "at least"?

No.  I said that "All completed s-blocks have at most O(1) free bytes
at the end".

> > > It updates the 'next s-block' offset in the previous s-block to
> > > the offset of the new s-block, and flushes the cache file to disk.
> >
> > At what point does the writer update the cached-rev file?
>
> The writer doesn't update the cached-rev file.  This is done by
> the global cache sync code.
>
> I forgot to include two steps for the writer, though.
> It needs to update the offset to the last s-block in the index

You forgot to include "two steps", but you only list one step here.
Did you forget to list the other step which you had originally
forgotten to list?

> > So, a lost block is in the file.  You could avoid that by seeking,
> > not to the end of the file but to
> >
> >     SELECT (MAX(offset to my last s-block) + SIZEOF(s-block)) FROM index
> >
> > (and, yes, I realize that this is a crash situation.  two answers to
> > that: first, it's easy to avoid having an unused block; second, if
> > a file grows an unused block then a DoS can translate to running out
> > of disk space.)
>
> I don't really understand you here.
> Are you suggesting that the code should check for a dead s-block
> first, before appending to the file?

I'd like to avoid growing 'holes' in the middle of the file that
readers will never access.  (I don't want to require dump|load in
order to get rid of such holes.)

> That MAX doesn't make sense.  Each node_rev_id only appears once
> in the index.  Maybe that wasn't clear yet?  There is exactly one
> entry for each node-rev-id from revision N in the cache file for
> revision N.

Clear.
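What I mean by that SELECT, sketched in C (index layout and names are
made up for illustration; I'm assuming fixed-size s-blocks and that
each index entry records the offset of that node-rev's last s-block):

```c
#include <stdint.h>
#include <stddef.h>

#define SBLOCK_SIZE 4096   /* assumed fixed s-block size */

/* One index entry per node-rev-id of revision N. */
struct index_entry {
  uint64_t first_sblock;   /* offset of the node-rev's first s-block */
  uint64_t last_sblock;    /* offset of the node-rev's last s-block */
};

/* SELECT MAX(last_sblock) + SBLOCK_SIZE FROM index: derive the append
 * position from the index instead of from the physical end of file,
 * so an s-block that was appended but never linked into the index
 * (e.g. after a crash) gets overwritten rather than becoming a
 * permanent hole. */
static uint64_t
next_append_offset(const struct index_entry *index, size_t n,
                   uint64_t data_start)
{
  uint64_t max_last = 0;
  int seen = 0;
  for (size_t i = 0; i < n; i++) {
    if (index[i].last_sblock >= max_last) {
      max_last = index[i].last_sblock;
      seen = 1;
    }
  }
  return seen ? max_last + SBLOCK_SIZE : data_start;
}
```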
> > > for node_rev in $rev {
> > >   obtain predecessor p_node_rev of $node_rev
> > >   obtain revision p_rev from $p_node_rev
> >
> > Should the cache note which of the successors is the M-successor
> > (the one with the same copy-id)?  Perhaps by stipulating that the
> > M-successor is always the first in the list of successors, and
> > leaving room for it if a C-successor (a copy) is created before the
> > M-successor.
>
> We don't know how long the M-successor ID is going to be.

Actually we can bound it, but the bound I have in mind is going to be
a large overestimate for the common case.

> If we want to optimise locating the M-successor, we could add the
> offset to the s-block which contains the M-successor to the index,
> and use a special flag byte to terminate the M-successor string.

This is probably going to be more space-efficient, yes.

> > (It does mean we have two codepaths though, eg both packed and
> > non-packed shards; and <cue wild idea> perhaps it'll be easier to
> > have *only* packed shards, and to use the expensive replay method to
> > answer queries about revisions younger than the latest pack file?
> > (We may want to use a smaller shard size in this case.)
> >
> > And, by the way, this idea isn't /so/ unwieldy.  I think that you
> > can use *only* pack files --- and also use pack files for unfinished
> > shards --- if, when you run out of INDEX_SIZE, you rewrite the whole
> > file and move-into-place a new pack file (with a doubled INDEX_SIZE)
> > on top of the old pack file.  The advantage here would be not having
> > the both-packed-and-nonpacked-shards headache to deal with.)
>
> Might be possible.  Not sure if that's worth it.
>
> > >   write $rev into 'cached-rev' file
> > >   flush 'cached-rev' file
> >
> > No, you don't flush, you move-into-place.
>
> I decided to avoid move-into-place on purpose.
>
> This is my biggest question:
>
> Does FSFS presently move stuff into place if the move target location
> already exists?
> I am asking because I don't think this would work on Windows.
> We cannot overwrite a file if a reader has opened it.
> Or do we really use the delete-retry-loop on Windows servers?
>
> The only situation I can think of is where a previous commit created
> a revision file and didn't update 'current'.  In which case it's
> unlikely that the revision file will be open when the next commit
> moves a file with the same name into place.

As I said on IRC: look into revprop editing in 1.6 and 1.7 FSFS.
(And probably the locks/ hierarchy, too; see fsfs/lock.c.)

> > Okay, so the cache reading procedure MUST change, to account for
> > the non-packed cache file vanishing (by a concurrent packer) and the
> > pack file appearing instead.  This is another layer of headache on
> > top of the base one :-(
>
> Do we support packing while the repository is live?

Yes.

> I hadn't considered this.
>
> Thanks for your feedback!

Welcome.
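For reference, the "delete-retry-loop" pattern mentioned above has
roughly this shape.  (This is only an illustration of the idea using
plain rename(); the real code paths go through APR, and the EACCES
case is what you'd see on Windows when a reader still has the target
open.)

```c
#include <stdio.h>
#include <errno.h>

/* Try to move SRC over DST, retrying a bounded number of times when
 * the failure looks like a transient sharing violation (a reader
 * holding the target open).  Returns 0 on success, -1 on failure. */
static int
rename_with_retries(const char *src, const char *dst, int max_tries)
{
  for (int tries = 0; tries < max_tries; tries++) {
    if (rename(src, dst) == 0)
      return 0;
    if (errno != EACCES && errno != EEXIST)
      return -1;          /* non-retryable error, e.g. ENOENT */
    /* in real code: sleep briefly here so the reader can close */
  }
  return -1;              /* gave up; caller must cope */
}
```

Note the bounded retry count: an unbounded loop would hang the server
if some process keeps the file open indefinitely.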