On Wed, Apr 4, 2012 at 1:28 PM, Ashod Nakashian <ashodnakash...@yahoo.com> wrote:
> I feel this is indeed what we're closing on, at least for an initial working
> demo. But I'd like to hear more agreements before committing to this path. I
> know some did show support for this approach, but it's hard to track them in
> the noise.
>
> So to make it easier, let's either voice support to this suggestion and
> commit to an implementation, or voice objection with at least reasons and
> possibly alternative action. Silence is passive agreement, so the onus on
> those opposing ;-)
I just read the Google doc - glad to see progress here - a few comments:

First off, if I understand correctly, I do have to say that I'm not at all a fan of having a large pristine file spread out across multiple on-disk compressed pack files. I don't think that makes a whole lot of sense - I think it'd be simplest (when we hit the heuristic to put it on-disk rather than in SQLite) to keep it to just one file. I don't get why we'd want a big image pristine file (say a PSD file) split out into, say, 20 smaller files on disk. Why? It just seems we're going to introduce a lot of complexity for very little return.

The whole point of stashing the small files directly into SQLite's pristine.db is to make the small files SQLite's problem and not the on-disk FS's (and reduce sub-block issues) - with that in place, I don't think we need to throw multiple files into the same pack file. It'll just get too confusing, IMO, to keep track of which file offsets to use. (For a large file that already hits the size trigger, we know that - worst-case scenario - we might lose one FS block. Yawn.) We can make the whole strategy simpler if we follow that; a rough sketch of the split I have in mind is below.

I'm with Greg in thinking that we don't need the pack index files - but I'll go further and reiterate that I think there should just be a 1:1 correspondence between the pack file and the pristine file. What's the real advantage of having multiple large pristines in one pack file that we constantly *append* to? With append ops on a pack file shared by multiple pristines, we rely on our WC/FS locking strategy being 100% perfect or we have a hosed pack file. Ugh. I think it just adds unnecessary complexity. It'll be far simpler to have a 1:1 correspondence between a pack file and a single large pristine. We'll have enough complexity already in just finding the small files sitting in SQLite rather than on-disk.

Given that 1:1 correspondence, I do think that the original file size and the complete checksum should be stored in the custom pack file on-disk. That would let us easily validate whether the pack file is corrupt, using the file size as a first-order check and the checksum as a second-order check. The thinking here is that if the checksum is only in the file name (or in pristine.db) and not in the file contents, the file system could very easily lose the filename (hello ext3/4!) - keeping it in the contents lets us verify the integrity of the pack file and reassociate it if it gets dis-associated. This is less of an issue on the client, which can always refetch - but if the server code ends up using the same on-disk format (as hinted in the Google Doc), then I think this is important to have in the file format from the beginning. I definitely think that we should store the full 64-bit svn_filesize_t length and not be cute and assume no one has a file larger than 1TB. (A sketch of what such a header could look like is below.)

I'll disagree with Greg for now and say that it's probably best to just pack everything and not try to be cute about skipping certain file types - I think that's a premature optimization right now. The complexity of a mixed pristine collection, with some files packed and some unpacked (and some files in SQLite and some on-disk), is odd. Maybe we end up adding a no-op compression type to the file format (IOW, tell zlib to do a no-op deflate via Z_NO_COMPRESSION). Maybe. I just doubt it's worth the additional complexity though. ("Is this pristine file compressed?" "I don't know." Argh!)
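To make the small-vs-large split concrete, here's the rough shape of the decision I mean - the threshold value and names are invented purely for illustration, not anything in the current wc code:

    /* Hypothetical sketch: the threshold and names are placeholders. */
    #include <stdint.h>

    #define PRISTINE_INLINE_THRESHOLD (64 * 1024)  /* e.g. 64KB; tune later */

    /* Decide where a pristine of SIZE bytes should live: small ones
       become BLOBs in pristine.db, everything else gets its own
       on-disk pack file (1:1, no sharing of pack files). */
    static int
    pristine_goes_in_sqlite(uint64_t size)
    {
      return size < PRISTINE_INLINE_THRESHOLD;
    }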
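And to be explicit about what I mean by putting the size and checksum into the pack file itself, something along these lines - the magic value, field names and layout are made up just to show the idea (real code would also worry about endianness and padding):

    #include <stdint.h>

    #define PACK_MAGIC 0x53564e50u          /* arbitrary marker for the sketch */

    /* Illustrative fixed header at the front of each 1:1 pack file. */
    struct pack_header
    {
      uint32_t magic;                       /* identifies a pack file */
      uint32_t compression;                 /* 0 = stored, 1 = zlib, ... */
      uint64_t pristine_size;               /* full 64-bit size of the pristine */
      unsigned char checksum[20];           /* checksum of the uncompressed pristine */
    };

    /* First-order check: does the advertised size match what we actually
       recovered?  (The second-order check, not shown, would re-compute
       the checksum over the contents and compare against the header.) */
    static int
    pack_size_looks_sane(const struct pack_header *hdr,
                         uint64_t recovered_size)
    {
      return hdr->magic == PACK_MAGIC
             && hdr->pristine_size == recovered_size;
    }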
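For reference, the "no-op compression" idea is just zlib at level 0: the data still goes through the same container format, it just isn't actually compressed, so readers keep a single code path. A minimal sketch (the function name is made up):

    #include <zlib.h>

    /* Compress SRC into DEST.  With store_only set, compress2() runs at
       Z_NO_COMPRESSION: it still emits a valid zlib stream, but does no
       real compression.  Returns Z_OK on success. */
    static int
    pack_deflate(unsigned char *dest, unsigned long *dest_len,
                 const unsigned char *src, unsigned long src_len,
                 int store_only)
    {
      int level = store_only ? Z_NO_COMPRESSION : Z_DEFAULT_COMPRESSION;
      return compress2(dest, dest_len, src, src_len, level);
    }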
As for guessing which file types are worth compressing: making those assumptions based on file extensions or even magic bits can be a bit awkward - case in point is PDF...sometimes it'll compress well, sometimes it won't. So, best off just always compressing it. =)

My $.02.

-- justin