Hi Justin,

Sorry for the late reply, and thanks for your notes. I should say right off the 
bat that the design doc is outdated with respect to what we plan to do as a 
first implementation. As a proposal for the packed file format, though, I think 
it's still mostly valid, apart from a few notes and improvements (such as the 
64-bit filesize) that are missing or incorrect, regardless of whether we 
ultimately implement it.


See my notes and comments inline please.

>________________________________
> From: Justin Erenkrantz <jus...@erenkrantz.com>
>To: Ashod Nakashian <ashodnakash...@yahoo.com> 
>Cc: Greg Stein <gst...@gmail.com>; "dev@subversion.apache.org" 
><dev@subversion.apache.org> 
>Sent: Friday, April 6, 2012 10:19 AM
>Subject: Re: Compressed Pristines (Summary)
> 
>On Wed, Apr 4, 2012 at 1:28 PM, Ashod Nakashian
><ashodnakash...@yahoo.com> wrote:
>> I feel this is indeed what we're closing on, at least for an initial working 
>> demo. But I'd like to hear more agreements before committing to this path. I 
>> know some did show support for this approach, but it's hard to track them in 
>> the noise.
>>
>> So to make it easier, let's either voice support to this suggestion and 
>> commit to an implementation, or voice objection with at least reasons and 
>> possibly alternative action. Silence is passive agreement, so the onus on 
>> those opposing ;-)
>
>I just read the Google doc - glad to see progress here - a few comments:
>
>First off, if I understand correctly, I do have to say that I'm not at
>all a fan of having a large pristine file spread out across multiple
>on-disk compressed pack files.  I don't think that makes a whole lot
>of sense - I think it'd be simplest (when we hit the heuristic to put
>it on-disk rather than in SQLite) to keep it to just one file.  I
>don't get why we'd want to have a big image pristine file (say a PSD
>file) split out into say 20 smaller files on disk.  Why?  It just
>seems we're going to introduce a lot of complexity for very little
>return.

The straightforward design is to have a single large pack file, but in practice 
this is very problematic. You can already find FSes that may barf on multi-GB 
files, but that aside, consider the overhead of removing a pristine file and 
shifting all the bytes that follow it. The overhead is extreme: removing even a 
small pristine near the start of, say, a 2 GB pack means rewriting close to 2 GB 
on disk. To avoid that, we'd need to track holes in the file (and accept the 
wasted space on disk) and, even worse, do heavy lifting to fit new/modified 
pristines into holes where they might not fit. In other words, we'd have to 
implement a complex FS inside a single file, keep its size on disk small (to 
justify this feature!) and do the housekeeping as fast as possible (shifting GBs 
on disk because we have a largish hole at the beginning of the file has a real 
cost).

My solution is to split the pack files such that each one is small enough to 
fit in memory and be rewritten to disk in sub-second time. This way, 1) holes in 
these files can be avoided completely and cheaply, by simply rewriting the 
affected pack; 2) even if we do keep holes, they can't grow very large.
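
To make that concrete, here's a rough sketch of what removing a pristine from 
one of these small packs could look like. This is illustrative only; the record 
layout, the names and the "rebuild the whole pack" approach are my assumptions 
for the example, not the proposal's actual format:

    # Illustrative sketch only -- not real wc code or the real pack format.
    # Assumes each pack is a sequence of (sha1, length, data) records and is
    # small enough (a few MB) to rebuild entirely in memory.
    import os, struct

    def remove_pristine(pack_path, sha1_to_drop):
        # Collect every record except the one being removed.
        kept = []
        with open(pack_path, 'rb') as f:
            while True:
                header = f.read(24)            # 20-byte SHA-1 + 4-byte length
                if len(header) < 24:
                    break
                sha1 = header[:20]
                (length,) = struct.unpack('>I', header[20:])
                data = f.read(length)
                if sha1 != sha1_to_drop:
                    kept.append((sha1, data))
        # Rebuild the whole pack; it is only a few MB, so this is cheap
        # compared to shifting the tail of a single multi-GB file.
        tmp_path = pack_path + '.tmp'
        with open(tmp_path, 'wb') as f:
            for sha1, data in kept:
                f.write(sha1 + struct.pack('>I', len(data)) + data)
        os.replace(tmp_path, pack_path)        # atomic swap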


>  The whole point of stashing the small files directly into
>SQLite's pristine.db is to make the small files SQLite's problem and
>not the on-disk FS (and reduce sub-block issues) - with that in place,
>I think we're not going to need to throw multiple files into the same
>pack file.  It'll just get too confusing, IMO, to keep track of which
>file offsets to use.  (For a large file that already hits the size
>trigger, we know that - worst case scenario - we might lose one FS
>block.  Yawn.)  We can make the whole strategy simpler if we follow
>that.

I'm a bit confused here. You're assuming that we'll use SQLite for small files 
and the FS for larger ones. However, that's not in the proposal; it's what 
we've agreed on on this list. We aren't going to implement both, not for now at 
least. What we're going to do is simply push small pristines into pristine.db 
and in-place compress the larger ones on disk (as a first implementation we'll 
probably even leave the names the same and change nothing other than passing 
the disk I/O through compressed streams). Beyond that, we will probably 
experiment with packing, but it's a bit soon to worry about that. Any research 
or help is more than welcome, of course!
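
Conceptually, the "compressed streams" part is nothing more than wrapping the 
reads and writes. A Python/gzip sketch purely for illustration (the real code 
would of course live in libsvn_wc and go through Subversion's stream layer, 
svn_stream_t, rather than gzip):

    # Illustrative only: same file name on disk, but the bytes pass through
    # a compressing stream on the way in and a decompressing one on the way out.
    import gzip, shutil

    def store_pristine(src_path, pristine_path):
        with open(src_path, 'rb') as src, gzip.open(pristine_path, 'wb') as dst:
            shutil.copyfileobj(src, dst)

    def read_pristine(pristine_path, dst_path):
        with gzip.open(pristine_path, 'rb') as src, open(dst_path, 'wb') as dst:
            shutil.copyfileobj(src, dst)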


>
>I'm with Greg in thinking that we don't need the pack index files -
>but, I think I'll go further and reiterate that I think that there
>should just be a 1:1 correspondence between the pack file and the
>pristine file.  What's the real advantage of having multiple large
>pristines in one pack file (and that we constantly *append* to)?  And,
>with append FS ops with multiple files in one pack file, we rely on
>our WC/FS locking strategy to be 100% perfect or we have a hosed pack
>file.  Ugh.  I think it just adds unnecessary complexity.  I think
>it'll be far simpler to have 1:1 correspondence with a pack file to a
>single large pristine.  We'll have enough complexity already to just
>find the small files sitting in SQLite rather than on-disk.

It's all relative! Saying "multiple large pristines in one pack file" assumes 
too much. I find it better to first define/compute an order of pack-file size 
that satisfies our requirements (my crude math puts that in the order of a few 
MBs - see the proposal doc); from that, the largest pristine that can share a 
pack file with another follows automatically. Anything smaller can share a pack 
file with others and hence (by our definition!) isn't "too large". Larger ones 
are "really large" (again by our definition) and so will be compressed alone on 
disk (whether or not they get a pack header is hopefully not up for debate 
right now) - practically speaking, these files end up in-place compressed.
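
In pseudo-code terms the decision is no more than this (the limit and the names 
are placeholders for illustration, not figures from the doc):

    # Placeholder sketch of the size-based split.
    PACK_SIZE_LIMIT = 4 * 1024 * 1024   # "order of a few MBs", exact value TBD

    def place_pristine(compressed_size):
        if compressed_size < PACK_SIZE_LIMIT:
            return 'shared-pack'   # small enough to share a pack with others
        return 'own-file'          # "really large": compressed alone, in place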

As for the assumption that we only/constantly append to a pack file, that's 
unfounded. Files may be removed from the pristine store upon svn up. Even if 
not, a file modification is reasonably implemented as remove+add; that's the 
correct way to do it because the file's size might change, so we need the same 
housekeeping as removing one file and adding an unrelated one. Granted, there 
is room for improving this. In other words, knowing it's the same pristine file 
modified a bit doesn't give us much information of practical use.
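
So the update path, in its simplest form, is just this (again a sketch with 
made-up names, building on the remove_pristine() example above):

    import struct

    def append_pristine(pack_path, sha1, data):
        # Add a new record at the end of a (small) pack file.
        with open(pack_path, 'ab') as f:
            f.write(sha1 + struct.pack('>I', len(data)) + data)

    def replace_pristine(pack_path, sha1_old, sha1_new, new_data):
        # Modification = remove + add: the size may have changed, so the
        # housekeeping is the same as dropping one record and adding another.
        remove_pristine(pack_path, sha1_old)   # from the earlier sketch
        append_pristine(pack_path, sha1_new, new_data)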


>
>Given that 1:1 correspondence, I do think that the original file-size
>and the complete checksum should be stored in the custom pack file
>on-disk.  It'll make it so that we could easily validate whether the
>pack file is corrupt or not by using file-size (as a first-order
>check) and checksum (as second-order).  The thinking here is that if
>the checksum is not in the file contents, but only in the file name
>(or the pristine.db), the file system could very easily lose the
>filename (hello ext3/4!) - this would allow us to verify the integrity
>of the pack file and reassociate it if it gets dis-associated.  This
>is less of an issue with the client as it can always refetch - but, if
>the server code ends up using the same on-disk format (as hinted in
>the Google Doc)...then, I think this is important to have in the file
>format from the beginning.
>
>I definitely think that we should store the full 64-bit length
>svn_filesize_t and not be cute and assume no one has a file larger
>than 1TB.

All welcome notes. We will get back to these issues when we have a working 
version that we can play and experiment with. There are certainly many things 
to worry about and perhaps even more tempting points to toy with. To be 
pragmatic (and productive), I want to focus on getting the simplest working 
implementation that can justify this feature (i.e. one that produces real disk 
savings without too much complexity or performance loss). But points taken.


>
>I'll disagree with Greg for now and say that it's probably best to
>just pack everything and not try to be cute about not packing certain
>file types - I think that's a premature optimization right now.  I
>think the complexity of having a mixed pristine collection with some
>files packed and some files unpacked is odd (and some files in SQLite
>and some files on-disk).  Maybe end up adding a no-op compression type
>to the file format (IOW, tell gzip to do a no-op inflate via
>Z_NO_COMPRESSION).  Maybe.  I just doubt it's worth the additional
>complexity though.  ("Is this pristine file compressed?"  "I don't
>know."  Argh!)  Making those assumptions based on file extensions or
>even magic bits can be a bit awkward - case in point is PDF...some
>times it'll compress well, some times it won't.  So, best off just
>always compressing it.  =)

Agreed. I think it's reasonable to support a no-compression type, but keep it 
abstracted away in the compression layer, not higher up. I also agree it's a 
premature optimization, so we should only do it once we have a working stack.
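
For whatever it's worth, keeping it inside the compression layer could look 
something like this (sketch only; the one-byte method tag and the names are my 
assumptions):

    # Sketch of a compression layer that hides a no-op "stored" mode from
    # callers; they never need to ask "is this pristine compressed?".
    import zlib

    METHOD_STORED, METHOD_DEFLATE = 0, 1

    def compress_blob(data):
        packed = zlib.compress(data)
        if len(packed) < len(data):
            return bytes([METHOD_DEFLATE]) + packed
        return bytes([METHOD_STORED]) + data    # didn't shrink; store as-is

    def decompress_blob(blob):
        method, payload = blob[0], blob[1:]
        if method == METHOD_DEFLATE:
            return zlib.decompress(payload)
        return payload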


-Ash


>
>My $.02.  -- justin
>
>
>
