> -----Original Message-----
> From: Branko Čibej [mailto:br...@xbc.nu] On Behalf Of Branko Cibej
> Sent: dinsdag 27 maart 2012 2:56
> To: Subversion Development
> Subject: Re: Compressed Pristines (Call for Vote)
>
> [this should really be on the dev@ list, i mis-addressed my response.]
>
> On 26.03.2012 17:48, Ashod Nakashian wrote:
> >>
> >> I recommended a two-stage approach to the implementation because
> >> (packing and) encrypting the pristines is only half the story, there
> >> are a number of interactions with the rest of the system that need to
> >> be ironed out, and it makes no sense to take on the most complex part
> >> of the storage format first.
>
> I see. This isn't what I gathered previously.
>
> I don't /know/ what interactions there are, but it would certainly help
> to find out (by running the test suite with the simplest possible
> compressed-pristine implementation, for example).
>
> >> As others have pointed out, the only cases where compression and
> >> packing may cause trouble is working copies with a lot of binary,
> >> compressed files (.jars, various image formats, etc.) That's a problem
> >> that needs solving regardless of the compression algorithm or pack
> >> format.
> > This can be handled with a not-compressed flag and storing the data as-is.
>
> You're jumping a couple steps here, the first two obviously being, a),
> identify which file formats should not be compressed, and b) figure out
> how to detect such files reasonably reliably. Using svn:mime-type as the
> determinator is not the answer, because I can set that to anything I
> like regardless of actual file contents. My advice here is to
> incorporate the flag in the wc.db, but not do the actual detection and
> special handling in the first release.
> > My only issue with the database-as-storage is that it won't work for
> > large files and if we store large files directly on disk, then we'll
> > have to split the storage, which isn't really great.
>
> It is, in fact, really great. Filesystems are typically much better at
> storing large files than any contrived packed-file format. Databases are
> typically much better at storing small files than most filesystems. To
> get the most benefit for the least effort, use both.
>
> > Consider a file just around the cut-off for storing in a database that gets
> moved between the database and disk between modifications that results in the
> file crossing the cut-off.
>
> Then don't make the cutoff a hard limit. For example, if a file is
> already on disk, don't store it in the database until it shrinks to
> below 75% of the limit. Conversely, don't push it out to the filesystem
> until it grows to perhaps even 200% of the limit.
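For reference, a minimal sketch of the soft cut-off suggested above. Only the
75%/200% factors come from the mail; the 64 KB figure, the names and the
function itself are placeholders for illustration, not any actual Subversion
API:

    /* Sketch of a hysteresis around the db/disk cut-off, so a file
     * near the boundary does not ping-pong between the two stores. */
    #include <stddef.h>

    #define CUTOFF          (64 * 1024)   /* illustrative cut-off */
    #define SHRINK_FACTOR   0.75          /* pull back into the db below 75% */
    #define GROW_FACTOR     2.00          /* push to the filesystem above 200% */

    enum store { STORE_IN_DB, STORE_ON_DISK };

    enum store
    choose_store(size_t size, enum store current)
    {
      if (current == STORE_ON_DISK)
        {
          /* Already on disk: only move it into the database once it
             is comfortably below the cut-off. */
          return (size < (size_t)(CUTOFF * SHRINK_FACTOR))
                   ? STORE_IN_DB : STORE_ON_DISK;
        }

      /* In the database (or brand new): only push it out once it
         clearly exceeds the cut-off. */
      return (size > (size_t)(CUTOFF * GROW_FACTOR))
               ? STORE_ON_DISK : STORE_IN_DB;
    }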
A file that is only keyed by its SHA-1 will never grow or shrink, so I don't
think we have to discuss this part right now. We moved away from using a
pristine file per working-copy file in 1.7, and I don't think we want to move
back to that system. When a file changes it becomes a new pristine, or we
would have to move to a completely new design where we also store just the
changes for certain files.

	Bert
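[For what "keyed by its SHA-1" means in practice, a toy example using
Subversion's public checksum API; the strings and the program itself are made
up, only the API calls are real:]

    #include <stdio.h>
    #include <string.h>
    #include <apr_general.h>
    #include <apr_pools.h>
    #include "svn_error.h"
    #include "svn_checksum.h"

    int main(void)
    {
      apr_pool_t *pool;
      svn_checksum_t *v1_key;
      svn_checksum_t *v2_key;
      const char *v1 = "some file contents\n";
      const char *v2 = "some file contents, edited\n";

      apr_initialize();
      apr_pool_create(&pool, NULL);

      svn_error_clear(svn_checksum(&v1_key, svn_checksum_sha1,
                                   v1, strlen(v1), pool));
      svn_error_clear(svn_checksum(&v2_key, svn_checksum_sha1,
                                   v2, strlen(v2), pool));

      /* Different content, different key: the "changed" file is simply
         a new pristine; the old one never grows or shrinks. */
      printf("v1: %s\n", svn_checksum_to_cstring_display(v1_key, pool));
      printf("v2: %s\n", svn_checksum_to_cstring_display(v2_key, pool));

      apr_pool_destroy(pool);
      apr_terminate();
      return 0;
    }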