[this should really be on the dev@ list, i mis-addressed my response.]

On 26.03.2012 17:48, Ashod Nakashian wrote:
>>
>> I recommended a two-stage approach to the implementation because
>> (packing and) encrypting the pristines is only half the story, there
>> are a number of interactions with the rest of the system that need to
>> be ironed out, and it makes no sense to take on the most complex part
>> of the storage format first.
> I see. This isn't what I gathered previously.

I don't /know/ what interactions there are, but it would certainly help
to find out (by running the test suite with the simplest possible
compressed-pristine implementation, for example).

>> As others have pointed out, the only cases where compression and
>> packing may cause trouble is working copies with a lot of binary,
>> compressed files (.jars, various image formats, etc.) That's a
>> problem that needs solving regardless of the compression algorithm
>> or pack format.
> This can be handled with a not-compressed flag and storing the data
> as-is.

You're jumping a couple of steps here, the first two obviously being:
a) identify which file formats should not be compressed, and b) figure
out how to detect such files reasonably reliably. Using svn:mime-type
as the determinator is not the answer, because I can set that to
anything I like regardless of actual file contents.

My advice here is to incorporate the flag in the wc.db, but not do the
actual detection and special handling in the first release.

> My only issue with the database-as-storage is that it won't work for
> large files and if we store large files directly on disk, then we'll
> have to split the storage, which isn't really great.

It is, in fact, really great. Filesystems are typically much better at
storing large files than any contrived packed-file format. Databases
are typically much better at storing small files than most filesystems.
To get the most benefit for the least effort, use both.

> Consider a file just around the cut-off for storing in a database that
> gets moved between the database and disk between modifications that
> results in the file crossing the cut-off.

Then don't make the cutoff a hard limit. For example, if a file is
already on disk, don't store it in the database until it shrinks to
below 75% of the limit. Conversely, don't push it out to the filesystem
until it grows to perhaps even 200% of the limit.

> It's not as clean as having a file-format that handles all cases for
> us. Sure it'll take longer to get release-quality code, but

But?

> it'd be the correct way of doing things,

Oh nonsense. "Correct" is what works best in a given situation and is
most maintainable, not what generates the best research papers.

> won't feel hackish and will serve us in the long-run better.

I dislike hand-waving arguments, so you'll have to substantiate this
one about serving us better in the long run to convince me. All I see
here is an acute case of NIH syndrome.

> My only similar solid example is Git. I don't know how much effort
> went into their system, but I'm sure we can do this and do it right.
> If I were to choose between 6 months to release without packed-files
> and 12-15 months with them, I'd choose the latter. At least that's
> how I see it.

The point is that you can have a working, stable implementation using
off-the-shelf code (filesystem and database) in a few weeks, and it
does not stop you from going on to inventing a packed-file format that
is nevertheless friendly to deletions.

(Just for the record though: If you can do that and make it perform
better than SQLite in less than a year, I'll eat my hat.)

-- Brane
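A rough sketch of the detection step Brane suggests deferring: rather
than trusting svn:mime-type, deflate a small sample of the candidate
pristine and see whether it actually shrinks. This assumes zlib (which
Subversion already uses for svndiff compression); the 64 KB sample
size, the ~10% savings threshold and the function name are invented
for illustration, not anything decided on the list.

[[[
/* Illustration only: detect "already compressed" content by deflating
 * a sample, instead of trusting svn:mime-type.  The sample size and
 * the ~10% threshold are arbitrary assumptions. */

#include <stdlib.h>
#include <zlib.h>

#define SAMPLE_SIZE (64 * 1024)  /* assumed sample length */

/* Return 1 if the first LEN bytes of DATA look incompressible (store
   the pristine as-is and set the not-compressed flag in wc.db), 0 if
   compression seems worthwhile, -1 on error. */
static int
looks_incompressible(const unsigned char *data, size_t len)
{
  uLong src_len = (uLong)(len < SAMPLE_SIZE ? len : SAMPLE_SIZE);
  uLongf dest_len = compressBound(src_len);
  unsigned char *dest = malloc(dest_len);
  int zerr;

  if (dest == NULL)
    return -1;

  /* One fast deflate pass over the sample. */
  zerr = compress2(dest, &dest_len, data, src_len, Z_BEST_SPEED);
  free(dest);

  if (zerr != Z_OK)
    return -1;

  /* If deflate saves less than ~10% on the sample, don't bother
     compressing this pristine at all. */
  return dest_len >= src_len - src_len / 10;
}
]]]

A pristine that fails this test would simply be stored as-is, with the
not-compressed flag recorded in wc.db as suggested in the mail.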
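And a minimal sketch of the soft cutoff described for choosing between
database and filesystem storage: the 75% and 200% factors are the
figures from the mail, while the 256 KB cutoff, the names and the types
are assumptions made purely for illustration.

[[[
/* Illustration only: a hypothetical decision function for the soft
 * cutoff.  PRISTINE_CUTOFF_BYTES and all identifiers are invented;
 * only the 75% and 200% factors come from the mail. */

#include <stdint.h>

#define PRISTINE_CUTOFF_BYTES (256 * 1024)  /* assumed cutoff */

typedef enum { STORE_IN_DB, STORE_ON_DISK } pristine_store_t;

/* Decide where a pristine of NEW_SIZE bytes should live, given where
   it currently lives. */
static pristine_store_t
choose_pristine_store(pristine_store_t current, uint64_t new_size)
{
  if (current == STORE_ON_DISK)
    {
      /* Pull back into the database only once clearly below the
         cutoff (75% of the limit). */
      if (new_size < PRISTINE_CUTOFF_BYTES * 3 / 4)
        return STORE_IN_DB;
      return STORE_ON_DISK;
    }
  else
    {
      /* Push out to the filesystem only once clearly above the
         cutoff (200% of the limit). */
      if (new_size > (uint64_t)PRISTINE_CUTOFF_BYTES * 2)
        return STORE_ON_DISK;
      return STORE_IN_DB;
    }
}
]]]

The dead band between the two thresholds is what keeps a file hovering
around the cutoff from bouncing between the two stores on every
modification, which was Ashod's concern.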