[this should really be on the dev@ list, i mis-addressed my response.]

On 26.03.2012 17:48, Ashod Nakashian wrote:
>>
>> I recommended a two-stage approach to the implementation because
>> (packing and) encrypting the pristines is only half the story, there
>> are a number of interactions with the rest of the system that need to
>> be ironed out, and it makes no sense to take on the most complex part
>> of the storage format first.
> I see. This isn't what I gathered previously.

I don't /know/ what interactions there are, but it would certainly help
to find out (by running the test suite with the simplest possible
compressed-pristine implementation, for example).

>> As others have pointed out, the only cases where compression and
>> packing may cause trouble is working copies with a lot of binary,
>> compressed files (.jars, various image formats, etc.) That's a
>> problem that needs solving regardless of the compression algorithm
>> or pack format.
> This can be handled with a not-compressed flag and storing the data
> as-is.

You're jumping a couple of steps here, the first two obviously being:
a) identify which file formats should not be compressed, and b) figure
out how to detect such files reasonably reliably. Using svn:mime-type
as the determinator is not the answer, because I can set that to
anything I like regardless of actual file contents.

My advice here is to incorporate the flag in the wc.db, but not do the
actual detection and special handling in the first release.

> My only issue with the database-as-storage is that it won't work for
> large files and if we store large files directly on disk, then we'll
> have to split the storage, which isn't really great.

It is, in fact, really great. Filesystems are typically much better at
storing large files than any contrived packed-file format. Databases
are typically much better at storing small files than most filesystems.
To get the most benefit for the least effort, use both.

> Consider a file just around the cut-off for storing in a database that
> gets moved between the database and disk between modifications that
> results in the file crossing the cut-off.

Then don't make the cutoff a hard limit. For example, if a file is
already on disk, don't store it in the database until it shrinks to
below 75% of the limit. Conversely, don't push it out to the filesystem
until it grows to perhaps even 200% of the limit.

> It's not as clean as having a file-format that handles all cases for
> us. Sure it'll take longer to get release-quality code, but

But?

> it'd be the correct way of doing things,

Oh nonsense. "Correct" is what works best in a given situation and is
most maintainable, not what generates the best research papers.

> won't feel hackish and will serve us in the long-run better.

I dislike hand-waving arguments, so you'll have to substantiate this
one about serving us better in the long run to convince me. All I see
here is an acute case of NIH syndrome.

> My only similar solid example is Git. I don't know how much effort
> went into their system, but I'm sure we can do this and do it right.
> If I were to choose between 6 months to release without packed-files
> and 12-15 months with them, I'd choose the latter. At least that's
> how I see it.

The point is that you can have a working, stable implementation using
off-the-shelf code (filesystem and database) in a few weeks, and it
does not stop you from going on to inventing a packed-file format that
is nevertheless friendly to deletions.

(Just for the record though: If you can do that and make it perform
better than SQLite in less than a year, I'll eat my hat.)

-- Brane
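A rough sketch of the detection step Brane suggests deferring: rather
than trusting svn:mime-type, deflate a small sample of the candidate
pristine and see whether it actually shrinks. This assumes zlib (which
Subversion already uses for svndiff compression); the 64 KB sample
size, the ~10% savings threshold and the function name are invented
for illustration, not anything decided on the list.

[[[
/* Illustration only: detect "already compressed" content by deflating
 * a sample, instead of trusting svn:mime-type.  The sample size and
 * the ~10% threshold are arbitrary assumptions. */

#include <stdlib.h>
#include <zlib.h>

#define SAMPLE_SIZE (64 * 1024)  /* assumed sample length */

/* Return 1 if the first LEN bytes of DATA look incompressible (store
   the pristine as-is and set the not-compressed flag in wc.db), 0 if
   compression seems worthwhile, -1 on error. */
static int
looks_incompressible(const unsigned char *data, size_t len)
{
  uLong src_len = (uLong)(len < SAMPLE_SIZE ? len : SAMPLE_SIZE);
  uLongf dest_len = compressBound(src_len);
  unsigned char *dest = malloc(dest_len);
  int zerr;

  if (dest == NULL)
    return -1;

  /* One fast deflate pass over the sample. */
  zerr = compress2(dest, &dest_len, data, src_len, Z_BEST_SPEED);
  free(dest);

  if (zerr != Z_OK)
    return -1;

  /* If deflate saves less than ~10% on the sample, don't bother
     compressing this pristine at all. */
  return dest_len >= src_len - src_len / 10;
}
]]]

A pristine that fails this test would simply be stored as-is, with the
not-compressed flag recorded in wc.db as suggested in the mail.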
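And a minimal sketch of the soft cutoff described for choosing between
database and filesystem storage: the 75% and 200% factors are the
figures from the mail, while the 256 KB cutoff, the names and the types
are assumptions made purely for illustration.

[[[
/* Illustration only: a hypothetical decision function for the soft
 * cutoff.  PRISTINE_CUTOFF_BYTES and all identifiers are invented;
 * only the 75% and 200% factors come from the mail. */

#include <stdint.h>

#define PRISTINE_CUTOFF_BYTES (256 * 1024)  /* assumed cutoff */

typedef enum { STORE_IN_DB, STORE_ON_DISK } pristine_store_t;

/* Decide where a pristine of NEW_SIZE bytes should live, given where
   it currently lives. */
static pristine_store_t
choose_pristine_store(pristine_store_t current, uint64_t new_size)
{
  if (current == STORE_ON_DISK)
    {
      /* Pull back into the database only once clearly below the
         cutoff (75% of the limit). */
      if (new_size < PRISTINE_CUTOFF_BYTES * 3 / 4)
        return STORE_IN_DB;
      return STORE_ON_DISK;
    }
  else
    {
      /* Push out to the filesystem only once clearly above the
         cutoff (200% of the limit). */
      if (new_size > (uint64_t)PRISTINE_CUTOFF_BYTES * 2)
        return STORE_ON_DISK;
      return STORE_IN_DB;
    }
}
]]]

The dead band between the two thresholds is what keeps a file hovering
around the cutoff from bouncing between the two stores on every
modification, which was Ashod's concern.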