Hi, Ashod,

First, thanks for your great summary. I'll throw in just my 2 cents below.

> From: Ashod Nakashian [mailto:ashodnakash...@yahoo.com]
 
> Pristine files currently incur almost 100%[2] overhead both in terms of
> disk footprint and file count in a given WC. Since pristine files are a
> design element of SVN, reducing their inherent overhead should be a
> welcome improvement to SVN from a user's perspective. Due to the nature of
> source files that tend to be small, the footprint of a pristine store (PS)
> is larger on disk than the actual total bytes because of internal
> fragmentation (file-system block-size rounding waste) - see references for
> numbers.

Were any of those tests actually executed on a file system that supports something 
like "block suballocation", "tail merging" or "tail packing"?

Today, I was rather surprised to find that the pristine subdir of one of our main 
projects, which contains 726 MB of data, has an actual disk size of 759 MB. That 
amounts to an overhead of only about 4.5% due to block-size rounding. (According to 
the Explorer "Properties" dialog of Win 7 on an NTFS file system.)
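
For anyone who wants to reproduce that kind of measurement outside of the 
Explorer dialog, here's a small sketch of my own (nothing official) that 
compares the logical size of a pristine directory with the space actually 
allocated on disk; it relies on st_blocks, so it only works as-is on POSIX 
file systems:

[[[
# Sketch: compare logical size vs. allocated on-disk size of a directory to
# estimate the block-size rounding overhead. st_blocks is POSIX-only; on
# Windows/NTFS the Explorer "Properties" dialog reports the equivalent number.
import os, sys

def disk_overhead(path):
    logical = allocated = 0
    for root, _, files in os.walk(path):
        for name in files:
            st = os.stat(os.path.join(root, name))
            logical += st.st_size
            # st_blocks counts 512-byte units on POSIX systems.
            allocated += getattr(st, "st_blocks", 0) * 512
    print("logical: %d bytes, allocated: %d bytes, overhead: %.1f%%"
          % (logical, allocated, 100.0 * (allocated - logical) / logical))

if __name__ == "__main__":
    disk_overhead(sys.argv[1])
]]]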

AFAICS, "modern" file systems increasingly support that kind of feature[1], so 
we should at least think about how much effort we want to throw at the 
"packing" part of the problem if it's likely to vanish (or at least be 
drastically reduced) in the future. My concern is that storing small pristines 
in their own SQLite database will also bring some overhead, possibly of the 
same magnitude (around 4%), due to SQLite metadata, the necessary primary-key 
column, and indexing.
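
To make that concern measurable, here's a rough sketch (the schema and the 
names in it are purely my invention, not anything that has been proposed) of 
a blob-per-pristine SQLite store; comparing the resulting database file 
against the raw pristine bytes would show how much the SQLite container 
itself costs:

[[[
# Sketch with a made-up schema: store each pristine file as one blob keyed by
# its file name (the checksum in a real pristine store), then compare the
# database size with the raw data size.
import os, sqlite3, sys

def build_blob_store(pristine_dir, db_path):
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE pristine (checksum TEXT PRIMARY KEY, data BLOB)")
    raw_bytes = 0
    for root, _, files in os.walk(pristine_dir):
        for name in files:
            with open(os.path.join(root, name), "rb") as f:
                data = f.read()
            raw_bytes += len(data)
            conn.execute("INSERT OR IGNORE INTO pristine VALUES (?, ?)",
                         (name, sqlite3.Binary(data)))
    conn.commit()
    conn.close()
    db_bytes = os.path.getsize(db_path)
    print("raw: %d bytes, sqlite: %d bytes, overhead: %.1f%%"
          % (raw_bytes, db_bytes, 100.0 * (db_bytes - raw_bytes) / raw_bytes))

if __name__ == "__main__":
    build_blob_store(sys.argv[1], sys.argv[2])
]]]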

Additionally, the simple and efficient way of storing pristines in a SQLite 
database (one blob per file) prevents us from exploiting inter-file 
redundancy during compression. Adding a packing layer on top of SQLite, on 
the other hand, leads to both high complexity and a large average blob size, 
and large blobs are probably handled more efficiently by the FS directly.
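
As a quick illustration of the inter-file redundancy point (file selection 
and compressor are my own choice, just for demonstration): compressing a set 
of similar files together usually beats compressing each of them separately:

[[[
# Illustration: compressing similar files together exploits the redundancy
# between them; compressing each file on its own cannot.
import sys, zlib

def compare(paths):
    blobs = [open(p, "rb").read() for p in paths]
    separate = sum(len(zlib.compress(b, 9)) for b in blobs)
    together = len(zlib.compress(b"".join(blobs), 9))
    print("separate: %d bytes, together: %d bytes" % (separate, together))

if __name__ == "__main__":
    # Pass a group of similar files, e.g. all *.c files of one project.
    compare(sys.argv[1:])
]]]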

To cut it short: I'll "take" whatever solution emerges, but my gut feeling 
tells me that we should use plain files as containers instead of SQLite.

The other aspects (grouping similar files into the same container before 
compression, applying a size limit for containers, and storing uncompressible 
files in uncompressed containers) are fine as discussed.
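
Just to make the grouping idea concrete, here's a sketch under my own 
assumptions (grouping by file extension, a 16 MB cap per container, and a 
naive "did it shrink?" test for compressibility; a real container format 
would of course also need an index of per-file offsets):

[[[
# Sketch with made-up parameters: group pristines by extension, pack them into
# size-limited containers, and compress a container only when that saves space.
import os, sys, zlib
from collections import defaultdict

CONTAINER_LIMIT = 16 * 1024 * 1024  # hypothetical size cap per container

def write_container(out_dir, group, index, data):
    packed = zlib.compress(data, 9)
    compressed = len(packed) < len(data)  # keep e.g. jpeg/zip uncompressed
    name = "%s.%d.%s" % (group, index, "z" if compressed else "raw")
    with open(os.path.join(out_dir, name), "wb") as f:
        f.write(packed if compressed else data)

def pack_pristines(pristine_dir, out_dir):
    groups = defaultdict(list)
    for root, _, files in os.walk(pristine_dir):
        for fname in files:
            ext = os.path.splitext(fname)[1].lstrip(".") or "noext"
            groups[ext].append(os.path.join(root, fname))

    os.makedirs(out_dir, exist_ok=True)
    for ext, paths in groups.items():
        buf, index = b"", 0
        for path in paths:
            with open(path, "rb") as f:
                buf += f.read()  # a real format would record per-file offsets
            if len(buf) >= CONTAINER_LIMIT:
                write_container(out_dir, ext, index, buf)
                buf, index = b"", index + 1
        if buf:
            write_container(out_dir, ext, index, buf)

if __name__ == "__main__":
    pack_pristines(sys.argv[1], sys.argv[2])
]]]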

I'll try to run some statistics on publicly available projects on an NTFS 
file system, just for comparison.

Best regards

Markus Schaber

[1]: 
http://msdn.microsoft.com/en-us/library/windows/desktop/ee681827%28v=vs.85%29.aspx
 claims tail packing support for NTFS. 
http://en.wikipedia.org/wiki/Block_suballocation claims support for BtrFS, 
ReiserFS, Reiser4, FreeBSD UFS2. And AFAIR, XFS has a similar feature. Sadly, 
Ext[2,3,4] are not on that list yet, but rumors claim that Ext4 is to be 
replaced by BtrFS in the long run.

-- 
___________________________
We software Automation.

3S-Smart Software Solutions GmbH
Markus Schaber | Developer
Memminger Str. 151 | 87439 Kempten | Germany | Tel. +49-831-54031-0 | Fax 
+49-831-54031-50

Email: m.scha...@3s-software.com | Web: http://www.3s-software.com 
CoDeSys internet forum: http://forum.3s-software.com
Download CoDeSys sample projects: 
http://www.3s-software.com/index.shtml?sample_projects

Managing Directors: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner | Trade 
register: Kempten HRB 6186 | Tax ID No.: DE 167014915 
