On Wed, Oct 17, 2007 at 09:35:35PM +0200, Markus Schiltknecht wrote: > I've recently been trying to implement some of the ideas from the > database compaction wiki page [1].
Cool! > == Compact Heights == > > As the wiki says, uleb128 wouldn't work too well, because compressed > heights couldn't be compared with memcmp(). As I've had to do a lot with > JPEGs recently, I've stumbled across Huffman encoding. Given a > hand-crafted Huffman Tree, the memcmp() property could be maintained. > But even though that idea fascinated me, and I've already begun to > compile statistics about distribution of height values, I had to realize > that the potential gain is very limited. As an example, my current > working database is 618.1 MiB. Trimming all heights to a single-byte > value - which is way too optimistic - results in a 612.8 MiB file. > Maximum savings: less than 1% :-( Thus overly complicated things like > Huffman are definitely not worth the effort. I didn't write any code here. I'm honestly not sure that we even get any benefit from using memcmp comparison on heights. I think the only way we use heights is to load them into memory, and if data is moving from sqlite into mtn proper than doing any kind of conversion on it is trivial and essentially free anyway -- I only brought up this memcmp comparison thing in the first place in case it let us do something clever with sqlite indices, and AFAIK we aren't. > == Putting heights in the revisions table == > > I'm not sure why exactly heights haven't been stored in the revisions > table from day one on, but I certainly think it's a good idea. It saves > lookups and increases locality in case one needs the heights. And I > don't think it hurts much if you don't. To check confirm or reject these > assumptions, please refer to revision 45501e62. The tricky part about this is the regenerate caches stuff -- revisions are sacrosanct, because they give "ground truth", while heights may get wiped out and regenerated. So the db needs to be able to represent "here are all the revisions, none of them have heights assigned", and we need to be able to detect when the db is in this state. (And the logic around figuring out what state the db is in at any given time is already kind of convoluted :-/). If you've solved that, though, cool, it's probably doable. -- Nathaniel -- "Of course, the entire effort is to put oneself Outside the ordinary range Of what are called statistics." -- Stephan Spender _______________________________________________ Monotone-devel mailing list [email protected] http://lists.nongnu.org/mailman/listinfo/monotone-devel
