Sounds fairly interesting, though I'm curious how it compares;
my guess was that because of the lack of locality due to using
hashes for the score, a trie wouldn't be that different from a
hash table.

Quoth [email protected]:
> After studying Steve Stallion's  SSD venti disaster, I decided to do my own 
> try to fix the issues of venti.
> 
> Despite my reservations on the lasting wisdom of some of the design choices, 
> I try to use the traditional  arena disk layout.
>  Only the on-disk index is replaced with a trie-based in-memory structure.
> 
> The trienodes represent either the score and IAddr data as leaves or 16 
> indices for the next nibble of the score to search further. There is no need 
> for a Bloom filter, as the trie search is not less performant for negative 
> results. The actual trienode size is 64 bytes now, but can probably shorted 
> to 48 bytes.
> 
> So far, I have managed to convert buildindex into buildtrie.  If -v option is 
> used, the contents of the trie are printed in lexical order of the score.
> 
> The data from my experiments are:
> 
> I used my 4 arena files, each 20GB, containing about 10 million clumps in 
> standard 500MB arenas. Data from the arena directories are read in in about  
> one and a half minute. (There is one error in one of the arenas.) IMHO this 
> is acceptable as startup time for a venti server.
> 
> The trie has about 14m nodes, which are stored in a contiguous array. The 
> trie, which is now 32 bit indexed, thus may be reduced to 24 bit index for 
> the current data amount.
> 
> For larger storage, there is a design choice, either use 24 bit indices and 
> 48 byte trie nodes, and 256 trie arrays, or use 32bit indices and 64 byte 
> trienodes in a single array.
> 
> After I  manage to  push my data to a planport fork on github, you will hear 
> more.

------------------------------------------
9fans: 9fans
Permalink: 
https://9fans.topicbox.com/groups/9fans/T21878aa53884911b-M7cf5960cda854dba36823793
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription

Reply via email to