Sounds fairly interesting, though I'm curious how it compares; my guess was that because of the lack of locality due to using hashes for the score, a trie wouldn't be that different from a hash table.
Quoth [email protected]: > After studying Steve Stallion's SSD venti disaster, I decided to do my own > try to fix the issues of venti. > > Despite my reservations on the lasting wisdom of some of the design choices, > I try to use the traditional arena disk layout. > Only the on-disk index is replaced with a trie-based in-memory structure. > > The trienodes represent either the score and IAddr data as leaves or 16 > indices for the next nibble of the score to search further. There is no need > for a Bloom filter, as the trie search is not less performant for negative > results. The actual trienode size is 64 bytes now, but can probably shorted > to 48 bytes. > > So far, I have managed to convert buildindex into buildtrie. If -v option is > used, the contents of the trie are printed in lexical order of the score. > > The data from my experiments are: > > I used my 4 arena files, each 20GB, containing about 10 million clumps in > standard 500MB arenas. Data from the arena directories are read in in about > one and a half minute. (There is one error in one of the arenas.) IMHO this > is acceptable as startup time for a venti server. > > The trie has about 14m nodes, which are stored in a contiguous array. The > trie, which is now 32 bit indexed, thus may be reduced to 24 bit index for > the current data amount. > > For larger storage, there is a design choice, either use 24 bit indices and > 48 byte trie nodes, and 256 trie arrays, or use 32bit indices and 64 byte > trienodes in a single array. > > After I manage to push my data to a planport fork on github, you will hear > more. ------------------------------------------ 9fans: 9fans Permalink: https://9fans.topicbox.com/groups/9fans/T21878aa53884911b-M7cf5960cda854dba36823793 Delivery options: https://9fans.topicbox.com/groups/9fans/subscription
