On Fri, 04 Aug 2000, you wrote: <> > No, the reason why I want to use a binary tree is to do high speed > file index lookups. The reason why is that you don't have to compare > the key you are looking for with each other key for other files until > you come upon the file the key you have points to. And also, this > would greatly reduce the time spent in the worst case (which would be > searching for a nonexistant item with a plain sequential search).
Ok, Travis, what do you think we are? A bunch of gradeschool kids who found the "implement new technology wizard" in a warezed copy of MS Programmer 2005? Of course nobody is doing any linear lookups for references or data from the key value. Even if Hashtables (which are better for this job then binary trees) had not been available in the java standard libraries, they are not exactly rocket science to implement. The problem is that you are completely misundertsanding the difficult part of designing a datastore. This is not an old "bigger, faster, better" (zzz) problem, it is a design problem that nobody has solved yet. The easy part of the DataStore is the lookups. Each key (20 bytes) entry needs to have an address (4 bytes) of another node and a reference to where the actual data is stored (not sure, but to be safe say 16 bytes). That is 40 bytes per entry. In other words, an index for a currently standard node holding 500 entries is 20 kB. An index for an extreme node holding a million entries is only 40 MB, still small enough to keep in the memory, and certainly not an issue on disk. Your half gigabyte index file that "will happen" contains 125 million entries. I would like to contend that that will _not_ happen, and that is not because I think "Freenet never get that big", but for that same reason you don't see that many ADT tree structures with 125 million branches per level. So if you want to look up entries in that list fast, all you have to do is write a hashtable (if you want it in memory - which I would recommend) or a b-tree (if you want it on disk). Then you can store the actual data in a database or on disk or wherever. However, that is NOT the problem with DataStore design. Looking up data in a list is probably the oldest problem in CS and we could probably find a million examples of people having done it before. The hard part in the DataStore is the routing, being able to find the "closest" key and it's reference, as well as the nth-closest should that fail. This is not a case where I am aware of much prior art (I've wondered a bit about spell checkers, if anybody knows anything else I would be very interested). A linear solution for this is very simple, go through the list and find the entry that is closest, second closest, etc. And linear should be fine as long as we only have 500 entries - but if you want a million entries, then I can tell you it will NOT be fine. And nobody has put forward a good solution for an O(<n) lookup of the closest key based only on the existance of a closeness relation. I did write an O(log n) datastore based on a binary tree for Serapis, but to make it work I made the assumption that there was an order on the keys equivalent to the closeness. For the current case of lexicographic comparisons that works fine : I ordered the tree lexicographically, and since walking through a tree "In order" is easy enough, all you have to do to find the closest key is start from where the key would be in the tree, and the closest is either the previous or next entry. However, not all keys can be guaranteed to have the property that they can be ordered so the closest keys end up next to them - examples of keys that can't, mentioned here only in the last few days, are circular lexicographic trees (F is closer to 0 than D - one could probably hack this into a tree, but it would be specific to this type) and the logicial search keys which have infinite dimensions. The only (possible) solution I have come up with to the problem is the sexy but probably impossibly impractical idea that the answer to the question: "What can find the closest key fast having only a closeness relation between them?" is, of course, Freenet - so the DataStore ADT should be modeled as a Freenet microcosmos within each node (with each node unit in the microcosmos having a datastore small enough for linear searches). If somebody has a better idea I would be VERY interested. -- \oskar _______________________________________________ Freenet-dev mailing list Freenet-dev at lists.sourceforge.net http://lists.sourceforge.net/mailman/listinfo/freenet-dev
