Somebody on FMS has built a compact search index: about 88 MB per copy, 
compared to the many gigabytes of the Library/Spider combination. On the other 
hand, his structure doesn't scale and will eventually run into hash collisions. 
Thoughts?
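
To put a rough number on the collision concern, here is a back-of-the-envelope sketch using the figures from the quoted post (5,000,000 keywords, 4-byte hashes). It assumes the hash behaves like a uniform random function, which the post does not confirm; the function name is made up for illustration.

```python
# Birthday-style estimate of hash collisions among keywords.
# Assumption: the 4-byte hash spreads keywords uniformly over 2**32 values.

def expected_colliding_pairs(n_keys: int, hash_bits: int) -> float:
    """Expected number of keyword pairs that end up with the same hash."""
    buckets = 2 ** hash_bits
    return n_keys * (n_keys - 1) / (2 * buckets)

KEYWORDS = 5_000_000   # figure quoted from the FMS post
HASH_BITS = 32         # "every keyword is hashed as a 4-byte value"

pairs = expected_colliding_pairs(KEYWORDS, HASH_BITS)
print(f"expected colliding keyword pairs: {pairs:.0f}")
```

Under that assumption the estimate comes out at roughly 2,900 colliding pairs already at the current size, and it grows with the square of the keyword count. If only the hash is stored, a collision would presumably show up as extra, unrelated pages in the results for a keyword.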

Compression helps with Spider/Library, but evidently not enough - it's a lot of 
keys and takes a long time to insert. Just how much leverage do you get from an 
efficient binary data structure?
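
For a feel of what such a structure might look like, here is a minimal sketch, not the FMS author's actual format. It assumes each match is an 8-byte record made of a 4-byte keyword hash and a 4-byte page id, kept sorted so lookups can binary-search. CRC32 is only a stand-in hash, and `build_index` / `lookup` are hypothetical names.

```python
import struct
import zlib

# Hypothetical record layout: 4-byte keyword hash + 4-byte page id.
RECORD = struct.Struct("<II")

def keyword_hash(word: str) -> int:
    # CRC32 is a stand-in; the post does not say which hash is used.
    return zlib.crc32(word.encode("utf-8")) & 0xFFFFFFFF

def build_index(matches):
    """Pack (keyword, page_id) pairs into sorted fixed-width records."""
    records = sorted((keyword_hash(word), page) for word, page in matches)
    return b"".join(RECORD.pack(h, page) for h, page in records)

def lookup(index: bytes, word: str):
    """Binary-search for the first record with the hash, then scan forward."""
    target = keyword_hash(word)
    count = len(index) // RECORD.size
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        h, _ = RECORD.unpack_from(index, mid * RECORD.size)
        if h < target:
            lo = mid + 1
        else:
            hi = mid
    pages = []
    while lo < count:
        h, page = RECORD.unpack_from(index, lo * RECORD.size)
        if h != target:
            break
        pages.append(page)
        lo += 1
    return pages

index = build_index([("freenet", 1), ("spider", 2), ("freenet", 7)])
print(len(index), "bytes;", lookup(index, "freenet"))
```

The point of the sketch is only that every match costs a fixed 8 bytes regardless of keyword length, with no per-entry key or markup overhead.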



From FMS, thread "plugin should update automatically - how to?":
jeriadoc@swMR4jUIg8DQaxddEy...

[ discussion on auto-updating ]

I created a custom data structure to keep the index small: currently 84 MB
for the whole index. I insert the index 4 times for good reliability and
response time. Inserting the whole index (4 times) takes about 5 days.
...
How many freesites does that include? How have you shrunk it that far? I'm
pretty sure spider indexes are gigabytes, maybe even after compression...
...
1,600 sites (USK only)
25,000 text-based files (pages)
5,000,000 distinct keywords (every combination of more than 2 characters)
Every keyword is hashed to a 4-byte value
The whole index contains 23,000,000 keyword-to-page matches (2 x 4 = 8 bytes
each) plus additional information about the sites and the pages
= 2,800 chunks of 32 KB each
= 88 MB compressed data (single insertion, compression ratio 2.0)
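
As a sanity check, the quoted figures can be multiplied out. This is only arithmetic on the numbers above; it assumes "32KB" means 32 KiB and ignores the additional site and page information, whose size the post does not give.

```python
# Figures quoted from the FMS post.
MATCHES = 23_000_000
BYTES_PER_MATCH = 2 * 4      # two 4-byte values per match
CHUNKS = 2_800
CHUNK_BYTES = 32 * 1024      # assumption: 32 KB means 32 KiB
COMPRESSION_RATIO = 2.0

raw_bytes = MATCHES * BYTES_PER_MATCH                 # uncompressed matches
compressed_estimate = raw_bytes / COMPRESSION_RATIO   # after 2.0x compression
chunk_total = CHUNKS * CHUNK_BYTES                    # size implied by chunk count

MIB = 2 ** 20
print(f"raw matches:         {raw_bytes / MIB:.1f} MiB")
print(f"compressed estimate: {compressed_estimate / MIB:.1f} MiB")
print(f"2,800 chunks:        {chunk_total / MIB:.1f} MiB")
```

The match table alone is 184,000,000 bytes raw, so about 92,000,000 bytes at a 2.0 ratio, while 2,800 chunks of 32 KiB come to 91,750,400 bytes (87.5 MiB). The two agree closely with the stated 88 MB.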

