Somebody on FMS has built a very compact search index: roughly 84MB for the whole thing, compared with the many gigabytes of the Library/Spider combination. On the other hand, his structure doesn't scale, and will eventually run into hash collisions. Thoughts?
Compression helps with Spider/Library, but evidently not enough: it's a lot of keys and takes a long time to insert. Just how much leverage do you get from an efficient binary data structure?

From FMS, thread "plugin should update automatically - how to?":

jeriadoc@swMR4jUIg8DQaxddEy...
[ discussion on auto-updating ]
I created a self-made data structure to have an index with a low space requirement, actually 84MB for the whole index. I insert the index 4 times to get good reliability and response time. The insert of the whole index (4 times) takes about 5 days.
...
How many freesites does that include? How have you shrunk it that far? I'm pretty sure spider indexes are gigabytes, maybe even after compression...
...
1600 sites (USK only)
25.000 text-based files (pages)
5.000.000 different keywords (every combination of characters longer than 2 characters)
Every keyword is hashed to a 4-byte value
The whole index contains 23.000.000 matches of a keyword with a page (each needs 2x4=8 bytes) and additional information about the sites and the pages
= 2800 chunks of 32KB = 88 MB compressed data (single insertion, compression rate 2.0)
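
As a rough sanity check on those figures, here is a small Python sketch of the arithmetic. It is not his code. It assumes "32KB" means 32 KiB, that the second 4 bytes of each match identify the page, and that the 4-byte hash behaves like a uniform random 32-bit function; the function names are mine.

```python
# Back-of-envelope check of the figures quoted from the FMS post.
# Assumptions (not from the post): 32KB = 32 KiB, and the keyword
# hash is uniformly distributed over 32 bits.

KEYWORDS = 5_000_000        # distinct keywords
MATCHES = 23_000_000        # keyword/page matches
BYTES_PER_MATCH = 2 * 4     # 4-byte keyword hash + 4 more bytes (assumed page id)
CHUNKS = 2800
CHUNK_BYTES = 32 * 1024     # assumed KiB
HASH_BITS = 32


def raw_match_bytes() -> int:
    """Uncompressed size of the match table alone."""
    return MATCHES * BYTES_PER_MATCH


def inserted_bytes() -> int:
    """Total size of one insertion, from chunk count and chunk size."""
    return CHUNKS * CHUNK_BYTES


def expected_colliding_pairs(n: int, bits: int) -> float:
    """Birthday estimate: n*(n-1)/2 pairs, each colliding with probability 2**-bits."""
    return n * (n - 1) / 2 / 2**bits


if __name__ == "__main__":
    print("raw match table (MB):", raw_match_bytes() / 1e6)
    print("single insertion (MiB):", inserted_bytes() / 2**20)
    print("expected colliding keyword pairs:",
          round(expected_colliding_pairs(KEYWORDS, HASH_BITS)))
```

Under those assumptions the match table alone is about 184 MB uncompressed, so a compression rate of 2.0 lands close to the quoted 88 MB. The birthday estimate gives on the order of a few thousand keyword pairs sharing a hash already at 5.000.000 keywords, and that number grows with the square of the keyword count, which is the scaling worry.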
_______________________________________________ Devl mailing list [email protected] http://freenetproject.org/cgi-bin/mailman/listinfo/devl
