On Friday 15 August 2003 01:22 am, Gordan wrote:
> OK, let's say that the index would take up 100 MB. If you think that
> downloading a 100 MB HTML file (or XML, or CSV if they are separate files)
> into a browser using JavaScript will work, then you have some interesting
> misconceptions about what modern browsers can handle sensibly.
>
> 1) If you give IE6 or Mozilla (I'm guessing that you are aiming for DOM-ish
> browsers only) a 100 MB file to process with JavaScript, it is going to go
> away for a very long time.
>
> 2) If you make it in such a way that you have to download a 100 MB file to
> perform a query, then that's a non-starter anyway, as that can take hours,
> and has to deal with redundant FEC - again, it could be difficult.
>
> Therefore, you would need a way of segmenting the index so that you could
> search it sparsely, and only download a very small fraction of it, based on
> the search terms.
Here is how you can do it. Have a bot that spiders Freenet and grabs the URI, the title, a one-line description, and the META keywords from the HTML. Create an SSK that has a list of all the keywords, with a page for each keyword that had enough content to be included in the index. Each keyword index contains just a list of URIs, titles, and descriptions, and each index is compressed. Update an index when it has accumulated enough new content to go up to the next size level (you want to avoid padding), or if it has not been updated in a long time.

Clients fetch only the keywords they want, and they hold on to each index for, say, one month. If any of the indexes gets too big, label it a 'popular' index and have it link only to index.htmls and to sites with very large numbers of links.

Since this would have to be implemented in a client-side app, you could add all sorts of features: letting anyone generate their own content-specific index, keeping a site blacklist, or showing only DBRs or one-shot sites. When the user finds what they want, the app requests it and then opens their web browser at the right URI.

It is easy to rank too: the percentage of the query keywords that the site contains, multiplied by the percentage those keywords make up of the site's total keywords.

This would scale pretty well, because each site would only use a few hundred bytes (after compression). So you could have thousands of separate sites in each category with no problem.

_______________________________________________
devl mailing list
[EMAIL PROTECTED]
http://hawk.freenetproject.org:8080/cgi-bin/mailman/listinfo/devl
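The per-keyword index and the ranking formula described above could be sketched roughly like this. This is a minimal illustration only: the function names, the record layout, and the use of gzip/JSON are my assumptions, not anything from the original proposal, and the ranking follows my reading of "% of keywords contained * % those keywords make up out of the total keywords".

```python
import gzip
import json

def build_indexes(sites):
    """Build one compressed index blob per keyword.

    sites: list of dicts with 'uri', 'title', 'description', 'keywords'
    (the fields the spider bot would grab from each site's HTML).
    Compressing each keyword's index separately is what lets clients
    fetch only the keywords they actually search for.
    """
    indexes = {}
    for site in sites:
        for kw in site["keywords"]:
            indexes.setdefault(kw, []).append(
                (site["uri"], site["title"], site["description"])
            )
    return {kw: gzip.compress(json.dumps(entries).encode("utf-8"))
            for kw, entries in indexes.items()}

def rank(query_keywords, site_keywords):
    """Score = (fraction of query keywords the site contains)
             * (fraction those matches make up of the site's keywords)."""
    if not query_keywords or not site_keywords:
        return 0.0
    matched = set(query_keywords) & set(site_keywords)
    return (len(matched) / len(set(query_keywords))) * \
           (len(matched) / len(set(site_keywords)))
```

A client would then fetch only the blobs for its query keywords, decompress them, and score each candidate site with rank(); a site matching every query keyword, with no unrelated keywords of its own, scores 1.0.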
