On Friday 15 August 2003 01:22 am, Gordan wrote:
> OK, let's say that the index would take up 100 MB. If you think that
> downloading a 100 MB HTML file (or XML, or CSV if they are separate files)
> into a browser using JavaScript will work, then you have some interesting
> misconceptions about what modern browsers can handle sensibly.
>
> 1) If you give IE6 or Mozilla (I'm guessing that you are aiming for DOM-ish
> browsers only) a 100 MB file to process with JavaScript, it is going to go
> away for a very long time.
>
> 2) If you make it in such a way that you have to download a 100 MB file to
> perform a query, then that's a non-starter anyway, as that can take hours,
> and has to deal with redundant FEC - again, it could be difficult.
>
> Therefore, you would need a way of segmenting the index so that you could
> search it sparsely, and only download a very small fraction of it, based on
> the search terms.
Here is how you can do it. Have a bot that spiders Freenet and grabs the URI, the title, a one-line description, and the META keywords from the HTML. Create an SSK that has a list of all the keywords, with a page for each keyword that had enough content to be included in the index. Each keyword index contains just a list of URIs, titles, and descriptions, and each index is compressed. Update an index when it has accumulated enough new content to go up to the next size level (you want to avoid padding), or if it has not been updated in a long time.

Clients fetch only the keywords they want, and they hold on to each index for, say, one month. If any of the indexes gets too big, label it a 'popular' index and have it link only to index.htmls and to sites with very large numbers of links.

Since this would have to be implemented in a client-side app, you could add all sorts of features: letting anyone generate their own content-specific index, keeping a site blacklist, or showing only DBRs or one-shot sites. When the user finds what they want, the app requests it and then opens their web browser at the right URI.

It is easy to rank too: the percentage of the query keywords that the site contains, multiplied by the percentage those keywords make up of the site's total keywords.

This would scale pretty well, because each site would only use a few hundred bytes (after compression). So you could have thousands of separate sites in each category with no problem.

_______________________________________________
devl mailing list
[EMAIL PROTECTED]
http://hawk.freenetproject.org:8080/cgi-bin/mailman/listinfo/devl
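The per-keyword index and the ranking formula described above could be sketched roughly like this. This is a minimal illustration only: the function names, the record layout, and the use of gzip/JSON are my assumptions, not anything from the original proposal, and the ranking follows my reading of "% of keywords contained * % those keywords make up out of the total keywords".

```python
import gzip
import json

def build_indexes(sites):
    """Build one compressed index blob per keyword.

    sites: list of dicts with 'uri', 'title', 'description', 'keywords'
    (the fields the spider bot would grab from each site's HTML).
    Compressing each keyword's index separately is what lets clients
    fetch only the keywords they actually search for.
    """
    indexes = {}
    for site in sites:
        for kw in site["keywords"]:
            indexes.setdefault(kw, []).append(
                (site["uri"], site["title"], site["description"])
            )
    return {kw: gzip.compress(json.dumps(entries).encode("utf-8"))
            for kw, entries in indexes.items()}

def rank(query_keywords, site_keywords):
    """Score = (fraction of query keywords the site contains)
             * (fraction those matches make up of the site's keywords)."""
    if not query_keywords or not site_keywords:
        return 0.0
    matched = set(query_keywords) & set(site_keywords)
    return (len(matched) / len(set(query_keywords))) * \
           (len(matched) / len(set(site_keywords)))
```

A client would then fetch only the blobs for its query keywords, decompress them, and score each candidate site with rank(); a site matching every query keyword, with no unrelated keywords of its own, scores 1.0.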
