On Mon, 2003-02-24 at 09:53, Marion Bates wrote:
> Hi David,
>
> That's way cool, thanks for posting the url. It's quite speedy. I can't think of
> any suggestions for improvement, except possibly adding a "how I did this" page.
> :)
The 'How I Did This' is so trivial that perhaps it's better shared here. In ht://dig, the only hack required was disabling the URL-handling code that strips out the double slashes ('//') that Freenet URIs depend on. With that done, the spider can fetch text and HTML content via fproxy, so I set 'htdig' loose with the Total Freedom Engine (TFE) as the starting URL. Via fproxy, htdig recursively pulls all the text and HTML documents it can find.

As for the search interface on the web server side, I've added two fields: fproxy host and port. This is for people (like myself) who have a node running 24/7 on a server box, but turn off their workstation box when not in use. If you enter anything in these fields other than the default 127.0.0.1/8888, a PHP script edits the URLs in the search results to point to your LAN's internal address for your fproxy gateway (useful also for those of you at work who VPN to your home boxen and visit fproxy for your daily hit of anime pr0n). The result is that you can then just click on a URL and have it come up.

(Please take note, website/freesite authors: hard-coding 'http://127.0.0.1:8888' into URLs on freesites or mainstream web pages is unacceptable practice. The correct way to link to someone else's freesite is with the href set to '/[EMAIL PROTECTED]'.)

The first index was built with a server timeout of 30 seconds. While that's generous for mainstream web servers, it's way short for fproxy, given Freenet lag, so a spider is presently running with a new timeout of 180 seconds, which should glean some more results.

The very act of spidering these pages will change their routing within Freenet, and will tend to bring to life a lot of the more rarely-visited freesites. Most of what comes up in searches should actually be reachable through Freenet, as a result of the pages being requested and indexed.
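For what it's worth, the host/port rewrite amounts to a simple prefix substitution on each result URL. Here's a minimal sketch (in Python rather than the PHP actually running on the server; the function name and defaults are made up for illustration):

```python
# Hypothetical sketch of the fproxy host/port rewrite described above.
# The production version is a PHP script on the search page; nothing
# here is the actual code.

DEFAULT_GATEWAY = "http://127.0.0.1:8888"

def rewrite_result_url(url, host="127.0.0.1", port=8888):
    """Point a search-result URL at the user's own fproxy gateway.

    Only URLs that start with the default 127.0.0.1:8888 gateway are
    rewritten; everything after the gateway prefix (including the
    double slashes Freenet URIs need) is passed through untouched.
    """
    gateway = "http://%s:%d" % (host, port)
    if url.startswith(DEFAULT_GATEWAY):
        return gateway + url[len(DEFAULT_GATEWAY):]
    return url
```

So a result like `http://127.0.0.1:8888/SSK@abc/site//` with host `10.0.0.5` comes back as `http://10.0.0.5:8888/SSK@abc/site//`, and non-fproxy URLs are left alone.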
After the spider's present run, I'll set up the scripts to automatically zip up the search database and insert it into Freenet, so that others can feed it into their own htdig installations. Ultimately, I'd like to build up a totally in-Freenet search system, with a GUI app (for *nix and Windoze) distributed alongside it. I'm aiming for the app to support other people's search portals as well.

Cheers

David

> Thanks also, by the way, for your earlier compliments regarding the website.
> Much appreciated by at least _one_ of the web editors. :)
>
> Regards,
>
> -- MB

_______________________________________________
chat mailing list
[EMAIL PROTECTED]
http://hawk.freenetproject.org:8080/cgi-bin/mailman/listinfo/chat