On Mon, 2003-02-24 at 09:53, Marion Bates wrote:
> Hi David,
>
> That's way cool, thanks for posting the url. It's quite speedy. I can't think of
> any suggestions for improvement, except possibly adding a "how I did this" page.
> :)
The 'How I Did This' is so trivial that perhaps it's better shared here. In ht://dig, the only hack required was disabling the URL-handling code that strips out the double slashes ('//') that Freenet URIs depend on. With that done, the spider can fetch text and HTML content via fproxy, so I set 'htdig' loose with the Total Freedom Engine (TFE) as the starting URL. Via fproxy, htdig recursively pulls all the text and HTML documents it can find.

As for the search interface on the web server side, I've added two fields: fproxy host and port. This is for people (like myself) who have a node running 24/7 on a server box, but turn off their workstation box when not in use. If you enter anything in these fields other than the default 127.0.0.1/8888, a PHP script edits the URLs in the search results to point to your LAN's internal address for your fproxy gateway (useful also for those of you at work who VPN to your home boxen and visit fproxy for your daily hit of anime pr0n). The result is that you can then just click on a URL and have it come up.

(Please take note, website/freesite authors: hard-coding 'http://127.0.0.1:8888' into URLs on freesites or mainstream web pages is unacceptable practice. The correct way to link to someone else's freesite is with the href set to '/[EMAIL PROTECTED]'.)

The first index was built with a server timeout of 30 seconds. While that's generous for mainstream web servers, it's way short for fproxy, given Freenet lag, so a spider is presently running with a new timeout of 180 seconds, which should glean some more results.

The very act of spidering these pages will change their routing within Freenet, and will tend to bring to life a lot of the more rarely-visited freesites. Most of what comes up in searches should actually be reachable through Freenet, as a result of the pages being requested and indexed.
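For what it's worth, the host/port rewrite amounts to a simple prefix substitution on each result URL. Here's a minimal sketch (in Python rather than the PHP actually running on the server; the function name and defaults are made up for illustration):

```python
# Hypothetical sketch of the fproxy host/port rewrite described above.
# The production version is a PHP script on the search page; nothing
# here is the actual code.

DEFAULT_GATEWAY = "http://127.0.0.1:8888"

def rewrite_result_url(url, host="127.0.0.1", port=8888):
    """Point a search-result URL at the user's own fproxy gateway.

    Only URLs that start with the default 127.0.0.1:8888 gateway are
    rewritten; everything after the gateway prefix (including the
    double slashes Freenet URIs need) is passed through untouched.
    """
    gateway = "http://%s:%d" % (host, port)
    if url.startswith(DEFAULT_GATEWAY):
        return gateway + url[len(DEFAULT_GATEWAY):]
    return url
```

So a result like `http://127.0.0.1:8888/SSK@abc/site//` with host `10.0.0.5` comes back as `http://10.0.0.5:8888/SSK@abc/site//`, and non-fproxy URLs are left alone.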
After the spider's present run, I'll set up the scripts to automatically zip up the search database and insert it into Freenet, so that others can feed it into their own htdig installations. Ultimately, I'd like to build up a totally in-Freenet search system, with a GUI app (for *nix and Windoze) distributed alongside it. I'm aiming for the app to support other people's search portals as well.

Cheers

David

> Thanks also, by the way, for your earlier compliments regarding the website.
> Much appreciated by at least _one_ of the web editors. :)
>
> Regards,
>
> -- MB

_______________________________________________
chat mailing list
[EMAIL PROTECTED]
http://hawk.freenetproject.org:8080/cgi-bin/mailman/listinfo/chat