B''H

On Tue, 16 Jun 2009 21:59:13 +0930 Karl Goetz <[email protected]> wrote:
> On Tue, 16 Jun 2009 11:51:10 +0200
> Let us know how you progress.
> kk

Well, currently I have a sort-of working spider; it's just a bit too
good at crawling. For example, it's hard to know which parts of a page
are really wiki content and which are not, e.g.:
http://wiki.gnewsense.org/ForumMain/

So I will have to use some sort of blacklist of pages not to fetch
(currently it just checks that the page is on wiki.gnewsense.org).
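Something like this is what I have in mind for the filter -- only a
sketch of the idea, the function name and the example blacklist entries
are made up for illustration, not my actual spider code:

from urllib.parse import urlparse

# Hypothetical blacklist of path prefixes we never want to fetch.
BLACKLIST_PREFIXES = (
    "/ForumMain/",
    "/Site/FASTMembership",
)

def should_fetch(url):
    parts = urlparse(url)
    # Only crawl the wiki host itself (this is the check I already have).
    if parts.netloc != "wiki.gnewsense.org":
        return False
    # Skip anything on the blacklist (forum, membership pages, ...).
    for prefix in BLACKLIST_PREFIXES:
        if parts.path.startswith(prefix):
            return False
    return True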

Image download and URL rewriting, as well as rewriting of "wiki-links",
are also implemented at the moment.
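The link rewriting is roughly along these lines -- again just a sketch
of the approach, the regex and the ".html" naming scheme are
assumptions for illustration, not the real implementation:

import re

# Match absolute links back into the wiki, e.g.
#   href="http://wiki.gnewsense.org/Main/Documentation"
WIKI_LINK = re.compile(r'href="http://wiki\.gnewsense\.org(/[^"]*)"')

def rewrite_links(html):
    # Turn absolute wiki links into local relative paths for the
    # offline copy, e.g. -> href="Main/Documentation.html"
    return WIKI_LINK.sub(
        lambda m: 'href="%s.html"' % m.group(1).lstrip("/"), html)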

So I mostly need a list of "bad" places, like the forum and
http://wiki.gnewsense.org/Site/FASTMembership

-- 
Patrik Lembke
www: http://blambi.chebab.com/
jabber: [email protected]
GnuPG-key: http://gpg.chebab.com/8FA11A15.asc

