B''H

On Tue, 16 Jun 2009 21:59:13 +0930 Karl Goetz <[email protected]> wrote:
> On Tue, 16 Jun 2009 11:51:10 +0200
> Let us know how you progress.
> kk
Well, currently I have a sort-of-working spider; it's just a bit too good at crawling. For example, it's hard to know which parts of a page are really wiki content and which are not, e.g.: http://wiki.gnewsense.org/ForumMain/

So I will have to use some sort of blacklist of pages not to fetch (currently it just checks that the page is on wiki.gnewsense.org). Image download and rewriting, as well as "wiki-link" rewriting, are already implemented.

So I mostly need a list of "bad" places, like the forum and http://wiki.gnewsense.org/Site/FASTMembership

-- 
Patrik Lembke
www: http://blambi.chebab.com/
jabber: [email protected]
GnuPG-key: http://gpg.chebab.com/8FA11A15.asc
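For what it's worth, the host check plus blacklist described above could be sketched roughly like this in Python. This is only an illustration, not the actual spider code: the function name, the prefix list, and the use of path prefixes (rather than, say, regexes) are all assumptions.

```python
from urllib.parse import urlparse

# Hypothetical sketch of the fetch filter: allow only pages on the
# wiki host, then reject any path that starts with a blacklisted
# prefix. The prefixes below are just the examples from this mail.
ALLOWED_HOST = "wiki.gnewsense.org"
BLACKLIST_PREFIXES = (
    "/ForumMain",
    "/Site/FASTMembership",
)

def should_fetch(url):
    """Return True if the URL is on the wiki and not blacklisted."""
    parts = urlparse(url)
    if parts.netloc != ALLOWED_HOST:
        return False
    return not any(parts.path.startswith(p) for p in BLACKLIST_PREFIXES)
```

Keeping the blacklist as a plain tuple of path prefixes would make it easy to extend as more "bad" places turn up.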
_______________________________________________ gNewSense-dev mailing list [email protected] http://lists.nongnu.org/mailman/listinfo/gnewsense-dev
