# The default url filter.
# Better for whole-internet crawling.

# Each non-comment, non-blank line contains a regular expression
# prefixed by '+' or '-'. The first matching pattern in the file
# determines whether a URL is included or ignored. If no pattern
# matches, the URL is ignored.

# skip file: ftp: and mailto: urls
-^(file|ftp|mailto):

# skip image and other suffixes we can't yet parse
.\(ico|ICO|css|sit|eps|wmf|zip|mpg|gz|rpm|tgz|mov|MOV|exe|gif|GIF|JPEG|jpeg|jpg|JPG)$
+\.(pdf|rtf|xls|doc|txt|htm|html)$

# skip URLs containing certain characters as probable queries, etc.
[EMAIL PROTECTED]

# accept anything else
+^http://([a-z0-9]*\.)*linux62.org/
-.

It finds regex-urlfilter.txt when it sets up the crawl.
If you have any ideas...

Regards,
Marc

-----Original Message-----
From: Matthias Jaekle [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 11 May 2005 10:27
To: [email protected]
Subject: Re: url filters

Hi,

have you chosen the regex-urlfilter in the conf-file?
Could you please post your whole regex-file?

Matthias

--
http://www.eventax.com - eventax GmbH
http://www.umkreisfinder.de - The search engine for local listings and events
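
For anyone following the thread, here is a minimal sketch of the first-match rule described in the comments at the top of the filter file. It is not Nutch's code: the function names `parse_rules` and `accepts` are hypothetical, the rule list is a shortened, well-formed variant of the file above, and treating a match as "the pattern is found anywhere in the URL" is an assumption of this sketch.

```python
import re


def parse_rules(text):
    """Return (include, compiled_pattern) pairs in file order."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no rule
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            # Assumption of this sketch: a missing prefix is an error.
            raise ValueError(f"rule must start with '+' or '-': {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """First matching pattern decides; no match means the URL is ignored."""
    for include, pattern in rules:
        if pattern.search(url):
            return include
    return False


# Shortened, hypothetical rule set modelled on the file in this message.
RULES = parse_rules(r"""
# skip file:, ftp: and mailto: urls
-^(file|ftp|mailto):
# accept a few document suffixes
+\.(pdf|rtf|xls|doc|txt|htm|html)$
# accept one site, reject everything else
+^http://([a-z0-9]*\.)*linux62.org/
-.
""")

for url in (
    "http://www.linux62.org/",
    "http://example.com/index.html",
    "http://example.com/",
    "mailto:someone@example.org",
):
    print(url, accepts(RULES, url))
```

Because the first match wins, rule order matters: in this sketch a URL on another host that ends in `.html` is accepted by the suffix rule before the site rule or the final `-.` is ever reached.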
