Hi Marc,

I suppose you should focus on crawl-urlfilter.txt rather than regex-urlfilter.txt when you run the "crawl" command.
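For example (just a sketch, assuming your site is linux62.org as in the file you posted; adjust the rules to your needs), crawl-urlfilter.txt could contain something like:

# skip file: ftp: and mailto: urls
-^(file|ftp|mailto):

# skip image and other suffixes we can't yet parse
-\.(gif|GIF|jpg|JPG|ico|ICO|css|zip|gz|exe|mov|MOV)$

# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]

# accept only hosts under linux62.org, ignore everything else
+^http://([a-z0-9]*\.)*linux62.org/
-.

The first matching pattern wins, so keep the "-." catch-all as the last line.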
Regards
/Jack

On 5/11/05, Marc DELERUE <[EMAIL PROTECTED]> wrote:
> # The default url filter.
> # Better for whole-internet crawling.
>
> # Each non-comment, non-blank line contains a regular expression
> # prefixed by '+' or '-'. The first matching pattern in the file
> # determines whether a URL is included or ignored. If no pattern
> # matches, the URL is ignored.
>
> # skip file: ftp: and mailto: urls
> -^(file|ftp|mailto):
>
> # skip image and other suffixes we can't yet parse
> -\.(ico|ICO|css|sit|eps|wmf|zip|mpg|gz|rpm|tgz|mov|MOV|exe|gif|GIF|JPEG|jpeg|jpg|JPG)$
> +\.(pdf|rtf|xls|doc|txt|htm|html)$
>
> # skip URLs containing certain characters as probable queries, etc.
> -[?*!@=]
>
> # accept anything else
> +^http://([a-z0-9]*\.)*linux62.org/
> -.
>
> It finds regex-urlfilter.txt when it sets up the crawl.
>
> If you have any ideas...
>
> Regards
>
> Marc
>
> -----Original Message-----
> From: Matthias Jaekle [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, May 11, 2005 10:27
> To: [email protected]
> Subject: Re: url filters
>
> Hi,
> have you chosen the regex-urlfilter in the conf file?
> Could you please post your whole regex file?
> Matthias
> --
> http://www.eventax.com - eventax GmbH
> http://www.umkreisfinder.de - The search engine for local content and events
>
