I just realized that if I exclude HTML files from a job, the links in those files are not followed. Is this the desired behaviour? Should links be followed regardless of the exclude filter?

I discovered this issue when I tried to crawl only PDFs and the job ended without finding any documents at all. I think I had something like this in my include list:
http://foreninger.uio.no/.*\.pdf$
http://folk.uio.no/.*\.pdf$
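
To illustrate what I think is happening, here is a minimal Python sketch. It is not the crawler's actual code. The in-memory SITE, the seed URL, the ANY_PAGE pattern and the crawl() function are all made up for illustration, and it assumes that a URL which fails the include filter is never fetched, the seed included. It compares one filter used for both fetching and indexing with two separate filters.

```python
import re
from collections import deque

# Hypothetical in-memory site: URL -> links found on that page.
SITE = {
    "http://folk.uio.no/": ["http://folk.uio.no/a.html",
                            "http://folk.uio.no/top.pdf"],
    "http://folk.uio.no/a.html": ["http://folk.uio.no/docs/paper.pdf"],
    "http://folk.uio.no/top.pdf": [],
    "http://folk.uio.no/docs/paper.pdf": [],
}

# The include list from this mail, copied as-is.
PDF_ONLY = [re.compile(r"http://foreninger.uio.no/.*\.pdf$"),
            re.compile(r"http://folk.uio.no/.*\.pdf$")]
# Hypothetical broader filter that decides which pages may be fetched.
ANY_PAGE = [re.compile(r"http://folk.uio.no/.*")]


def matches(url, patterns):
    """True if any pattern matches the URL from its start."""
    return any(p.match(url) for p in patterns)


def crawl(seed, fetch_patterns, index_patterns):
    """Breadth-first crawl of SITE; returns the sorted list of indexed URLs."""
    seen = {seed}
    queue = deque([seed])
    indexed = []
    while queue:
        url = queue.popleft()
        if not matches(url, fetch_patterns):
            continue  # not fetched, so its links are never discovered
        if matches(url, index_patterns):
            indexed.append(url)
        for link in SITE.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return sorted(indexed)


# One filter for both roles: the HTML seed is rejected, nothing is found.
print(crawl("http://folk.uio.no/", PDF_ONLY, PDF_ONLY))
# Separate filters: HTML is fetched for its links, only PDFs are indexed.
print(crawl("http://folk.uio.no/", ANY_PAGE, PDF_ONLY))
```

In this sketch the first call finds nothing, which matches what I saw, while the second reaches both PDFs because the HTML pages are still fetched for their links even though they are not indexed.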

Erlend

--
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
