Hello,
I am trying to use Nutch to crawl a specific set of URLs like these:

http://server.domain/appname/get?id=34&view=content
http://server.domain/appname/get?id=35&view=content
http://server.domain/appname/get?id=36&view=content

(That is why I run the crawl with the "-depth 1" option.)

Some of these URLs return HTML pages, but some return files. After running the crawl, I found this record in the log file:

060719 160036 fetch okay, but can't parse http://server.domain/appname/get?id=36&view=content, reason: failed(2,203): Content-Type not text/html: application/msword

It looks like the crawl infers a Content-Type from the URL and then compares it with the received Content-Type. Is it possible to use the received Content-Type without checking it against the Content-Type implied by the URL? How can I resolve this problem?

Thank you for your reply.

Milan Skuhra