Hi, I am trying to use the Nutch fetcher to download EXE/ZIP files from web pages. I've removed those suffixes from regex-urlfilter.txt and automaton-urlfilter.txt (the two files are identical):
regex-urlfilter.txt:
--------------------------------------------------------
-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|jpeg|JPEG|bmp|BMP|iso|ISO|bin|BIN)$

# skip URLs containing certain characters as probable queries, etc.
[EMAIL PROTECTED]

# skip URLs with slash-delimited segment that repeats 3+ times, to break loops
-.*(/.+?)/.*?\1/.*?\1/

# accept anything else
+.
--------------------------------------------------------

When I try to download an EXE, http://www.xtodvd.com/apodvdcopy.exe, the fetch fails:

found segment crawl/segments/20070902084928
Fetching now the urls..
Fetcher: starting
Fetcher: segment: crawl/segments/20070902084928
Fetcher: threads: 1000
fetching http://www.xtodvd.com/apodvdcopy.exe
Error parsing: http://createdvd.net/apodvdcopy.exe: failed(2,200): org.apache.nutch.parse.ParseException: parser not found for contentType=application/x-dosexec url=http://createdvd.net/apodvdcopy.exe
Fetcher: done

When I try to fetch a ZIP file, it works, but how can I tell Nutch to save the ZIP to a directory on the file system? Do I need to write a plugin?

Thanks!
Eyal Edri
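
For reference, here is a minimal standalone sketch of how the filter rules quoted above could be evaluated. It is written in Python for brevity and is not Nutch's own code. It assumes that rules are tried top to bottom, that the first pattern found anywhere in the URL decides the outcome, and that a URL matching no rule is rejected. The names `parse_rules` and `accepts` and the example.com URLs are hypothetical, and the line the archive masked as [EMAIL PROTECTED] is left out because its pattern is unknown.

```python
import re

# Rules from the regex-urlfilter.txt quoted above. The line the archive
# masked as "[EMAIL PROTECTED]" is omitted because its pattern is unknown.
RULES_TEXT = r"""
-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|jpeg|JPEG|bmp|BMP|iso|ISO|bin|BIN)$
# skip URLs with slash-delimited segment that repeats 3+ times, to break loops
-.*(/.+?)/.*?\1/.*?\1/
# accept anything else
+.
"""


def parse_rules(text):
    """Turn filter text into a list of (accept, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(url, rules):
    """Return True if the first rule whose pattern is found in the URL is a '+' rule."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False  # assumption: a URL that matches no rule is rejected


rules = parse_rules(RULES_TEXT)
for url in (
    "http://www.xtodvd.com/apodvdcopy.exe",
    "http://example.com/logo.gif",          # hypothetical URL
    "http://example.com/a/b/a/b/a/b/",      # hypothetical looping URL
):
    print(url, accepts(url, rules))
```

Under these assumptions the EXE URL passes the filter, which would be consistent with the log above showing the URL being fetched before the parse error is reported.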