RE: Can't crawl a domain; can't figure out why.

2011-12-20 Thread Chip Calhoun
this wouldn't work.
Chip
-----Original Message-----
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Monday, December 19, 2011 5:01 PM
To: user@nutch.apache.org
Subject: Re: Can't crawl a domain; can't figure out why.
Nothing peculiar, looks like Nutch 1.4 right? But you also didn't

Re: Can't crawl a domain; can't figure out why.

2011-12-20 Thread alxsss
, and they don't see a reason why this wouldn't work.
Chip
-----Original Message-----
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Monday, December 19, 2011 5:01 PM
To: user@nutch.apache.org
Subject: Re: Can't crawl a domain; can't figure out why.
Nothing peculiar, looks like Nutch

RE: Can't crawl a domain; can't figure out why.

2011-12-20 Thread Chip Calhoun
, December 20, 2011 2:15 PM
To: user@nutch.apache.org
Subject: Re: Can't crawl a domain; can't figure out why.
It seems that robots.txt in libraries.mit.edu has a lot of restrictions.
Alex.
-----Original Message-----
From: Chip Calhoun ccalh...@aip.org
To: user user@nutch.apache.org

Re: Can't crawl a domain; can't figure out why.

2011-12-20 Thread Markus Jelsma
- From: alx...@aim.com [mailto:alx...@aim.com]
Sent: Tuesday, December 20, 2011 2:15 PM
To: user@nutch.apache.org
Subject: Re: Can't crawl a domain; can't figure out why.
It seems that robots.txt in libraries.mit.edu has a lot of restrictions.
Alex.
-Original Message
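
One way to test the robots.txt theory outside of Nutch is to run the site's rules through a standard parser and ask whether a given URL may be fetched. The following is a minimal Python sketch using the standard library's `urllib.robotparser`. The rules in `SAMPLE_ROBOTS_TXT`, the paths, and the agent name `MyNutchCrawler` are hypothetical placeholders, not the actual contents of the libraries.mit.edu robots.txt or the agent configured in this thread; substitute the real file and the value of your `http.agent.name` setting.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; this is NOT the real
# robots.txt served by libraries.mit.edu.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /archives/
Disallow: /search/
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# "MyNutchCrawler" is an assumed agent name, and both paths are made up.
for candidate in (
    "http://libraries.mit.edu/archives/index.html",
    "http://libraries.mit.edu/about/",
):
    verdict = "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, "MyNutchCrawler", candidate) else "blocked"
    print(candidate, verdict)
```

If the seed URLs come back as blocked under the real rules, the crawler is likely honoring robots.txt rather than failing, which would match Alex's observation. To check against the live file instead of pasted text, `RobotFileParser` also offers `set_url()` followed by `read()`.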

Can't crawl a domain; can't figure out why.

2011-12-19 Thread Chip Calhoun
I'm trying to crawl pages from a number of domains, and one of these domains has been giving me trouble. The really irritating thing is that it did work at least once, which led me to believe that I'd solved the problem. I can't think of anything at this point but to paste my log of a failed