this wouldn't work.
Chip
-----Original Message-----
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Monday, December 19, 2011 5:01 PM
To: user@nutch.apache.org
Subject: Re: Can't crawl a domain; can't figure out why.
Nothing peculiar, looks like Nutch 1.4 right? But you also didn't
From: alx...@aim.com [mailto:alx...@aim.com]
Sent: Tuesday, December 20, 2011 2:15 PM
To: user@nutch.apache.org
Subject: Re: Can't crawl a domain; can't figure out why.
It seems that robots.txt in libraries.mit.edu has a lot of restrictions.
Alex.
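
One way to test the robots.txt theory outside of Nutch is to run a candidate URL through Python's standard robots.txt parser. This is only a rough sketch: the rules, the agent name "MyNutchCrawler", and the URL paths below are made up for illustration and are not the real contents of robots.txt on libraries.mit.edu. Substitute the real file and the value of your own http.agent.name setting.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT the real robots.txt
# served by libraries.mit.edu.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /archives/
Disallow: /search/
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    agent = "MyNutchCrawler"  # hypothetical agent name
    for url in (
        "http://libraries.mit.edu/index.html",            # hypothetical path
        "http://libraries.mit.edu/archives/example.html",  # hypothetical path
    ):
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

If the seed URLs for the failing domain fall under a Disallow rule that applies to your agent, that alone could explain why nothing is fetched.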
-----Original Message-----
From: Chip Calhoun <ccalh...@aip.org>
To: user <user@nutch.apache.org>
I'm trying to crawl pages from a number of domains, and one of these domains
has been giving me trouble. The really irritating thing is that it did work at
least once, which led me to believe that I'd solved the problem. I can't think
of anything at this point but to paste my log of a failed