Hi everybody,

Can someone shed some light on NUTCH-124?
RobotRulesParser.java doesn't follow redirects when requesting the robots.txt file. Doug patched this, but the patch never made it into trunk.
What is the desired behavior here?


For example, when requesting the following URL:
http://7is7.com/software/stateye/download/stateye097f.html

... RobotRulesParser requests the following robots.txt:
http://7is7.com/robots.txt

... however, that file doesn't exist; the request redirects to:
http://www.7is7.com/robots.txt

... and that robots.txt tells us the initial URL is disallowed.
But does it really? Or is that robots.txt file only applicable to http://www.7is7.com and not to http://7is7.com?

So the question is: should we follow such redirects?
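
To make the question concrete, here is a minimal sketch of one possible policy: apply a redirected robots.txt only when the redirect stays on the same site, treating a leading "www." as insignificant. This is an assumption for discussion, not what RobotRulesParser or Doug's patch does; the class RobotsRedirectPolicy and its methods are hypothetical names.

```java
import java.net.URI;
import java.util.Locale;

public class RobotsRedirectPolicy {

    // Lower-cases the host and strips a leading "www." so that
    // 7is7.com and www.7is7.com compare as the same site.
    static String normalizeHost(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        return h.startsWith("www.") ? h.substring(4) : h;
    }

    // Returns true if the rules fetched from the redirect target should be
    // applied to the originally requested host (hypothetical policy).
    static boolean shouldApplyRedirectedRobots(String requested, String redirected) {
        URI from = URI.create(requested);
        URI to = URI.create(redirected);
        if (from.getHost() == null || to.getHost() == null) {
            return false; // not an absolute URL with a host: be conservative
        }
        boolean sameSite =
            normalizeHost(from.getHost()).equals(normalizeHost(to.getHost()));
        boolean stillRobots = "/robots.txt".equals(to.getPath());
        return sameSite && stillRobots;
    }

    public static void main(String[] args) {
        // The case from this mail: the redirect only adds "www.".
        System.out.println(shouldApplyRedirectedRobots(
            "http://7is7.com/robots.txt", "http://www.7is7.com/robots.txt"));
        // A redirect to an unrelated host.
        System.out.println(shouldApplyRedirectedRobots(
            "http://7is7.com/robots.txt", "http://example.org/robots.txt"));
    }
}
```

Under this policy the 7is7.com redirect would be followed and its rules applied, while a redirect to an unrelated host would be ignored. Whether that is the right trade-off is exactly the open question.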

Thanks,
Mathijs
