robots.txt redirect (NUTCH-124)

Mathijs Homminga Sat, 21 Mar 2009 03:18:50 -0700

Hi everybody,

Can someone shed some light on NUTCH-124?

RobotRulesParser.java doesn't follow redirects when requesting the robots.txt file. Doug patched this, but the patch didn't make it into the trunk.

What is the desired behavior here?



For example, when requesting the following url:
http://7is7.com/software/stateye/download/stateye097f.html

... RobotRulesParser requests the following robots.txt:
http://7is7.com/robots.txt

... however, that file doesn't exist; the request redirects to:
http://www.7is7.com/robots.txt

... that robots.txt tells us the initial url is disallowed.

But does it really? Or is that robots.txt file only applicable to http://www.7is7.com and not to http://7is7.com?
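
To make the example concrete, here is a minimal Java sketch of the two steps involved: deriving the robots.txt location from the page URL, and checking whether a redirect target is still on the same host. This is not the actual RobotRulesParser code; the class name and the methods robotsUrlFor and sameHost are hypothetical, and the exact-host comparison is only one possible policy.

```java
import java.net.URI;

// Hypothetical sketch; not the actual RobotRulesParser code.
class RobotsRedirectSketch {

    // Derive the robots.txt location from the host of the page URL.
    static String robotsUrlFor(String pageUrl) {
        return URI.create(pageUrl).resolve("/robots.txt").toString();
    }

    // True when both URLs name exactly the same host (case-insensitive).
    // Assumption: "same host" means an exact match, so "www." counts as different.
    static boolean sameHost(String a, String b) {
        String hostA = URI.create(a).getHost();
        String hostB = URI.create(b).getHost();
        return hostA != null && hostA.equalsIgnoreCase(hostB);
    }

    public static void main(String[] args) {
        String page = "http://7is7.com/software/stateye/download/stateye097f.html";
        String requested = robotsUrlFor(page);
        String redirectTarget = "http://www.7is7.com/robots.txt";

        System.out.println("requested: " + requested);
        System.out.println("same host: " + sameHost(requested, redirectTarget));
    }
}
```

With the URLs above, the host check reports a mismatch. Whether such a mismatch should mean "ignore the redirected rules" or "apply them anyway" is exactly the open question.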


So the question is: should we follow such redirects?

Thanks,
Mathijs
