[
https://issues.apache.org/jira/browse/LABS-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638882#action_12638882
]
Thorsten Scherler commented on LABS-198:
----------------------------------------
http://www.w3.org/TR/html4/appendix/notes.html#h-B.4.1.1
"...There can only be a single "/robots.txt" on a site..."
http://www.robotstxt.org/norobots-rfc.txt (sec 3.1)
"...under a standard relative path on the server: "/robots.txt"."
> It should be "new URL(base, "/robots.txt");"
In your patch, however, you use "new URL(getUrlPrefix(base) + "/robots.txt");".
The first suggestion is correct; the patch is not.
The Javadoc for the constructor "URL(URL context, String spec)" says:
"If the spec's path component begins with a slash character "/" then the
path is treated as absolute and the spec path replaces the context path."
This means there is no need to use getUrlPrefix(base). Furthermore, that method
returns a String, and there is no URL(String, String) constructor.
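To illustrate the Javadoc rule quoted above, here is a minimal sketch comparing the two spec strings. The class name RobotsUrlDemo, both helper methods, and the example.com address are made up for this illustration and are not part of the Droids code:

```java
import java.net.MalformedURLException;
import java.net.URL;

public class RobotsUrlDemo {

    // Hypothetical helper: a spec with a leading slash replaces the whole
    // context path, so the result always points at the site root.
    @SuppressWarnings("deprecation") // URL constructors are deprecated since Java 20
    public static URL robotsUrl(URL base) throws MalformedURLException {
        return new URL(base, "/robots.txt");
    }

    // Hypothetical helper showing the behaviour reported in the issue: without
    // the leading slash the spec is resolved against the directory of the base.
    @SuppressWarnings("deprecation")
    public static URL relativeRobotsUrl(URL base) throws MalformedURLException {
        return new URL(base, "robots.txt");
    }

    @SuppressWarnings("deprecation")
    public static void main(String[] args) throws MalformedURLException {
        URL base = new URL("http://example.com/docs/guide/index.html");
        System.out.println(relativeRobotsUrl(base)); // http://example.com/docs/guide/robots.txt
        System.out.println(robotsUrl(base));         // http://example.com/robots.txt
    }
}
```

Protocol, host and port are taken from the context URL in both cases; only the handling of the path differs.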
I will apply the correct version now.
Thanks Javier for spotting this and providing a patch.
> NoRobotsClient doesn't follow the standard
> ------------------------------------------
>
> Key: LABS-198
> URL: https://issues.apache.org/jira/browse/LABS-198
> Project: Labs
> Issue Type: Bug
> Components: Droids
> Reporter: Javier Puerto
> Attachments: norobot-rfc.diff
>
>
> I see that the robots.txt URL is resolved relative to the current path, which
> does not follow the robots standard.
> ...
> public static URL findRobotsUrl(URL base, String prefix)
>     throws MalformedURLException {
>   URL url = new URL(base, "robots.txt");
>   boolean exist = existUrl(url);
> ...
> It should be "new URL(base, "/robots.txt");"
> I found this on the web:
> * http://www.robotstxt.org/norobots-rfc.txt (sec 3.1)
> * http://en.wikipedia.org/wiki/Robots.txt
> * http://www.w3.org/TR/html4/appendix/notes.html#h-B.4.1.1
> I have attached a patch to fix this behavior.
> Regards.