In <[EMAIL PROTECTED]>, "Sean M. Burke" 
<[EMAIL PROTECTED]> writes:
> User-agent: *
>      Disallow: /cgi-bin/
>      Disallow: /~mojojojo/misc/
> 
> So I've changed it to this, and was about to submit it as a patch for the
> next LWP release:
>    /^\s*Disallow:\s*(.*)/i
>    # Silently forgive leading whitespace.
> 
> But first, I thought I'd ask the list here: does anyone think this'd break
> anything? 

The change should not break anything: files that use leading whitespace for 
comments or some other obscure purpose do not comply with the specification 
anyway and will see varying results.

However, since the standard is sufficiently clear on the correct format, I 
would rather not support a non-standard format with leading whitespace. 
Developers will start relying on this feature and will then complain that 
other, standards-compliant robots libraries don't support it (the infamous "my 
page works in Internet Explorer, so it cannot be broken" attitude).

Rather than modifying the library, I would suggest that any application that 
wants to handle this content error gracefully strip leading whitespace prior to 
calling parse().
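
A minimal sketch of that pre-processing step, for illustration only. The 
helper name strip_leading_whitespace and the robot name MyBot/1.0 are 
hypothetical and not part of LWP; the parse() call is shown in a comment and 
assumes WWW::RobotRules is installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of LWP): remove leading spaces and tabs
# from every line of a robots.txt body before it reaches the parser.
sub strip_leading_whitespace {
    my ($content) = @_;
    # /m makes ^ match at the start of each line; [ \t] is used instead
    # of \s so that blank lines, which separate records, are preserved.
    $content =~ s/^[ \t]+//mg;
    return $content;
}

my $raw = "User-agent: *\n"
        . "     Disallow: /cgi-bin/\n"
        . "     Disallow: /~mojojojo/misc/\n";

my $clean = strip_leading_whitespace($raw);
print $clean;

# The cleaned text could then be handed to the unmodified library:
#   use WWW::RobotRules;
#   my $rules = WWW::RobotRules->new('MyBot/1.0');
#   $rules->parse('http://example.com/robots.txt', $clean);
```

Matching only spaces and tabs is deliberate: a pattern such as ^\s+ with /m 
would also swallow the empty lines between records and could merge them.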

--
Klaus Johannes Rusch
[EMAIL PROTECTED]
http://www.atmedia.net/KlausRusch/
