Doug Kaufman <[EMAIL PROTECTED]> writes:

> On Thu, 18 Sep 2003, Hrvoje Niksic wrote:
>
>> modifying advance_declaration() in html-parse.c.  A future version of
>> Wget will probably parse comments in a non-compliant fashion, by
>> considering everything between <!-- and --> to be a comment, which is
>> what most other browsers have been doing since the beginnings of the
>> web.
>
> The lynx browser is configurable as to how it parses comments.

So is Wget, as of last night.  The default is minimal (non-compliant)
comment parsing, and that can be changed with `--strict-comments'.

> It can change on the fly from "minimal comments" to "historical
> comments" to "valid comments". Which browsers act in non-compliant
> fashion all the time?

Those that display http://www.hro.org/docs/rlex/uk/index.htm (unless
I'm mistaken), and that would mean pretty much all of them.  Of
course, that page is but one example out of many.

Some browsers have more complex heuristics for comment parsing, but
adding that to Wget would probably be overdoing it.
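
To make the difference between the two modes concrete, here is a rough
sketch in Python. It is not Wget's actual C code from html-parse.c; the
function names are invented for this example, and the strict variant
assumes the SGML reading in which a declaration is `<!', then zero or
more `--...--' comments separated by whitespace, then `>'.

```python
def skip_comment_minimal(text, start=0):
    """Minimal parsing: a comment is everything from <!-- to the next -->.

    Returns the index just past the comment, or None if no comment
    starts at `start` or it is never closed.
    """
    if not text.startswith("<!--", start):
        return None
    end = text.find("-->", start + 4)
    return None if end == -1 else end + 3


def skip_comment_strict(text, start=0):
    """Strict (SGML-style) parsing, under the assumption stated above.

    Returns the index just past the closing `>`, or None if the
    declaration is unterminated or malformed.
    """
    if not text.startswith("<!--", start):
        return None
    pos = start + 2  # just past "<!"
    n = len(text)
    while True:
        if text.startswith("--", pos):
            # Opening "--" of one comment; look for its closing "--".
            close = text.find("--", pos + 2)
            if close == -1:
                return None
            pos = close + 2
            # Whitespace is allowed after each comment.
            while pos < n and text[pos].isspace():
                pos += 1
        elif pos < n and text[pos] == ">":
            return pos + 1
        else:
            # Anything else inside the declaration is treated as malformed.
            return None


page = "<!------> hello -->"
print(skip_comment_minimal(page))  # comment ends after the first "-->"
print(skip_comment_strict(page))   # " hello " is swallowed as a comment
```

For a plain `<!-- a -->' both functions agree.  They diverge on input
like the `page' string above, where the strict reader treats the second
pair of dashes as opening a new comment and so hides text that the
minimal reader would show.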
