According to Matt Edwards:
> HtDig 3.1.1 isn't parsing (slightly non-standard) comments correctly.
> 
> Extra dashes in the comment can confuse the current parser into
> ignoring a lot of content.  For example <!--comment----> is seen as 
> an uncompleted comment beginning.
> 
> It seems a lot of web content doesn't strictly adhere to the 
> "standard" for comments, so we should be a little careful here.
> 
> For example both IE and Netscape require "<!--" comments to end 
> with a "-->" without whitespace between the "--" and the ">".
> Perhaps htDig would be better off doing the same.  
[snip]
> According to Marjolein Katsma: 
> > Starting on my next project, I had to dig in HTML.cc, and found the
> > following code to filter out comments:
> 
> According to Gilles Detillieux:

Actually, this was more text quoted from Marjolein...
> > While this will catch *most* comments, it will see some perfectly legal 
> > comments as illegal and skip the rest of the page. The best definition
> > of comments is found in HTML 2.0 (unchanged in the actual DTD in later 
> > versions, but never properly explained any more...): 
> > 
> > "To include comments in an HTML document, use a comment declaration. A 
> > comment declaration consists of `<!' followed by zero or more comments 
> > followed by `>'. Each comment starts with `--' and includes all text up
> > to and including the next occurrence of `--'. In a comment declaration,
> > white space is allowed after each comment, but not before the first 
> > comment.  The entire comment declaration is ignored." 
> > 

Marjolein brought up this issue in January.  The htdig code used
to do what you're requesting, but she wanted it changed to adhere to
the standard.  I only helped her debug her code so that it would do what
she wanted: accept (and require) standard comments.  She went on to give
a few examples of what standard comments could be:

> Thus, the following are legal comment declarations:
> 
> <!--first comment
> on two lines --
> 
> --second comment--
> --third comment--
> >
> 
> <!>

At the time, i.e. in 3.1.0b4, htdig didn't handle these, and your code
snippet won't either.  I'm assuming she had a reason to want this change.
My feeling is htdig should respect the standard, and any non-standard
behaviour should be optional.
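
To make the difference concrete, here is a rough sketch of the two
behaviours side by side.  This is not the code in HTML.cc; the function
names skipStrictComment and skipLenientComment are made up for
illustration, and the strict version is only my reading of the HTML 2.0
wording quoted above.  Each function takes the position of the "<" and
returns the position just past the closing ">", or std::string::npos if
the text there is not a complete comment.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not from HTML.cc): strict HTML 2.0 reading.
// A comment declaration is "<!" + zero or more "--...--" comments + ">",
// with white space allowed after each comment but not before the first.
std::size_t skipStrictComment(const std::string &s, std::size_t pos)
{
    if (pos > s.size() || s.compare(pos, 2, "<!") != 0)
        return std::string::npos;

    std::size_t i = pos + 2;
    while (i < s.size()) {
        if (s[i] == '>')
            return i + 1;                  // end of the declaration
        if (s.compare(i, 2, "--") != 0)
            return std::string::npos;      // e.g. <!DOCTYPE ...>, not a comment
        std::size_t close = s.find("--", i + 2);
        if (close == std::string::npos)
            return std::string::npos;      // comment opened but never closed
        i = close + 2;
        // White space is allowed after each comment.
        while (i < s.size() && std::isspace(static_cast<unsigned char>(s[i])))
            ++i;
    }
    return std::string::npos;              // ran out of input before '>'
}

// Hypothetical helper: the browser-like behaviour Matt describes, where
// a comment starts at "<!--" and ends at the first "-->".
std::size_t skipLenientComment(const std::string &s, std::size_t pos)
{
    if (pos > s.size() || s.compare(pos, 4, "<!--") != 0)
        return std::string::npos;
    std::size_t close = s.find("-->", pos + 4);
    return close == std::string::npos ? close : close + 3;
}
```

Given just the string "<!--comment---->", the strict version returns
npos, because the extra "--" opens a second comment that is never
closed, while the lenient version consumes all 16 characters.  The
strict version does accept "<!>" and the three-comment example above.
A configuration option could then select between the two.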

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930