According to J. op den Brouw:
> As the title says, I had problems with comments in HTML, and the file
> HTML.cc.
> What is the problem? Well, most web pages at our school are produced by
> people who don't know s**t about HTML.
> 
> They produce comments like:
> 
> <!hello -->
> <!------------hello------------>
..
> This is done by finding the -- just before the > (comments end with
> -->). But in the first comment case above it fails. Anyway, it messes
> my indexing. The trick is (I HOPE) that line 161:
> 
>                   q = (unsigned char*)strstr((char *)position, "--");
> 
> should be changed to:
> 
>                   q = (unsigned char*)strstr((char *)position, "-->");
> 
> It finds the first occurrence of -->, so don't nest comments. Anyway,
> it works on my htdig system.

This isn't quite right.  We had a big discussion about this two weeks ago.
The HTML standard allows white space (even newlines) between the closing
"--" and ">" of a comment.  The trick is to gobble up any extra dashes
after the first two, and then skip white space.  If that doesn't leave
you at a ">", I think you have to start over again, scanning for the next
"--".
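
Here is a rough sketch of that scan, in case it helps.  It is not the
code in HTML.cc; skip_comment() is a hypothetical helper name, and I'm
assuming "position" points somewhere inside the comment, after the "<!",
in a NUL-terminated buffer.  Treat it as an illustration of the idea
rather than a tested patch.

```cpp
#include <cctype>
#include <cstring>

// Hypothetical helper, not taken from htdig's HTML.cc.
// `position` is assumed to point inside a comment, just after "<!".
// Returns a pointer just past the closing '>', or nullptr if the
// comment is never closed.
const char *skip_comment(const char *position)
{
    const char *q = position;
    while ((q = std::strstr(q, "--")) != nullptr)
    {
        // Gobble the two dashes and any extra dashes after them.
        while (*q == '-')
            ++q;
        // Skip white space (including newlines) before the '>'.
        while (std::isspace(static_cast<unsigned char>(*q)))
            ++q;
        if (*q == '>')
            return q + 1;
        // Not a comment end: start over, scanning for the next "--".
    }
    return nullptr;
}
```

With both of the comments quoted above, this should stop right after the
final ">", and it also accepts "-- >" with white space in between.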

> Another problem is that M$ Frontpage 98 in combination with Frontpage
> Server Extensions doesn't do <AREA> tags. They create a webbot (inside
> a comment). If the webbot has links, these links don't get indexed. Of
> course this is a M$ / user problem, it's just so that you know of it.

Yes, M$ server extensions pose a problem (as does JavaScript).  If anyone
can enhance the HTML parser to deal with these webbot links reliably,
without breaking anything else, go for it.  Otherwise, it'll remain a
problem, until M$ learns to adhere to standards other than their own.  ;-)

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930