Hi,
I think there is a bug in htdig (3.0.8b2, Solaris 2.6). When it parses a
document containing invalid HTML (in my example, an unclosed comment), it
stores the beginning of the content of the previously parsed document
instead. Of course, invalid HTML is a bad thing, but I think htdig should
store no content (or a warning) for the broken page rather than content
from another page.
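
To illustrate the behaviour I would expect, here is a minimal sketch. It is
not taken from the htdig sources; StripResult and stripComments are
hypothetical names, and it only handles the simple "<!-- ... -->" form of
comments.

```cpp
#include <string>

// Hypothetical result type for this sketch; not part of htdig.
struct StripResult {
    std::string text;   // document text with comments removed
    bool unterminated;  // true if a "<!--" had no matching "-->"
};

// Remove HTML comments from `html`. If a comment is never closed, return
// empty text and set the flag, so the caller can store nothing (or a
// warning) instead of content left over from another document.
StripResult stripComments(const std::string& html) {
    StripResult result{"", false};
    std::string::size_type pos = 0;
    while (pos < html.size()) {
        std::string::size_type open = html.find("<!--", pos);
        if (open == std::string::npos) {
            // No further comments: keep the rest of the document.
            result.text.append(html, pos, std::string::npos);
            break;
        }
        // Keep the text that precedes the comment.
        result.text.append(html, pos, open - pos);
        std::string::size_type close = html.find("-->", open + 4);
        if (close == std::string::npos) {
            // Unclosed comment: discard everything and report it.
            result.text.clear();
            result.unterminated = true;
            return result;
        }
        pos = close + 3;  // continue after the closing "-->"
    }
    return result;
}
```

With this, a page such as t2.html would yield empty text with the
unterminated flag set, and the indexer could log a warning for that URL.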

Example:   
start_url:   http://www.tu-chemnitz.de/~fri/htdigtest/t1.html

It contains a link to .../t2.html, which has invalid HTML.
The resulting db.docs is:
0   u:http://www.tu-chemnitz.de/~fri/htdigtest/t1.html  t:Title 1   a:0
m:902509994 s:130   h:  HEAD 1 Link to t2 some text     l:902509999 L:1
I:130   d:  A:
1   u:http://www.tu-chemnitz.de/~fri/htdigtest/t2.html  t:Title 2   a:0
m:902509903 s:183   h:  HEAD 1 Link to t2 some text     l:902509999 L:0
I:183   d:Link to t2    A:
^^^^^^^^^^^^^^^^^^^^^^^ that's wrong!

As you can see, the second entry (for t2.html) contains the content of
t1.html. Does anyone have a fix or a suggestion where to look in the code?

Thanks,
        - Frank
-- 
Email: [EMAIL PROTECTED]  http://www.tu-chemnitz.de/~fri/
Work:  Computing Services, Technical University, 09107 Chemnitz, Germany
