I'm finding that several things are weird in the htdig/HTML.cc class:

1) If a page has an ill-formed comment tag like this:
<!-- hennerik CVSweb $Revision: 1.64  0->

Everything after the start of the comment is eaten, i.e. the rest of the page, since the comment is never properly closed. A browser handles this fine.

2) In the HTML::parse function

    223     unsigned char       *text = (unsigned char *)new char[contents->length()+1];


This variable seems intended to store the document contents. However, both times it's used, it is the RHS of an assignment statement:


    224     unsigned char       *ptext = text;
    [snip]
    380       position = text;
    381       start = position;
    382
    383       while (*position)
    384       {

Note that the while statement (lines 384 to 545) is likely never entered, since the memory behind text seems to come back zeroed with gcc on Linux. new char[] does not initialize the array, though, so the behavior could be platform dependent: who knows what's in that memory.
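For reference, a minimal sketch of how the allocation could be made deterministic. The function name make_text_buffer is made up for illustration and does not exist in HTML.cc; the only real change is the trailing "()" on the new expression, which value-initializes the array.

```cpp
#include <cstddef>

// Hypothetical helper, not from HTML.cc: allocate room for `length` bytes
// plus a terminator. Plain `new char[n]` leaves the bytes indeterminate;
// `new char[n]()` value-initializes them, so every byte is 0 on any platform.
unsigned char *make_text_buffer(std::size_t length)
{
    return reinterpret_cast<unsigned char *>(new char[length + 1]());
}

// Release a buffer obtained from make_text_buffer with the matching type.
void free_text_buffer(unsigned char *buf)
{
    delete[] reinterpret_cast<char *>(buf);
}
```

With a buffer allocated this way, `while (*position)` is guaranteed to see a terminator immediately if nothing was written into it, rather than depending on whatever the allocator happened to return.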

Any feedback?

Thanks

Neal Richter Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485




