I'm finding that several things are weird in the htdig/HTML.cc class:
1) If a page has an ill-formed comment tag like this: <!-- hennerik CVSweb $Revision: 1.64 0->
Everything after the start of the comment is eaten, i.e. the rest of the page, because the comment terminator is malformed. A browser handles this fine.
2) In the HTML::parse function:
223 unsigned char *text = (unsigned char *)new char[contents->length()+1];
This variable seems intended to store the document contents. However, both times it is used, it appears on the right-hand side of an assignment statement:
224 unsigned char *ptext = text;
[snip]
380 position = text;
381 start = position;
382
383 while (*position)
384 {
Note that the while loop (lines 384 to 545) is likely never entered, since the freshly allocated text buffer seems to come back zeroed with gcc on Linux. The behavior could be platform dependent, since who knows what's in that memory otherwise.
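To make the concern concrete, here is a minimal sketch of what I would expect to see, assuming the snipped code does not already fill the buffer through ptext (I have not verified that). The helper names copy_contents and walk are hypothetical and not taken from HTML.cc; std::string stands in for the contents object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (not from HTML.cc): allocate a buffer and fill it
// explicitly. new[] on a char type leaves the bytes uninitialized, so they
// must be written before any loop reads them.
unsigned char *copy_contents(const std::string &contents)
{
    unsigned char *text = new unsigned char[contents.length() + 1];
    std::memcpy(text, contents.data(), contents.length());
    text[contents.length()] = '\0';   // terminator that "while (*position)" relies on
    return text;
}

// Hypothetical stand-in for the parse loop: counts the bytes visited
// before the NUL terminator is reached.
std::size_t walk(const unsigned char *text)
{
    assert(text != nullptr);
    const unsigned char *position = text;
    std::size_t visited = 0;
    while (*position)
    {
        ++position;
        ++visited;
    }
    return visited;
}
```

With the buffer filled this way, the loop is entered whenever the document is non-empty, regardless of what the allocator happens to hand back. The caller releases the buffer with delete[].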
Any feedback?
Thanks
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
