According to Joachim Seibert:
> While indexing a site with htdig, I recognized the following problem:
>
> htdig seems to stop indexing a page, when a string terminator 0 (hex 00)
> appears within the site. Till this char, it indexes the page properly.
>
> Does anyone have an idea how to solve this problem?
I don't suppose editing the files to replace or get rid of the null
characters is an option?
The only quick fix I can think of would be to add the following line
to the start of HTML::parse() in htdig/HTML.cc, just after it checks
to make sure contents isn't 0 (i.e. line 148 in htdig 3.1.6):
    contents->replace('\0', ' ');
This might slow down the parser a bit because it has to do an extra
pass through the data to look for the nulls. You should also make the
same change in htdig/Plaintext.cc (line 42 in 3.1.6) to handle nulls
in text/plain files.
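To illustrate what that replacement does to the data, here is a minimal
standalone sketch. It assumes std::string as a stand-in for htdig's own
String class, and strip_nulls() is a hypothetical helper name, not
something in the htdig source. It only shows the idea: every embedded
null becomes a space, so code that treats the buffer as a C string no
longer stops at the first null.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not part of htdig): returns a copy of the buffer
// with every embedded NUL byte replaced by a space. std::string is used
// here only as a stand-in for htdig's String class.
std::string strip_nulls(std::string contents)
{
    // Single linear pass over the data, which is the extra cost
    // mentioned above.
    std::replace(contents.begin(), contents.end(), '\0', ' ');
    return contents;
}
```

For a 12-byte buffer holding "before", a null, and "after", the result
is the 12-byte string "before after". The length is unchanged, which
means any offsets into the document stay valid.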
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930