According to Joachim Seibert:
> While indexing a site with htdig, I recognized the following problem:
> 
> htdig seems to stop indexing a page, when a string terminator 0 (hex 00)
> appears within the site. Till this char, it indexes the page properly.
> 
> Does anyone have an idea how to solve this problem?

I don't suppose editing the files to replace or get rid of the null
characters is an option?

The only quick fix I can think of would be to add the following line
to the start of HTML::parse() in htdig/HTML.cc, just after it checks
to make sure contents isn't 0 (i.e. line 148 in htdig 3.1.6):

    contents->replace('\0', ' ');

This might slow down the parser a bit because it has to do an extra
pass through the data to look for the nulls.  You should also make the
same change in htdig/Plaintext.cc (line 42 in 3.1.6) to handle nulls
in text/plain files.
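
As a rough standalone illustration of the same idea, here is a sketch that
uses std::string as a stand-in for htdig's own String class. The helper
name replace_nulls is made up for this example and is not part of htdig;
it only shows the extra pass that swaps each null byte for a space.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper, not part of htdig: returns a copy of the buffer
// with every NUL byte replaced by a space, so that code which treats the
// data as a C string no longer stops at the first embedded NUL.
std::string replace_nulls(std::string contents)
{
    // One extra linear pass over the data, which is the small cost
    // mentioned above.
    std::replace(contents.begin(), contents.end(), '\0', ' ');
    return contents;
}
```

The buffer keeps its length; only the null bytes change, so the text after
them stays in the document and becomes visible to the parser.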

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
