According to [EMAIL PROTECTED]:
> I'm using htdig 3.2 on RedHat 7.1 and I have this problem. When I build the
> word database and try to search my web pages, I get a result page with wrong
> non-english characters (for example: �����). When I look at HTML code, I
> find that these wrong characters were written as "& character" (&aacute;
> &iacute; ...). Where is the problem?

This is a known bug in htcommon/HtSGMLCodec.cc.  It sets up the same rules
for decoding and re-encoding SGML entities.  The problem is that when you
use an 8-bit encoding other than ISO-8859-1 (Latin 1 - Western Europe),
the accented characters in the upper half get re-encoded into the SGML
entities for the Latin 1 set, which may not match the characters those
bytes stand for in your encoding.  The only fix right now is to hack
HtSGMLCodec not to do this.
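
To illustrate the failure mode, here is a minimal sketch in Python.  It is
not the HtSGMLCodec code; the function name and the ISO-8859-2 sample word
are assumptions chosen for the example.  It only mimics a codec that turns
every upper-half byte into the Latin 1 entity for that byte value.

```python
import html
from html.entities import codepoint2name


def encode_upper_half_as_latin1_entities(raw: bytes) -> str:
    """Hypothetical stand-in for the re-encoding step: every byte >= 0xA0
    is replaced by the Latin 1 entity that has the same code point."""
    out = []
    for b in raw:
        name = codepoint2name.get(b) if b >= 0xA0 else None
        out.append(f"&{name};" if name else chr(b))
    return "".join(out)


# A Czech word stored in ISO-8859-2 (Latin 2), as an assumed example.
text = "řeč"
raw = text.encode("iso-8859-2")          # bytes 0xF8, 'e', 0xE8

# The bytes are treated as if they were Latin 1 and turned into entities.
garbled = encode_upper_half_as_latin1_entities(raw)
print(garbled)                            # &oslash;e&egrave;

# A browser then decodes the entities to the Latin 1 characters.
print(html.unescape(garbled))             # øeè
```

In Latin 2 the bytes 0xF8 and 0xE8 mean "ř" and "č", but the entities
written for them name the Latin 1 characters "ø" and "è", so the result
page shows the wrong letters no matter which charset it declares.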

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
