According to Jure Pecar:
> I browsed through all the FAQ, mailing lists and Google and didn't find
> any useful information on the subject.
>
> I had htdig 3.1.5 on Red Hat 6.2 configured with locale: sl_SI to index
> pages in Slovenian and it worked OK. Even if I entered some caron
> characters into the search query, I got appropriate results back.
>
> Now I have upgraded to Red Hat 7.2 and the htdig 3.2.0b4 that comes with it
> (from updates) and have the following situation: search queries with ccaron,
> for example, work OK, but on the resulting page there's the egrave character
> instead of the ccaron. I have a theory of what's going on: when htdig
> indexes the pages, it interprets characters as iso-8859-1 (although charset
> is set to iso-8859-2 in the HTML header) and sees ccaron as egrave; so
> that's what it shows in the output. I even forced Apache to output all
> the files with an iso-8859-2 header, but when running htdig -vvvvi I still
> see some 8859-1 popping up.
>
> I want to know if this is some temporary misfeature in the 3.2.0 beta version.
> What changed between 3.1.5 and 3.2.0b4 that could affect this
> behaviour? It's pretty bad that I can't get 3.1.5 to work on Red Hat 7.2 at
> all: htdig segfaults; strace shows that it dies while processing LC_*
> files.
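The symptom described above fits a single byte being read under two different character sets. The sketch below is only an illustration in Python, not htdig code: it assumes the page stores ccaron as the ISO-8859-2 byte 0xE8, and the helper name `latin1_entity` is hypothetical.

```python
import html.entities
import unicodedata

# Assumption: a page encoded in ISO-8859-2 stores ccaron as the byte 0xE8.
raw = b"\xe8"

as_latin2 = raw.decode("iso-8859-2")  # what the page author meant
as_latin1 = raw.decode("iso-8859-1")  # what a Latin-1 reader sees


def latin1_entity(byte_value: int) -> str:
    """Hypothetical helper: name a byte as an HTML entity, treating it as ISO-8859-1."""
    name = html.entities.codepoint2name.get(byte_value)
    return f"&{name};" if name else chr(byte_value)


if __name__ == "__main__":
    print(unicodedata.name(as_latin2))
    print(unicodedata.name(as_latin1))
    print(latin1_entity(raw[0]))
```

Here the same byte decodes to c with caron under ISO-8859-2 and to e with grave under ISO-8859-1, and the Latin-1 reading maps to the `&egrave;` entity, which matches the character reported on the results page.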
It's been discussed a couple of times before on the mailing list, but without
working search capabilities on the list archives, it may be hard to find.

The problem is that htdig 3.2 converts all characters between 160 and 255
back to SGML entities for ISO-8859-1 characters, which is obviously wrong
when your documents are encoded in a different character set. The fix will
be to add a translate_latin1 attribute to disable these translations in the
htcommon/HtSGMLCodec.cc constructor. For now, the only quick fix is to
modify this constructor not to do these translations.

The HtSGMLCodec class is new to 3.2, which is why 3.1.5 doesn't have this
problem. 3.1.5 (and the 3.1.6 release that's in development) doesn't
translate the accented characters back to SGML entities.

I find it very surprising that you can't get 3.1.5 to work on Red Hat 7.2.
Please try the 3.1.6 snapshot in http://www.htdig.org/files/snapshots/,
and let us know if that fails too. If it fails, please give us any relevant
error messages, and a stack backtrace from gdb from the core dump.
(See http://www.htdig.org/FAQ.html#q5.14)

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html

