According to Jure Pecar:
> I browsed through the FAQ, the mailing lists, and Google and didn't find
> any useful information on the subject.
> 
> I had htdig 3.1.5 on Red Hat 6.2 configured with locale: sl_SI to index
> pages in Slovenian, and it worked OK. Even if I entered caron characters
> in the search query, I got appropriate results back.
> 
> Now I have upgraded to Red Hat 7.2 and the htdig 3.2.0b4 that comes with
> it (from updates), and have the following situation: search queries with
> ccaron, for example, work OK, but the results page shows the egrave
> character instead of the ccaron. I have a theory about what's going on:
> when htdig indexes the pages, it interprets characters as iso-8859-1
> (although the charset is set to iso-8859-2 in the HTML header) and sees
> ccaron as egrave, so that's what it shows in the output. I even forced
> Apache to serve all the files with an iso-8859-2 header, but when running
> htdig -vvvvi I still see some 8859-1 popping up.
> 
> I want to know if this is a temporary misfeature in the 3.2.0 beta
> version. What changed between 3.1.5 and 3.2.0b4 that could affect this
> behaviour? It's pretty bad that I can't get 3.1.5 to work on Red Hat 7.2
> at all: htdig segfaults; strace shows that it dies while processing LC_*
> files.

This has been discussed a couple of times before on the mailing list, but
without working search capabilities on the list archives, it may be hard
to find.

The problem is that htdig 3.2 converts all characters between 160 and
255 back to the SGML entities for the corresponding ISO-8859-1 characters,
which is obviously wrong when your documents are encoded in a different
character set.  The planned fix is to add a translate_latin1 attribute
that disables these translations, which are set up in the constructor in
htcommon/HtSGMLCodec.cc.  For now, the only quick fix is to modify this
constructor so that it doesn't do these translations.  The HtSGMLCodec
class is new to 3.2, which is why 3.1.5 doesn't have this problem.
3.1.5 (and the 3.1.6 release that's in development) doesn't translate
the accented characters back to SGML entities.

I find it very surprising that you can't get 3.1.5 to work on Red Hat 7.2.
Please try the 3.1.6 snapshot in http://www.htdig.org/files/snapshots/,
and let us know if that fails too.  If it does, please send us any
relevant error messages, along with a gdb stack backtrace taken from the
core dump.  (See http://www.htdig.org/FAQ.html#q5.14)

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

