Hi,
I've installed htDig on a Red Hat 8.0 box and have some problems with ISO-8859 to Unicode (UTF-8) conversions.
The website to be indexed is ISO-8859 encoded, as are the documents it links to (pdf, doc, xls and ppt).
The parsing and the search engine work fine except for special characters.
This is due to a Unicode conversion done by my Linux box.
In fact, for plain HTML and text files we can avoid the conversion by turning Unicode mode off on the Linux box (unicode_stop command). But I can't find a solution for the doc2html (pdf2text) or catdoc parsers.
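
To make the symptom concrete, here is a minimal sketch of what I believe happens to a special character. It is only an illustration in Python, not anything htDig itself runs: the sample word and the helper name read_as_latin1 are made up for this example.

```python
# Hypothetical sample word; not taken from the actual site.
word = "café"

# The bytes as they appear in an ISO-8859-1 document.
latin1_bytes = word.encode("iso-8859-1")

# The bytes after the text has been converted to UTF-8.
utf8_bytes = word.encode("utf-8")


def read_as_latin1(data: bytes) -> str:
    """Decode bytes the way an ISO-8859-1 based reader would."""
    return data.decode("iso-8859-1")


print(latin1_bytes)                 # b'caf\xe9'
print(utf8_bytes)                   # b'caf\xc3\xa9'
print(read_as_latin1(utf8_bytes))   # cafÃ©
```

So a single accented character becomes two bytes, and anything that still expects ISO-8859-1 sees two wrong characters instead of one correct one. That matches what I see in the search results for the converted documents.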
Does anybody have a hint, clue or solution?
Tx