Hi,

I want to index PDF files that contain German umlauts (ä, ö, ü, ß). My tests showed that htdig (v. 3.1.5) and xpdf (v. 0.91) handle umlauts well, but the external parser parse_doc.pl has problems with them: it splits a word containing an umlaut into two words and drops the umlaut. For example:

w beim 41 0
w diesj 45 0
w hrigen 50 0
w den 58 0
w Platz 62 0

In this case the German word "diesjährigen" is split into "diesj" and "hrigen", and I can find both fragments with htsearch.

Does anyone know how to solve this problem, for example with a modified version of parse_doc.pl?

Thanks,
Christian Huhn
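
To make the symptom concrete, here is a minimal sketch in Python (not the Perl code of parse_doc.pl) of how an ASCII-only word pattern would produce this kind of split, compared with a Unicode-aware one. Both patterns and the helper name split_words are hypothetical. Whether parse_doc.pl uses an equivalent pattern is an assumption, and the offsets are plain character positions, not htdig word locations.

```python
import re

# Assumption: an ASCII-only word pattern, similar in effect to a word class
# that does not know about locale-specific letters. Not taken from parse_doc.pl.
ASCII_WORD = re.compile(r"[A-Za-z0-9_]+")

# Unicode-aware pattern: on a Python 3 str, \w also matches letters
# such as ä, ö, ü and ß.
UNICODE_WORD = re.compile(r"\w+")


def split_words(text, pattern):
    """Return (word, offset) pairs for every match of pattern in text."""
    return [(m.group(0), m.start()) for m in pattern.finditer(text)]


sample = "beim diesjährigen den Platz"

# The ASCII-only pattern stops at "ä" and resumes after it,
# so "diesjährigen" comes out as "diesj" and "hrigen".
ascii_words = split_words(sample, ASCII_WORD)

# The Unicode-aware pattern keeps the word in one piece.
unicode_words = split_words(sample, UNICODE_WORD)

for label, words in (("ascii", ascii_words), ("unicode", unicode_words)):
    print(label, [word for word, _ in words])
```

If the script does behave like the first pattern, making its word matching locale-aware (for example with Perl's `use locale` pragma and a German locale) might be one direction to try. That suggestion is untested.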
