Sorry for the long delay in replying.

According to Dirk Datzert:
> has anybody a working german setup for the following environment:
> 
> SuSE 7.0, ht:Dig 3.1.5, ispell-3.1.20, ingerman-1.4. acroread 4 or xpdf
> 0.90
> 
> I read the FAQ of ht:Dig but I didn't understand what's wrong with my
> configuration.
> The german.0, root2word.db and word2root.db files are missing. How can I
> build these files?

The german.0 file must be built from the individual ispell dictionary
files that you select from the German ispell dictionary package.  You build
it with a command line something like: " cat * | sort | uniq >german.0 ".
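That merge step can be sketched as follows.  The word-list file names and
their contents here are made up for illustration; substitute the directory
where you unpacked the ingerman-1.4 word lists:

```shell
# Build german.0 by merging ispell word-list files:
# concatenate them all, sort, and drop duplicate entries.
set -e
dict_src=$(mktemp -d)            # stand-in for the ingerman word-list dir
printf 'Apfel\nBaum\n' > "$dict_src/a.words"
printf 'Baum\nZug\n'   > "$dict_src/b.words"

cat "$dict_src"/* | sort | uniq > german.0
wc -l < german.0                 # 3 unique words: Apfel, Baum, Zug
```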

There may also be a pre-made german.0 file in the "Contributed Works"
section of the htdig.org web site.

The root2word.db and word2root.db files are built from your german.0 and
german.aff files, using the command "htfuzzy endings".
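In full, you point htfuzzy at the same configuration file whose endings_*
attributes appear below (the config file path here is a placeholder; use
your own):

```
htfuzzy -c /path/to/htdig.conf endings
```

htfuzzy reads endings_affix_file and endings_dictionary from that config
and writes the two databases to the endings_root2word_db and
endings_word2root_db locations.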

> -- START OF CONF ---------------------------------
> 
> database_dir:         /opt/www/htdig/db/apdoku
> 
> start_url:            http://web01/mirrors/N4856/
> 
> limit_urls_to:                ${start_url}
> 
> exclude_urls:         /cgi-bin/ /cgi-test/ .cgi .pl
> 
> bad_extensions:               .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif \
>               .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi
> 
> maintainer:           [EMAIL PROTECTED]
> 
> max_head_length:      10000
> 
> max_doc_size:         2000000
> 
> no_excerpt_show_top:  true
> 
> search_algorithm:     exact:1 synonyms:0.5 endings:0.1
> 
> locale:       de_DE
> 
> lang_dir:             ${common_dir}
> 
> bad_word_list:        ${lang_dir}/bad_words
> endings_affix_file:   ${lang_dir}/german.aff
> endings_dictionary:   ${lang_dir}/german.0
> endings_root2word_db: ${lang_dir}/root2word.db
> endings_word2root_db: ${lang_dir}/word2root.db
> 
> external_parsers: application/msword /usr/bin/parse_doc.pl \
>                   application/postscript /usr/bin/parse_doc.pl \
>                   application/pdf /usr/bin/parse_doc.pl
> 
> -- END OF CONF ---------------------------------

Get rid of the junky old parse_doc.pl and upgrade to doc2html, also in
the Contributed Works section on htdig.org.  You'll need htdig 3.1.5 to
support external converters, but you'll get much more reliable and
consistent parsing with an external converter than with the kludgy
external parser you're using now.  If umlauts are parsed correctly in
HTML files but not in PDFs, the fault lies in the external parser.
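With doc2html installed, the external_parsers attribute above becomes a
set of external-converter entries, each converting its type to text/html
(the install path below is an assumption; use wherever you put the
doc2html script):

```
external_parsers: application/msword->text/html /usr/local/bin/doc2html.pl \
                  application/postscript->text/html /usr/local/bin/doc2html.pl \
                  application/pdf->text/html /usr/local/bin/doc2html.pl
```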

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html