According to Benjelloun Adnane:
> to make parse_doc.pl work with accents, please change this line:
> 
> push @allwords, grep { length >= $minimum_word_length } split /\W+/;
> 
> to:
> 
> push @allwords, grep { length >= $minimum_word_length } split
> /[^a-zA-Z...]+/;
>
> (the "..." stands for the list of accented letters in the character
> class, which was garbled in the archive)

Or much better still, dump the old external parser, and switch to an
external converter like conv_doc.pl or doc2html.pl.  There's no reason to
support parse_doc.pl any longer.  It's been hacked too many times by too
many users with too many conflicting needs, and never did give results
that are consistent with the internal parsers.  An external converter
will, because it defers the parsing to the internal parsers.
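
As a rough sketch of that switch, the htdig.conf entry might look
something like the fragment below. It assumes conv_doc.pl has been
installed in /usr/local/bin (that path is only an example) and that your
htdig accepts the "type->type" converter form of the external_parsers
attribute. Adjust the MIME types to the document formats you actually
index.

```
# Hand Word, PostScript and PDF documents to conv_doc.pl, which converts
# them to HTML; htdig's internal HTML parser then does the actual parsing.
# (The install path is an assumption.)
external_parsers: application/msword->text/html /usr/local/bin/conv_doc.pl \
                  application/postscript->text/html /usr/local/bin/conv_doc.pl \
                  application/pdf->text/html /usr/local/bin/conv_doc.pl
```

With the "->text/html" form, word splitting and accent handling are left
to the internal parser, so no character class needs patching in the
script.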

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
