Gilles Detillieux wrote:
>
> According to Radoy Pavlov:
> > Gilles Detillieux wrote:
> > > By "not applying regexp", I assume you're talking about "htfuzzy endings"
> > > here? Did you try running that command manually with one or more -v
> > > options to see if that gives you any usable feedback? It may be that
> > > your polish.aff file is not in a format that htfuzzy can deal with.
> > >
> > Thanks for your answer. Exactly my thought.
> >
> > root@box:/usr/local/sbin/bin# ./htfuzzy -vv -c /usr/local/share/apache/conf/htdig_pl.conf endings
> > htfuzzy: Selected algorithm: endings
> > htfuzzy/endings: Reading rules
> > htfuzzy/endings: Creating databases
> > htfuzzy: Done.
> >
> > The question is, where can I find a suitable affix file, or how
> > can I "patch" this one? I've managed to index my site in 6 different
> > languages, but Polish doesn't work. Any ideas?
>
> By "polish doesn't work", do you mean you're having problems indexing
> the Polish text with htdig, as well as having problems with building
> the Polish endings database? The endings database is only used for
> the "endings" fuzzy match algorithm, and isn't absolutely essential.
>
Yes, I'm having problems indexing Polish text, and I can't build the
endings database either. I'm running: htdig -i -vvv -c /path/to/htdig_pl.conf
The robot goes through all the Polish pages, but I end up with no database.
From my conf file:
locale: pl_PL.ISO_8859-2
lang_dir: ${common_dir}/polish
# bad_word_list: ${lang_dir}/bad_words
endings_affix_file: ${lang_dir}/polish.aff
endings_dictionary: ${lang_dir}/polish.0
endings_root2word_db: ${lang_dir}/root2word.db
endings_word2root_db: ${lang_dir}/word2root.db
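
To double-check which files those attributes actually point at, the ${...} references can be expanded by hand. A minimal sketch in Python, assuming the simple "name: value" syntax with ${name} substitution shown above; the common_dir value is hypothetical (it is not in this excerpt), and line continuations and other config features are ignored:

```python
import re

def parse_conf(text):
    """Parse 'name: value' lines, skipping blank lines and '#' comments."""
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition(":")
        if sep:
            attrs[name.strip()] = value.strip()
    return attrs

def expand(value, attrs, depth=10):
    """Replace ${name} references; unknown names are left untouched."""
    for _ in range(depth):
        new = re.sub(r"\$\{(\w+)\}",
                     lambda m: attrs.get(m.group(1), m.group(0)), value)
        if new == value:
            break
        value = new
    return value

# common_dir is a made-up placeholder; the real value is not shown here.
conf = """\
common_dir: /hypothetical/common
locale: pl_PL.ISO_8859-2
lang_dir: ${common_dir}/polish
# bad_word_list: ${lang_dir}/bad_words
endings_affix_file: ${lang_dir}/polish.aff
endings_dictionary: ${lang_dir}/polish.0
"""

attrs = parse_conf(conf)
resolved = {name: expand(value, attrs) for name, value in attrs.items()}
for name in ("endings_affix_file", "endings_dictionary"):
    print(name, "->", resolved[name])
```

Each resolved path can then be checked with ls to make sure the file is where the config expects it.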
> If Polish accented letters aren't indexed properly, then it may be
> because the pl_PL locale on your system doesn't define a proper
> LC_CTYPE file for the ISO-8859-2 character set. If they are indexed
> properly, you could simply take the endings algorithm out of your
> search_algorithms attribute setting until you manage to build your
> endings database.
ls -al /usr/share/locale/ | grep pl_
drwxr-xr-x 2 root wheel 512 Feb 22 2000 pl_PL.ISO_8859-2
In /usr/share/locale/pl_PL.ISO_8859-2:
lrwxrwxrwx 1 root wheel 30 Feb 22 2000 LC_COLLATE -> ../lt_LN.ISO_8859-2/LC_COLLATE
lrwxrwxrwx 1 root wheel 28 Feb 22 2000 LC_CTYPE -> ../lt_LN.ISO_8859-2/LC_CTYPE
-rw-r--r-- 1 root wheel 285 Dec 28 1999 LC_TIME
That's OK, isn't it?
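
The listing only shows that an LC_CTYPE file is linked in, not whether it classifies the Polish letters as alphabetic. As a reference for testing that, here is a small Python sketch that prints the byte each Polish accented letter occupies in ISO-8859-2; these are the byte values a working LC_CTYPE table would have to treat as letters. The helper name latin2_table is made up for this example, and the sketch relies on Python's own iso8859_2 codec, not on the system locale:

```python
import unicodedata

# The nine Polish accented letters, lower and upper case.
POLISH = "ąćęłńóśźżĄĆĘŁŃÓŚŹŻ"

def latin2_table(letters=POLISH):
    """Map each letter to its single-byte value in ISO-8859-2."""
    return {ch: ch.encode("iso8859_2")[0] for ch in letters}

table = latin2_table()
for ch, byte in table.items():
    # Print the Unicode name instead of the glyph so the output is pure ASCII.
    print(f"0x{byte:02X}  {unicodedata.name(ch)}")
```

If words in the index get split at exactly these bytes, that would point at the character classification rather than at the affix file.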
>
> With htfuzzy -vv, you should be getting much more output than that.
> Is there anything in your polish.0 file? You should get a message
> for each word processed from that file.
The output is just the same. I get rich output for each of the other
6 languages on my site. Everything works fine except Polish.
cat polish.0 | wc -l
52038
cat polish.aff | wc -l
4735
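
Line counts alone don't say whether the affix file has the layout a parser expects. A rough Python sketch that summarizes a file, assuming ispell-style affix syntax ("prefixes"/"suffixes" section keywords, "flag" lines, and rules containing ">"); whether polish.aff follows that layout, and exactly what htfuzzy requires, are assumptions here. The sample text is made up and not taken from polish.aff:

```python
def summarize_affix(lines):
    """Count ispell-style section keywords, 'flag' lines and rule lines."""
    summary = {"prefixes": 0, "suffixes": 0, "flags": 0, "rules": 0}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if lower.startswith("prefixes"):
            summary["prefixes"] += 1
        elif lower.startswith("suffixes"):
            summary["suffixes"] += 1
        elif lower.startswith("flag "):
            summary["flags"] += 1
        elif ">" in line:
            summary["rules"] += 1
    return summary

# Tiny invented sample in ispell affix syntax.
sample = """\
# comment line
suffixes
flag *A:
    .   >   A
    A   >   -A,Y
""".splitlines()

result = summarize_affix(sample)
print(result)
```

To run it on the real file, pass open("polish.aff", encoding="iso8859_2") instead of the sample. Comparing the counts with those from one of the affix files that does work may show what is different about the Polish one.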
> BTW, it's a good idea to keep your replies on the htdig-general
> mailing list, in case others can help out. My experience with locales
> is limited, and with Eastern European languages, nil.
I'm sorry about not posting to the mailing list. My fault.
Thanks for any advice,
Radoy
--
Error FE6B - Nonexistent - This comment does not exist,
and therefore you cannot read it. Please go away quietly.