czech support

Leos Literak Thu, 05 Sep 2002 04:17:46 -0700

Hi,

I use Lucene on my website and I am enthusiastic
about it. I was able to enable full-text search for my
complicated database-driven site within a few hours!


But I need some enhancements that are related to
my language.

1\ In Czech, words are inflected.
In English, a noun has only two forms:

singular: dog
plural: dogs

but we have 7 forms (cases) for each of singular
and plural:

pes,psa,psovi,psu,pse,psovi,psem
psi,psu,psum,psy,psi,psech,psy

I'd like to be able to search for all of these
variants by the base form: pes

E.g. whenever I encounter "psa", index it as "pes".
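
To illustrate the idea, here is a minimal sketch in plain Java,
independent of Lucene. The class CzechLemmaMap and its methods are
hypothetical names made up for this example, and the word table is
filled by hand from the forms listed above; a real solution would
presumably need a full dictionary or a stemming algorithm.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene: maps inflected forms to a base form.
class CzechLemmaMap {
    private final Map<String, String> baseForms = new HashMap<>();

    // Register the base form and every listed inflected form under the base form.
    void add(String base, String... forms) {
        baseForms.put(base, base);
        for (String form : forms) {
            baseForms.put(form, base);
        }
    }

    // Return the base form if the token is known; otherwise the lowercased token.
    String normalize(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        return baseForms.getOrDefault(lower, lower);
    }

    public static void main(String[] args) {
        CzechLemmaMap lemmas = new CzechLemmaMap();
        lemmas.add("pes", "psa", "psovi", "psu", "pse", "psem",
                   "psi", "psum", "psy", "psech");
        System.out.println(lemmas.normalize("psa"));   // known form
        System.out.println(lemmas.normalize("kocka")); // unknown word, left as is
    }
}
```

The same normalize() call would have to be applied both when indexing
and when parsing the query, so that "psa" and "pes" end up as the
same term.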

2\ Another issue is with diacritics.

For example, lávka (la'vka).

Some people use diacritics, some write words without them.

So I'd like to be able to look up both
lavka and lávka. An easy way is to index words without
diacritics, because that is the common denominator.

The algorithm is quite simple.
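
One possible way to strip diacritics, sketched in plain Java and
independent of Lucene. It assumes a JDK that provides
java.text.Normalizer; the class name DiacriticsStripper is
hypothetical. The text is decomposed into base letters plus
combining marks, and the marks are then dropped.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Lucene: removes diacritical marks from text.
class DiacriticsStripper {

    static String strip(String text) {
        // NFD splits an accented letter into the base letter followed by
        // one or more combining marks (Unicode category M).
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // Drop the combining marks and keep only the base letters.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("lávka")); // prints "lavka"
    }
}
```

Applying strip() to every token at index time and to the query terms
at search time would make lavka and lávka match the same entry.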



So my question is: which interfaces/classes
should I implement or override? I don't know how the
classes in Lucene relate to each other or what they are
for, so it would take me a lot of time to find out.

Thank you for your ideas!

        Leos


-- 
Leos Literak
http://AbcLinuxu.cz - tady je tucnakum hej!



