Hello,
I'm trying to create a Java wrapper library that detects the language of a 
text and then spell-checks it for the detected language. I'm currently using 
Apache Tika as the language detector, and I'm trying to use the 
lucene.analysis.hunspell package for spell checking, as I've seen it supports 
many languages. My issue is that I can't get good accuracy for some languages 
that have "special" (non-ASCII) characters. E.g. in Swedish I'm checking the 
word bästa, which is classified as misspelled, and the word basta is suggested 
instead. bästa exists in the dictionary, so I think this is some encoding 
issue.
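
One way to narrow down an encoding suspicion like this is to look at the code
points of the word that actually reaches the spell checker. The following is a
minimal sketch using only the JDK; the class name WordEncodingCheck and its
helper methods are hypothetical and not part of Lucene or Tika. It prints the
word as U+XXXX values and shows what the same word looks like when its UTF-8
bytes are wrongly decoded as ISO-8859-1, which is one common way a word such
as bästa can get mangled on the way in (for example when a source file or
input stream is read with the platform default charset on Windows).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper, not part of Lucene or Tika.
final class WordEncodingCheck {

    // Renders every code point as U+XXXX, so the real content of the string
    // is visible even if the console cannot display the characters.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    // Simulates the mistake: UTF-8 bytes decoded with the wrong charset.
    static String misdecode(String s, Charset wrong) {
        return new String(s.getBytes(StandardCharsets.UTF_8), wrong);
    }

    public static void main(String[] args) {
        // The escape keeps this literal independent of the source file encoding.
        String word = "b\u00e4sta";
        System.out.println("intact:  " + codePoints(word));
        System.out.println("mangled: "
                + codePoints(misdecode(word, StandardCharsets.ISO_8859_1)));
    }
}
```

If the word handed to the spell checker shows U+00C3 U+00A4 where a single
U+00E4 is expected, the problem is likely in how the input is read rather than
in the dictionary itself.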

I'm on Windows, with Lucene 8.11.2.
I'm using lucene.analysis.hunspell.Hunspell as the spell checker
and lucene.analysis.hunspell.Dictionary to create the dictionaries.
I'm using .dic and .aff files from here.
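
Hunspell .aff files normally declare the encoding of the dictionary pair with
a "SET <charset>" line, so it can help to check what the downloaded files
claim to be. Below is a rough sketch that reads that line with the JDK only.
The class name AffCharset, its methods, and the default file name sv_SE.aff
are assumptions made for illustration; this is not how Lucene itself parses
the file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of Lucene.
final class AffCharset {

    // Returns the charset named on the first "SET" line, or null if none exists.
    static String fromLines(Iterable<String> lines) {
        for (String line : lines) {
            String t = line.trim();
            // A UTF-8 byte order mark read as ISO-8859-1 shows up as these
            // three characters; drop it so a leading SET line is still found.
            if (t.startsWith("\u00EF\u00BB\u00BF")) {
                t = t.substring(3);
            }
            if (t.startsWith("SET ") || t.startsWith("SET\t")) {
                return t.substring(3).trim();
            }
        }
        return null;
    }

    // ISO-8859-1 maps every byte to a character, so reading never fails on
    // malformed input, and the ASCII-only SET line is decoded correctly.
    static String fromFile(Path aff) throws IOException {
        List<String> lines = Files.readAllLines(aff, StandardCharsets.ISO_8859_1);
        return fromLines(lines);
    }

    public static void main(String[] args) throws IOException {
        // "sv_SE.aff" is a placeholder; pass the real path as the first argument.
        Path aff = Paths.get(args.length > 0 ? args[0] : "sv_SE.aff");
        String charset = fromFile(aff);
        System.out.println(charset != null ? charset : "no SET line found");
    }
}
```

If the declared charset does not match the bytes actually stored in the .dic
file (for instance after a tool re-saved one of the files), words with
characters like ä may fail to match.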


Any guidance on where I should look, or on how I should implement the spell 
checks, would be welcome, as I've hardly found anything :)

Thanks a lot in advance,
Thanos
