It would help if you could share the problematic scenario as a piece of code
(ideally a forked Lucene repository with a test case) so that we can take
a look. There have been a lot of improvements to the hunspell package in
Lucene 9 (and on the main branch); it's worth taking a look and perhaps
taking some inspiration from the existing test cases there.

Dawid

On Mon, Feb 13, 2023 at 1:52 PM Thanos Agelakpoulos
<agel_tha...@yahoo.gr.invalid> wrote:

> Hello,
> I'm trying to create a Java wrapper library that detects the language of
> a text and then spell-checks it for the detected languages. I'm currently
> using Apache Tika as the language detector and I'm trying to use the
> lucene.analysis.hunspell package for spell-checking, as I've seen it
> supports many languages. My issue is that I can't get good accuracy for
> some languages that have "special" characters. E.g. in Swedish I'm
> checking the word bästa, which is classified as misspelled, and the word
> basta is suggested instead. bästa exists in the dictionary, so I think
> this is some encoding issue.
>
> I'm on Windows, with Lucene 8.11.2.
> I'm using lucene.analysis.hunspell.Hunspell as the spellchecker
> and lucene.analysis.hunspell.Dictionary to create the dictionaries.
> I'm using .dic and .aff files from here.
>
>
> Any guidance on where I should look, or how I should implement the
> spell checks, would be welcome, as I've hardly found anything :)
>
> Thanks a lot in advance,
> Thanos
>
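
As a starting point for narrowing down the suspected encoding issue, here
is a minimal, self-contained sketch that uses only the JDK (no Lucene
calls). It shows which bytes bästa produces under two common encodings and
what the word looks like when those bytes are decoded with the wrong
charset. The class name EncodingCheck and its helper methods are
hypothetical, and whether a charset mismatch is the actual cause here is an
assumption that has not been checked against the dictionary files in
question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper; not part of Lucene or Tika.
public class EncodingCheck {

    // Decode raw bytes the way a reader configured with the given charset would.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Render a string as a list of Unicode code points, e.g. "U+0062 U+00E4".
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Written with an escape so the result does not depend on the
        // encoding of this source file.
        String word = "b\u00e4sta";

        byte[] latin1 = word.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = word.getBytes(StandardCharsets.UTF_8);

        // Matching charsets round-trip cleanly.
        System.out.println(codePoints(decode(latin1, StandardCharsets.ISO_8859_1)));
        System.out.println(codePoints(decode(utf8, StandardCharsets.UTF_8)));

        // Mismatched charsets do not: the decoded word no longer equals the input.
        System.out.println(codePoints(decode(latin1, StandardCharsets.UTF_8)));
        System.out.println(codePoints(decode(utf8, StandardCharsets.ISO_8859_1)));
    }
}
```

Hunspell affix files normally declare their encoding with a SET line (for
example SET UTF-8). If the word being checked, or the dictionary files,
end up decoded with a different charset than the declared one, lookups for
words with non-ASCII characters could plausibly fail in the way described
above. Comparing the code points of the input word with those of the
dictionary entry is a cheap first check before digging further.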
