It'd be good if you could share the problematic scenario as a piece of code (ideally a forked Lucene repository with a test case) so that we can take a look. There have been a ton of improvements to the hunspell package in Lucene 9 (and on the main branch); you should take a look and perhaps take some inspiration from the existing test cases there.
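While putting together a test case, one thing worth ruling out is whether the word reaches the spellchecker with the characters you expect. The sketch below uses only the JDK (no Lucene calls) and is an assumption about where the problem might be, not a diagnosis: it prints the code points of "bästa" and shows what the same word looks like if its UTF-8 bytes are decoded with a single-byte charset, which can happen when a platform default charset is used. The class and method names (`EncodingCheck`, `codePoints`, `misdecode`) are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {
    // Render each code point as U+XXXX so a mis-decoded word is easy to spot.
    public static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    // Simulate text that was written as UTF-8 but read back with another charset.
    public static String misdecode(String s, Charset wrong) {
        return new String(s.getBytes(StandardCharsets.UTF_8), wrong);
    }

    public static void main(String[] args) {
        // The \u00e4 escape keeps the literal independent of the source-file encoding.
        String word = "b\u00e4sta";
        System.out.println(codePoints(word));

        // Hypothetical failure mode: UTF-8 bytes decoded as ISO-8859-1.
        String garbled = misdecode(word, StandardCharsets.ISO_8859_1);
        System.out.println(codePoints(garbled));
    }
}
```

If the string you pass to the spellchecker prints like the second line (two code points where "ä" should be one), the problem is likely in how the input text or source file is being read rather than in the dictionary itself.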
Dawid

On Mon, Feb 13, 2023 at 1:52 PM Thanos Agelakpoulos <agel_tha...@yahoo.gr.invalid> wrote:

> Hello,
>
> I'm trying to create a Java wrapper library to detect the language of a
> text and then spell-check it for the detected languages. I'm currently
> using Apache Tika as the language detector and I'm trying to use the
> lucene.analysis.hunspell package for spell-checking, as I've seen it
> supports many languages.
>
> My issue is that I can't get good accuracy for some languages that have
> "special" characters. E.g. in Swedish I'm checking the word bästa, which
> is classified as misspelled, and the word basta is suggested instead.
> bästa exists in the dictionary, so I think this is some encoding issue.
>
> I'm on Windows, with Lucene 8.11.2.
> I'm using lucene.analysis.hunspell.Hunspell as the spellchecker
> and lucene.analysis.hunspell.Dictionary to create the dictionaries.
> I'm using .dic and .aff files from here.
>
> Any guidance on where I should look, or how I should implement the
> spellchecks, would be welcome, as I've hardly found anything :)
>
> Thanks a lot in advance,
> Thanos