One solution might be to check typos (Gogle, Gooogle) against a Lucene index that also contains your dictionary of companies. With a FuzzyQuery you would find the correct form ("Google") and could then pass that corrected form to your DictionaryNameFinder.
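
To make the idea concrete, here is a rough sketch of the normalization step in plain Java. It is only an illustration: a linear scan with a hand-written Levenshtein distance stands in for the Lucene index and FuzzyQuery, and the class name, method names, and the limit of 2 edits are my own assumptions, not part of OpenNLP or Lucene. A scan like this would be far too slow for 3M entries, which is why an index is the better fit in your case.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper for illustration only; not an OpenNLP or Lucene class.
class TypoNormalizerSketch {

    // Levenshtein distance computed with two rolling rows of the DP table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the closest dictionary entry within maxEdits (case-insensitive),
    // or an empty Optional when nothing is close enough.
    static Optional<String> correct(String token, List<String> dictionary, int maxEdits) {
        String best = null;
        int bestDistance = maxEdits + 1;
        String lowered = token.toLowerCase(Locale.ROOT);
        for (String entry : dictionary) {
            int d = levenshtein(lowered, entry.toLowerCase(Locale.ROOT));
            if (d < bestDistance) {
                bestDistance = d;
                best = entry;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        // Tiny stand-in for the real company dictionary.
        List<String> companies = Arrays.asList("Google", "Apple", "Microsoft");
        for (String typo : Arrays.asList("Gogle", "Gooogle", "Xyzzy")) {
            String result = correct(typo, companies, 2).orElse("(no match)");
            System.out.println(typo + " -> " + result);
        }
    }
}
```

The corrected string returned by correct(...) is what you would substitute into the text before running DictionaryNameFinder; tokens with no close match are left unchanged.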

Please let me know if it seems feasible.

BR,
Catalin


On 09/13/2015 10:35 PM, Damiano Porta wrote:
Hi Catalin,
Can I use it with DictionaryNameFinder?
Thanks
Damiano

On Sun, 13 Sep 2015 at 21:08, Catalin Mititelu <[email protected]>
wrote:

Hi Damiano,

You could try Lucene's fuzzy query, which is based on Levenshtein distance.

BR,
Catalin

On 09/13/2015 09:59 PM, Damiano Porta wrote:
Hello,

I have created a very big dictionary of companies; it has around 3M entries.
At the moment I am using the DictionaryNameFinder class, but I need to
implement something to find typos like Gogle/Gooogle Inc etc.
I read something about Levenshtein distance; is this implemented in
OpenNLP?
It seems good, but I read that it takes a lot of time when there are many
words (my case).

What should I do?
Thanks!
Damiano


