A solution might be to check typos (Gogle, Gooogle) against a Lucene index that also contains your dictionary of companies. Using a FuzzyQuery you would find the correct form ("Google") and
then use this corrected form in your DictionaryNameFinder.
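To make the lookup step concrete, here is a minimal sketch in plain Java. It is not Lucene: it swaps the FuzzyQuery for a brute-force Levenshtein scan over an in-memory list, just to show the "typo in, dictionary form out" idea. The class name TypoNormalizer, the correct method and the maxEdits parameter are made up for illustration and are not part of OpenNLP or Lucene. A linear scan like this would likely be too slow for a dictionary of your size, which is the reason for suggesting an index.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper for illustration only; not an OpenNLP or Lucene class.
final class TypoNormalizer {

    // Levenshtein distance computed with two rolling rows of the DP table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty prefix of a into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into an empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the closest dictionary entry (case-insensitive) if it is within
    // maxEdits edits of the token, otherwise an empty Optional.
    static Optional<String> correct(String token, List<String> dictionary, int maxEdits) {
        String best = null;
        int bestDistance = maxEdits + 1;
        String needle = token.toLowerCase(Locale.ROOT);
        for (String entry : dictionary) {
            int d = levenshtein(needle, entry.toLowerCase(Locale.ROOT));
            if (d < bestDistance) {
                bestDistance = d;
                best = entry;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

The value returned by correct would be what gets passed on to the DictionaryNameFinder in place of the misspelled token; tokens with no entry within maxEdits are left for you to handle as unknown.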
Please let me know if it seems feasible.
BR,
Catalin
On 09/13/2015 10:35 PM, Damiano Porta wrote:
Hi Catalin,
Can I use it with DictionaryNameFinder?
Thanks
Damiano
On Sun, 13 Sep 2015 at 21:08, Catalin Mititelu <[email protected]>
wrote:
Hi Damiano,
You could try a Lucene fuzzy query, which is based on Levenshtein distance.
BR,
Catalin
On 09/13/2015 09:59 PM, Damiano Porta wrote:
Hello,
I have created a very big dictionary of companies; it is around 3M.
At the moment I am using the DictionaryNameFinder class, but I need to
implement something to find typos like Gogle/Gooogle Inc, etc.
I read something about Levenshtein distance; is this implemented in
OpenNLP?
It seems good, but I read that it takes a lot of time when there are many
words (my case).
What should I do?
Thanks!
Damiano