Hi Catalin,

Thank you so much for your help. Yes, I found Lucene's FuzzyQuery, but there is one step I did not understand. When I check a term (with typos) against a Lucene index to find the correct form, why do I still have to use DictionaryNameFinder? I mean:
1. I can create an index containing all the correct names.
2. I check each token against that index to find an exact match or a word within a specific "distance".
3. If I find something, I "tag" that word as a city, without using DictionaryNameFinder.

In other words, my "dictionary" would be the Lucene index itself. Is that correct?

Thank you!
Damiano

2015-09-14 13:10 GMT+02:00 Cătălin M. <[email protected]>:

> A solution might be to check typos (Gogle, Gooogle) against a Lucene index
> that would contain your dictionary of companies, too. Using the FuzzyQuery
> you would find the correct form => "Google" and then use this correct form
> in your DictionaryNameFinder.
>
> Please let me know if it seems feasible.
>
> BR,
> Catalin
>
> On 09/13/2015 10:35 PM, Damiano Porta wrote:
>
>> Hi Catalin,
>> Can I use it with DictionaryNameFinder?
>> Thanks
>> Damiano
>>
>> On Sun, 13 Sep 2015 21:08, Catalin Mititelu <[email protected]> wrote:
>>
>>> Hi Damiano,
>>>
>>> You may try Lucene fuzzy query, which is based on Levenshtein distance.
>>>
>>> BR,
>>> Catalin
>>>
>>> On 09/13/2015 09:59 PM, Damiano Porta wrote:
>>>
>>>> Hello,
>>>>
>>>> I have created a very big dictionary of companies, it is around 3M.
>>>> At the moment I am using the DictionaryNameFinder class, but I need to
>>>> implement something to find typos like Gogle/Gooogle Inc etc.
>>>> I read something about Levenshtein distance, is this implemented in
>>>> OpenNLP?
>>>> It seems good, but I read it takes a lot of time if the words are many
>>>> (my case).
>>>>
>>>> What should I do?
>>>> Thanks!
>>>> Damiano
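
For reference, the three-step idea at the top of this message can be sketched in plain Java. This is only a rough illustration under some assumptions: the class FuzzyDictionarySketch and its methods are hypothetical names, and a linear scan over an in-memory list stands in for the Lucene index. In practice the lookup would presumably go through Lucene's FuzzyQuery, since scanning around 3M names for every token is exactly the slow case mentioned in the thread.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical stand-in for the Lucene index: a plain list of correct names,
// scanned linearly. A real index would avoid comparing against every entry.
final class FuzzyDictionarySketch {

    private final List<String> names;
    private final int maxDistance;

    FuzzyDictionarySketch(List<String> names, int maxDistance) {
        this.names = List.copyOf(names);
        this.maxDistance = maxDistance;
    }

    // Two-row dynamic-programming Levenshtein (edit) distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Step 2: return the closest dictionary name within maxDistance, if any.
    Optional<String> correct(String token) {
        String lowered = token.toLowerCase(Locale.ROOT);
        String best = null;
        int bestDistance = maxDistance + 1;
        for (String name : names) {
            int d = levenshtein(lowered, name.toLowerCase(Locale.ROOT));
            if (d < bestDistance) {
                best = name;
                bestDistance = d;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        // Step 1: build the "index" of correct names (sample data only).
        FuzzyDictionarySketch dict =
            new FuzzyDictionarySketch(List.of("Google", "Apple", "Oracle"), 2);
        // Step 3: a token is tagged only if a close enough name exists.
        for (String token : List.of("Gogle", "Gooogle", "banana")) {
            System.out.println(token + " -> " + dict.correct(token).orElse("(no match)"));
        }
    }
}
```

The maximum distance of 2 is an arbitrary choice for the sample data. A larger value catches more typos but also produces more false matches, especially for short names.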
