Re: How to handle big dictionaries to find typos

Damiano Porta Mon, 14 Sep 2015 05:32:15 -0700

Hi Catalin,
thank you so much for your help.

Yes, I found Lucene's FuzzyQuery, but there is one step I did not understand.
If I check the term (with typos) against a Lucene index to find the correct
form, why do I still need DictionaryNameFinder? What I mean is:


1. I create an index with all the correct names.
2. I check each token against that index to find an exact match or a word
within a specific edit "distance".
3. If I find something, I "tag" that word as a city without using
DictionaryNameFinder.

In other words, my "dictionary" would be the Lucene index itself.
Correct?
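
The three steps above can be sketched roughly as follows. This is only an
illustration under stated assumptions: it uses a plain in-memory map and a
hand-written Levenshtein check as a stand-in for the Lucene index and
FuzzyQuery, and the class name FuzzyCityTagger, its methods, and the sample
city names are hypothetical (they are not part of OpenNLP or Lucene).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not an OpenNLP or Lucene class.
class FuzzyCityTagger {
    // Step 1: the "index" of correct names (lower-cased key -> original spelling).
    private final Map<String, String> names = new LinkedHashMap<>();
    private final int maxDistance;

    FuzzyCityTagger(Iterable<String> correctNames, int maxDistance) {
        for (String name : correctNames) {
            names.put(name.toLowerCase(), name);
        }
        this.maxDistance = maxDistance;
    }

    // Two-row dynamic-programming Levenshtein (edit) distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Step 2: return the closest correct name within maxDistance, or null.
    String match(String token) {
        String t = token.toLowerCase();
        String exact = names.get(t);
        if (exact != null) {
            return exact;
        }
        String best = null;
        int bestDistance = maxDistance + 1;
        for (Map.Entry<String, String> e : names.entrySet()) {
            // A length gap larger than maxDistance can never be within range.
            if (Math.abs(e.getKey().length() - t.length()) > maxDistance) {
                continue;
            }
            int d = levenshtein(t, e.getKey());
            if (d < bestDistance) {
                bestDistance = d;
                best = e.getValue();
            }
        }
        return best;
    }

    // Step 3: tag every token that matched, without any DictionaryNameFinder.
    List<String> tag(String[] tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            String m = match(token);
            out.add(m == null ? token : token + "/city(" + m + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        FuzzyCityTagger tagger =
            new FuzzyCityTagger(Arrays.asList("Rome", "Milan", "Turin"), 1);
        System.out.println(
            tagger.tag(new String[] {"flights", "from", "Rme", "to", "Millan"}));
    }
}
```

Note that this sketch scans every name for each token, which is exactly the
cost that becomes a problem with millions of entries; the point of putting
the names in a Lucene index is to let FuzzyQuery do that lookup instead of a
linear scan.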

Thank you!
Damiano



2015-09-14 13:10 GMT+02:00 Cătălin M. <[email protected]>:

> A solution might be to check typos (Gogle, Gooogle) against a Lucene index
> that would also contain your dictionary of companies. Using FuzzyQuery,
> you would find the correct form ("Google") and then use that correct form
> in your DictionaryNameFinder.
>
> Please let me know if it seems feasible.
>
> BR,
> Catalin
>
>
>
> On 09/13/2015 10:35 PM, Damiano Porta wrote:
>
>> Hi Catalin,
>> Can I use it with DictionaryNameFinder?
>> Thanks
>> Damiano
>>
>> On Sun, 13 Sep 2015 21:08, Catalin Mititelu <[email protected]> wrote:
>>
>>> Hi Damiano,
>>>
>>> You may try Lucene's fuzzy query, which is based on Levenshtein distance.
>>>
>>> BR,
>>> Catalin
>>>
>>> On 09/13/2015 09:59 PM, Damiano Porta wrote:
>>>
>>>> Hello,
>>>>
>>>> I have created a very big dictionary of companies; it is around 3M.
>>>> At the moment I am using the DictionaryNameFinder class, but I need to
>>>> implement something to find typos like Gogle/Gooogle Inc etc.
>>>> I read something about Levenshtein distance; is this implemented in
>>>> OpenNLP?
>>>> It seems good, but I read that it takes a lot of time if there are many
>>>> words (my case).
>>>>
>>>> What should I do?
>>>> Thanks!
>>>> Damiano
>>>>
>>>>
>>>
>
