Yes Catalin, I was using DictionaryNameFinder for NER. Unfortunately, it does not support misspellings at the moment, so I have to migrate that dictionary to a Lucene index.
Thank you!

2015-09-14 14:46 GMT+02:00 Cătălin M. <[email protected]>:

> Yes, you are right. You can replace DictionaryNameFinder with a Lucene
> index. When you mentioned DictionaryNameFinder I was thinking of the named
> entity recognition module (tagging being done using a NER model).
>
> Sorry for the misunderstanding.
>
> BR,
> Catalin
>
>
> On 09/14/2015 03:31 PM, Damiano Porta wrote:
>
>> Hi Catalin,
>> thank you so much for your help.
>>
>> Yes, I found Lucene's FuzzyQuery, but I did not understand one step.
>> When I check a term (with typos) against a Lucene index to find the
>> correct form, why do I have to use DictionaryNameFinder? I mean:
>>
>> 1. I can create an index with all the correct names.
>> 2. I check each token against that index to find an exact match or a
>>    word within a specific "distance".
>> 3. If I find something, I "tag" that word as a city without using
>>    DictionaryNameFinder.
>>
>> I mean, my "dictionary" will be this Lucene index.
>> Correct?
>>
>> Thank you!
>> Damiano
>>
>>
>> 2015-09-14 13:10 GMT+02:00 Cătălin M. <[email protected]>:
>>
>>> A solution might be to check typos (Gogle, Gooogle) against a Lucene index
>>> that would contain your dictionary of companies, too. Using the
>>> FuzzyQuery you would find the correct form => "Google" and then use this
>>> correct form in your DictionaryNameFinder.
>>>
>>> Please let me know if it seems feasible.
>>>
>>> BR,
>>> Catalin
>>>
>>>
>>> On 09/13/2015 10:35 PM, Damiano Porta wrote:
>>>
>>>> Hi Catalin,
>>>> Can I use it with DictionaryNameFinder?
>>>> Thanks
>>>> Damiano
>>>>
>>>> On Sun 13 Sep 2015 21:08 Catalin Mititelu <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Damiano,
>>>>>
>>>>> You may try Lucene's fuzzy query, which is based on Levenshtein distance.
>>>>>
>>>>> BR,
>>>>> Catalin
>>>>>
>>>>> On 09/13/2015 09:59 PM, Damiano Porta wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I have created a very big dictionary of companies, around 3M entries.
>>>>>> At the moment I am using the DictionaryNameFinder class, but I need to
>>>>>> implement something to find typos like Gogle/Gooogle Inc etc.
>>>>>> I read something about Levenshtein distance; is this implemented in
>>>>>> OpenNLP?
>>>>>> It seems good, but I read it takes a lot of time if there are many
>>>>>> words (my case).
>>>>>>
>>>>>> What should I do?
>>>>>> Thanks!
>>>>>> Damiano
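
For reference, the lookup discussed in this thread (keep the correct names in a dictionary, then match each token within a given edit distance) can be sketched in plain Java. This is only an illustration under stated assumptions: the class `FuzzyDictionarySketch` and its methods are hypothetical names, not part of the OpenNLP or Lucene API, and the linear scan over the entries is a simple stand-in for the index lookup that a Lucene FuzzyQuery would perform.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical plain-Java stand-in for a fuzzy dictionary lookup (not OpenNLP or Lucene API). */
final class FuzzyDictionarySketch {

    private final List<String> names = new ArrayList<>();

    /** Adds a correctly spelled dictionary entry, e.g. a company name. */
    void add(String name) {
        names.add(name);
    }

    /**
     * Returns the closest entry within maxEdits edits of the token, or null if none qualifies.
     * Comparison is case-insensitive; on ties the entry added first wins.
     */
    String lookup(String token, int maxEdits) {
        String needle = token.toLowerCase(Locale.ROOT);
        String best = null;
        int bestDistance = maxEdits + 1; // anything at or above this is rejected
        for (String name : names) {
            int d = levenshtein(needle, name.toLowerCase(Locale.ROOT));
            if (d < bestDistance) {
                bestDistance = d;
                best = name;
            }
        }
        return best;
    }

    /** Classic two-row dynamic-programming Levenshtein distance. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty prefix of a into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into an empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

With "Google" in the dictionary, `lookup("Gogle", 2)` and `lookup("Gooogle", 2)` both return "Google", while a token with no entry within two edits returns null. The scan compares the token against every entry, so with around 3M names it would likely be too slow per token; that cost is the reason the thread moves the dictionary into an index instead.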
