Yes, you are right. You can replace DictionaryNameFinder with a Lucene index. When you mentioned DictionaryNameFinder, I was thinking of a named entity recognition module (tagging done with a NER model).
Sorry for the misunderstanding.
BR,
Catalin
On 09/14/2015 03:31 PM, Damiano Porta wrote:
Hi Catalin,
thank you so much for your help.
Yes, I found Lucene's FuzzyQuery, but I did not understand one step. When I check the term (with typos) against a Lucene index to find the correct form, why do I still need DictionaryNameFinder? I mean:
1. I can create an index with all the correct names.
2. I check each token against that index to find an exact match, or a word within a specific edit "distance".
3. If I find something, I tag that word as a city, without using DictionaryNameFinder.
In other words, my "dictionary" will be this Lucene index.
Correct?
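The three steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the setup discussed in the thread: a plain in-memory `Set<String>` stands in for the Lucene index, a linear scan with Levenshtein distance stands in for FuzzyQuery, and the class name and the "CITY"/"O" labels are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the three steps: an in-memory set replaces the
// Lucene index, and a linear Levenshtein scan replaces FuzzyQuery.
final class FuzzyDictionaryTagger {
    private final Set<String> names = new HashSet<>(); // step 1: correct names, lower-cased
    private final int maxEdits;                         // the allowed edit "distance"

    FuzzyDictionaryTagger(Set<String> correctNames, int maxEdits) {
        for (String name : correctNames) {
            names.add(name.toLowerCase());
        }
        this.maxEdits = maxEdits;
    }

    // Steps 2 and 3: label a token "CITY" if some name is within maxEdits of it,
    // otherwise "O" (outside any name).
    String[] tag(String[] tokens) {
        String[] tags = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            tags[i] = matchesSomeName(tokens[i].toLowerCase()) ? "CITY" : "O";
        }
        return tags;
    }

    private boolean matchesSomeName(String token) {
        for (String name : names) {
            if (distance(token, name) <= maxEdits) {
                return true;
            }
        }
        return false;
    }

    // Standard Levenshtein distance, keeping only two rows of the DP table.
    private static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

With the names Milano, Roma and Torino and maxEdits = 1, tagging the tokens "I", "live", "in", "Milno" labels only "Milno" as CITY. A fixed maxEdits is risky on short tokens (one edit changes a three-letter word a lot), so in practice the allowed distance would usually depend on token length.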
Thank you!
Damiano
2015-09-14 13:10 GMT+02:00 Cătălin M. <[email protected]>:
A solution might be to check typos (Gogle, Gooogle) against a Lucene index that also contains your dictionary of companies. Using FuzzyQuery you would find the correct form, "Google", and then use this correct form in your DictionaryNameFinder.
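The two-stage idea (fuzzy correction first, exact dictionary matching second) might look like the sketch below. It is hypothetical: a vocabulary list (assumed here to be the words that make up the company names) replaces the Lucene index, and a linear Levenshtein scan replaces FuzzyQuery. The corrected token array is what would then be passed to DictionaryNameFinder's find method instead of the raw tokens.

```java
import java.util.List;

// Hypothetical sketch of the normalization step: each token is replaced by the
// closest vocabulary word within maxEdits, so that an exact-match finder
// (such as DictionaryNameFinder) sees "Google" instead of "Gooogle".
final class TypoNormalizer {
    private final List<String> vocabulary; // assumed: words occurring in the company names
    private final int maxEdits;

    TypoNormalizer(List<String> vocabulary, int maxEdits) {
        this.vocabulary = vocabulary;
        this.maxEdits = maxEdits;
    }

    String[] normalize(String[] tokens) {
        String[] out = tokens.clone(); // tokens with no close word stay unchanged
        for (int i = 0; i < tokens.length; i++) {
            int bestDist = maxEdits + 1;
            for (String word : vocabulary) {
                int d = distance(tokens[i], word);
                if (d < bestDist) { // strict "<": on ties the earlier vocabulary word wins
                    bestDist = d;
                    out[i] = word;
                }
            }
        }
        return out;
    }

    // Standard (case-sensitive) Levenshtein distance with two DP rows.
    private static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

For example, with the vocabulary Google, Inc, Microsoft, Corporation and maxEdits = 1, the tokens "Gooogle", "Inc", "is", "hiring" become "Google", "Inc", "is", "hiring".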
Please let me know if it seems feasible.
BR,
Catalin
On 09/13/2015 10:35 PM, Damiano Porta wrote:
Hi Catalin,
Can I use it with DictionaryNameFinder?
Thanks
Damiano
On Sun, 13 Sep 2015 21:08, Catalin Mititelu <[email protected]> wrote:
Hi Damiano,
You may try Lucene's fuzzy query (FuzzyQuery), which is based on Levenshtein distance.
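For reference, here is a minimal sketch of the Levenshtein distance itself: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another, computed with the usual dynamic-programming table (keeping only two rows). The class name is hypothetical, and this is not Lucene's code; as far as I know, FuzzyQuery does not compare terms pair by pair like this but builds a Levenshtein automaton and matches it against the index terms.

```java
// Classic dynamic-programming Levenshtein distance (hypothetical helper class).
final class Levenshtein {
    private Levenshtein() {}

    static int distance(String a, String b) {
        // At the start of iteration i, prev[j] is the distance between the
        // first i-1 characters of a and the first j characters of b.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,  // insertion
                                            prev[j] + 1),     // deletion
                                   prev[j - 1] + cost);       // substitution or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("Gogle", "Google"));   // prints 1 (one missing 'o')
        System.out.println(distance("Gooogle", "Google")); // prints 1 (one extra 'o')
    }
}
```

Each call costs time proportional to the product of the two string lengths, which is why comparing every token against every dictionary entry gets slow on a large dictionary.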
BR,
Catalin
On 09/13/2015 09:59 PM, Damiano Porta wrote:
Hello,
I have created a very large dictionary of companies (around 3M).
At the moment I am using the DictionaryNameFinder class, but I need to implement something to catch typos like Gogle/Gooogle Inc, etc.
I read something about Levenshtein distance; is this implemented in OpenNLP?
It seems good, but I read that it takes a lot of time when there are many words (my case).
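The slowness mostly comes from the naive approach: a full distance computation between every token and every one of the ~3M entries. A minimal sketch, assuming the dictionary fits in memory, of two standard cuts: bucket the entries by length (if two lengths differ by more than maxEdits, the distance must exceed maxEdits) and stop filling the dynamic-programming table once a whole row exceeds maxEdits (row minima never decrease, so the final distance cannot drop back under the bound). The class and method names are hypothetical; a Lucene index queried with FuzzyQuery, as suggested in this thread, is likely the more scalable option.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not Lucene's algorithm): fuzzy lookup over a large
// in-memory dictionary with a length filter and an early-exit distance check.
final class BoundedFuzzyLookup {
    private final Map<Integer, List<String>> byLength = new HashMap<>();

    BoundedFuzzyLookup(Iterable<String> dictionary) {
        for (String name : dictionary) {
            byLength.computeIfAbsent(name.length(), k -> new ArrayList<>()).add(name);
        }
    }

    // Returns every entry within maxEdits of the query.
    List<String> search(String query, int maxEdits) {
        List<String> hits = new ArrayList<>();
        // Cut 1: only look at entries whose length is within maxEdits of the query.
        int from = Math.max(0, query.length() - maxEdits);
        for (int len = from; len <= query.length() + maxEdits; len++) {
            for (String candidate : byLength.getOrDefault(len, Collections.emptyList())) {
                if (withinDistance(query, candidate, maxEdits)) {
                    hits.add(candidate);
                }
            }
        }
        return hits;
    }

    // Cut 2: Levenshtein DP that gives up as soon as every cell of a row exceeds maxEdits.
    static boolean withinDistance(String a, String b, int maxEdits) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            int rowMin = curr[0];
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
                rowMin = Math.min(rowMin, curr[j]);
            }
            if (rowMin > maxEdits) {
                return false; // later rows can only be as large or larger
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()] <= maxEdits;
    }
}
```

For example, over the entries Google, Goggle, Microsoft and Apple, searching for "Gogle" with maxEdits = 1 returns Google and Goggle, while "Gooogle" returns only Google; Microsoft and Apple are never compared because their lengths are out of range.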
What should i do?
Thanks!
Damiano