Re: request for Input or ideas.... EntityLinker tickets

Jörn Kottmann Tue, 22 Oct 2013 11:40:15 -0700

On 10/05/2013 11:58 PM, Mark G wrote:

3. Fuzzy string matching should be part of the scoring; this would allow
MySQL fuzzy search to return more candidate toponyms.
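A fuzzy score does not have to come from the database; it could be computed in the linker itself and applied to whatever candidates the store returns. Below is a minimal Java sketch. It assumes that a normalized Levenshtein similarity is an acceptable measure. The class `FuzzyScore`, its methods, and the threshold value used below are hypothetical and are not part of an existing OpenNLP API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: scores gazetteer candidates against an NER mention. */
final class FuzzyScore {

    private FuzzyScore() {}

    /** Dynamic-programming Levenshtein edit distance, keeping two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the row just filled becomes "previous"
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]; 1.0 means identical after lower-casing. */
    static double similarity(String a, String b) {
        String x = a.toLowerCase(Locale.ROOT);
        String y = b.toLowerCase(Locale.ROOT);
        int maxLen = Math.max(x.length(), y.length());
        if (maxLen == 0) {
            return 1.0;
        }
        return 1.0 - (double) levenshtein(x, y) / maxLen;
    }

    /** Keeps only the candidates whose similarity to the mention meets the threshold. */
    static List<String> filter(String mention, List<String> candidates, double threshold) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (similarity(mention, candidate) >= threshold) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```

With this measure, `similarity("Kandahar", "Qandahar")` is 0.875 (one substitution over eight characters). A threshold of 0.8 would therefore keep that transliteration and reject unrelated names. Other measures, such as Jaro-Winkler or character n-gram overlap, may suit transliterated names better. Any threshold would need tuning on real data.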


Currently, the search into the MySQL gazetteers uses "boolean mode", and
each NER result is passed in as a literal string. If I implement a score
based on fuzzy string matching (do we have one?), the user could turn on
"natural language" mode in MySQL; we could then generate a score and apply
a threshold to allow more recall on transliterated names, etc.
I would also like to use proximity to the majority of points in the
document as a disambiguation criterion.
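One way to read "proximity to the majority of points" is as a medoid-style criterion. For each ambiguous toponym, prefer the candidate whose total great-circle distance to the candidates of all other toponyms in the document is smallest. The Java sketch below works under that assumption. The `GeoCandidate` record and the `ProximityDisambiguator` class are hypothetical names, and the code requires Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical gazetteer hit: a display name plus coordinates in degrees. */
record GeoCandidate(String name, double lat, double lon) {}

final class ProximityDisambiguator {

    private static final double EARTH_RADIUS_KM = 6371.0;

    private ProximityDisambiguator() {}

    /** Great-circle distance in kilometres (haversine formula). */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double h = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(h)));
    }

    /**
     * Each inner list holds the candidates for one toponym mention. For every
     * mention, returns the candidate with the smallest summed distance to all
     * candidates of the other mentions (null if a mention has no candidates).
     */
    static List<GeoCandidate> disambiguate(List<List<GeoCandidate>> mentions) {
        List<GeoCandidate> chosen = new ArrayList<>();
        for (int m = 0; m < mentions.size(); m++) {
            GeoCandidate best = null;
            double bestScore = Double.POSITIVE_INFINITY;
            for (GeoCandidate candidate : mentions.get(m)) {
                double score = 0.0;
                for (int o = 0; o < mentions.size(); o++) {
                    if (o == m) {
                        continue; // do not compare a mention with its own alternatives
                    }
                    for (GeoCandidate other : mentions.get(o)) {
                        score += haversineKm(candidate.lat(), candidate.lon(), other.lat(), other.lon());
                    }
                }
                if (score < bestScore) {
                    bestScore = score;
                    best = candidate;
                }
            }
            chosen.add(best);
        }
        return chosen;
    }
}
```

Summing over every candidate lets heavily ambiguous names pull the result toward their own spread of points. Alternatives worth comparing include weighting each point by its fuzzy score or using only the nearest candidate of each other toponym. The cost is quadratic in the number of candidates per document.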

It would probably be nice if this would work with other databases too, e.g. Apache Derby
or some in-memory database, maybe even Lucene.

Would it be possible not to use the MySQL fuzzy string matching feature for this?

I would like to run your code, but it's difficult to scale the MySQL database in my scenario.
I have lots of RAM, though, and believe the GeoNames dataset could fit into it to provide
super fast lookups on my worker servers.
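Without MySQL, an in-memory index could serve exact lookups directly from RAM. This is a minimal sketch. It assumes tab-separated input laid out like the GeoNames dump, with columns geonameid, name, asciiname, comma-separated alternatenames, latitude, longitude, feature class, feature code and country code. `Place` and `InMemoryGazetteer` are hypothetical names, and the code requires Java 16+ for records.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical minimal gazetteer entry. */
record Place(long geonameId, String name, double lat, double lon, String countryCode) {}

/** Hypothetical in-memory gazetteer keyed by lower-cased names and alternate names. */
final class InMemoryGazetteer {

    private final Map<String, List<Place>> index = new HashMap<>();

    /** Loads tab-separated rows laid out like the GeoNames dump (columns 0-8 assumed). */
    static InMemoryGazetteer load(Reader source) throws IOException {
        InMemoryGazetteer gazetteer = new InMemoryGazetteer();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] f = line.split("\t", -1);
                if (f.length < 9) {
                    continue; // skip malformed rows
                }
                Place place = new Place(Long.parseLong(f[0]), f[1],
                        Double.parseDouble(f[4]), Double.parseDouble(f[5]), f[8]);
                gazetteer.add(f[1], place); // name
                gazetteer.add(f[2], place); // asciiname
                for (String alt : f[3].split(",")) {
                    gazetteer.add(alt, place); // alternatenames
                }
            }
        }
        return gazetteer;
    }

    private void add(String key, Place place) {
        String k = key.trim().toLowerCase(Locale.ROOT);
        if (k.isEmpty()) {
            return;
        }
        List<Place> places = index.computeIfAbsent(k, unused -> new ArrayList<>());
        if (!places.contains(place)) { // name and asciiname are often identical
            places.add(place);
        }
    }

    /** Exact, case-insensitive lookup; returns an empty list when nothing matches. */
    List<Place> lookup(String name) {
        return index.getOrDefault(name.trim().toLowerCase(Locale.ROOT), Collections.emptyList());
    }
}
```

A HashMap only covers exact (case-insensitive) matches. Fuzzy candidate retrieval would need an additional structure, for example an in-memory Lucene index or a character n-gram index. The candidates it returns could then be scored and thresholded as Mark suggests.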

Jörn
