On 10/05/2013 11:58 PM, Mark G wrote:
3. Fuzzy string matching should be part of the scoring; this would allow
MySQL fuzzy search to return more candidate toponyms.
Currently, the search against the MySQL gazetteers uses "boolean mode", and
each NER result is passed in as a literal string. If I implement a score
based on fuzzy string matching (do we have one?), the user could turn on
"natural language" mode in MySQL, and we could then generate a score and
apply a threshold to allow for more recall on transliterated names etc.
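
As a rough illustration of the score-plus-threshold idea, here is a
minimal Python sketch. It is not existing project code: the function
names and the 0.8 default threshold are made up for the example, and
difflib's similarity ratio stands in for whatever string metric would
actually be chosen.

```python
from difflib import SequenceMatcher


def fuzzy_score(query: str, candidate: str) -> float:
    """Case-insensitive similarity in [0, 1] (hypothetical helper)."""
    return SequenceMatcher(None, query.lower(), candidate.lower()).ratio()


def filter_candidates(query, candidates, threshold=0.8):
    """Keep candidates scoring at or above the threshold, best first.

    The threshold value is an assumption; it would need tuning against
    real gazetteer data.
    """
    scored = [(name, fuzzy_score(query, name)) for name in candidates]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    names = ["Paris", "Muqdisho", "Mogadishu"]
    for name, score in filter_candidates("Mogadishu", names, threshold=0.5):
        print(f"{name}\t{score:.2f}")
```

Lowering the threshold trades precision for recall, which is the knob
that matters for transliterated names.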
I would also like to use proximity to the majority of points in the
document as a disambiguation criterion.
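
One possible reading of the proximity criterion, sketched in Python
under stated assumptions: the "majority of points" is approximated by
the coordinate-wise median of the locations already resolved in the
document, and the candidate closest to that center wins. The function
names and the (name, lat, lon) tuple layout are hypothetical, and the
median ignores longitude wrap-around at the date line.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def document_center(points):
    """Coordinate-wise median; outliers shift it less than they shift a mean."""
    return (median(p[0] for p in points), median(p[1] for p in points))


def disambiguate(candidates, resolved_points):
    """Pick the (name, lat, lon) candidate nearest the document center."""
    center = document_center(resolved_points)
    return min(candidates, key=lambda c: haversine_km((c[1], c[2]), center))


if __name__ == "__main__":
    # London, Berlin, Madrid, plus one outlier (New York).
    resolved = [(51.5074, -0.1278), (52.52, 13.405),
                (40.4168, -3.7038), (40.7128, -74.006)]
    candidates = [("Paris, TX", 33.6609, -95.5555),
                  ("Paris, FR", 48.8566, 2.3522)]
    print(disambiguate(candidates, resolved)[0])
```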
It would probably be nice if this worked with other databases too, e.g.
Apache Derby or some in-memory database, maybe even Lucene.
Would it be possible to not use the MySQL fuzzy string matching feature
for this?
I would like to run your code, but it's difficult to scale the MySQL
database in my scenario. I have lots of RAM, though, and I believe the
geonames dataset could fit into it and give me super fast lookups on my
worker servers.
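
To make the in-memory idea concrete, a minimal Python sketch follows.
It assumes the tab-separated layout of the geonames dump (geonameid,
name, asciiname, alternatenames, latitude, longitude, ...); the
function names are hypothetical, and exact-match lookup is used here
only to show the data structure, not fuzzy matching.

```python
from collections import defaultdict


def build_index(lines):
    """Map each case-folded name variant to a list of (name, lat, lon)."""
    index = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip malformed rows
        name, ascii_name, alternates = fields[1], fields[2], fields[3]
        entry = (name, float(fields[4]), float(fields[5]))
        # Index the primary name, the ASCII name and every alternate name.
        keys = {name, ascii_name, *alternates.split(",")}
        for key in keys:
            if key:
                index[key.casefold()].append(entry)
    return index


def lookup(index, toponym):
    """Return all entries whose name or alternate name matches exactly."""
    return index.get(toponym.casefold(), [])


if __name__ == "__main__":
    sample = [
        "1\tMogadishu\tMogadishu\tMuqdisho,Mogadiscio\t2.03711\t45.34375",
        "2\tParis\tParis\tParigi\t48.85341\t2.3488",
    ]
    idx = build_index(sample)
    print(lookup(idx, "muqdisho"))
```

With the real dump, the loop would read from the file instead of a
list; whether the whole index fits in RAM depends on how many
alternate names are kept.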
Jörn