Martin,

Does the method return a probability value for the matching
string(s)? I just came across a blog post[1] that does
something similar to what you are working towards. They use the
term "best partial" for matching strings of noticeably
different lengths. This appears to be similar to using a Jaccard
index[2] for string comparison, but on smaller bodies of text such as
the titles of said aliases. Would this be a good use case for a
Lucene index, which already has all the information retrieval goodness
built in?
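
To make the Jaccard idea concrete, here is a rough sketch of a
token-based score. It is not the blog post's algorithm and not anything
in SIS or Lucene; the class name, the method names, and the choice to
split names on non-alphanumeric characters are all my own assumptions.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper, for illustration only (not part of SIS or Lucene). */
final class JaccardNameScore {
    /** Splits a name into lower-case ASCII alphanumeric tokens (assumed tokenization). */
    static Set<String> tokens(String name) {
        Set<String> result = new HashSet<>();
        for (String t : name.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) {          // split() may yield a leading empty token
                result.add(t);
            }
        }
        return result;
    }

    /** Jaccard index |A ∩ B| / |A ∪ B| of the two token sets, in the range [0, 1]. */
    static double score(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        if (ta.isEmpty() && tb.isEmpty()) {
            return 1.0;                  // convention chosen here: two empty names are identical
        }
        Set<String> intersection = new HashSet<>(ta);
        intersection.retainAll(tb);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        return (double) intersection.size() / union.size();
    }
}
```

With this tokenization, "WGS 84" against "wgs_84" scores 1.0, while
"WGS 84" against "WGS 84 / UTM zone 31N" scores 2/5 = 0.4, so a caller
would still have to pick a threshold for what counts as a match.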

Adam

[1] http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/
[2] http://en.wikipedia.org/wiki/Jaccard_index



On Tue, Nov 26, 2013 at 4:11 PM, Martin Desruisseaux
<[email protected]> wrote:
> Le 25/11/13 23:51, Martin Desruisseaux a écrit :
>
>> I would like a better method name for "nameMatches". If possible, I would
>> like something that contains the word "heuristic" or "lenient" in it, or
>> anything else that conveys the heuristic nature of this method. Does anyone
>> have suggestions? I do not know if "nameMatchesHeuristically" or
>> "heuristicNameMatches" would be correct English.
>
>
> I'm trying "isHeuristicMatchForName(String)" [1]. A search on the Internet found
> a few hits for "is heuristic match". If anyone has other ideas, please let us
> know.
>
> This particular method may need to be revisited as we try to handle data
> from a larger range of data producers, so I think it is worth making its
> purpose easy to spot.
>
>     Martin
>
>
> [1]
> https://builds.apache.org/job/sis-jdk7/site/apidocs/org/apache/sis/referencing/AbstractIdentifiedObject.html#isHeuristicMatchForName%28java.lang.String%29
>
