hi I need to compare two Base64 representation strings of some MIME content
that I am storing within a Lucene index. I need to efficiently compare them to
find the closest match to a query Base64 string , post Lucene query.
I am not sure of the best way to approach this, could I compare the hashes and
compute their similarity? Levenshtein distance seems hard because of the size
of ths strings and seems inefficient? Is there any other method you could
suggest?
n.b: The idea is to not to determine exact match or not, it is to compute a
similarity metric. for example
John & Johnson (closer)
vs,
John & Jimmy (farther)
tia,
Aaron
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]