Hi Kelly,
Many thanks for the quick reply and info.
I guess that means that the dictionaries are only really suitable for
English words? If I create a French, German or Chinese dictionary
(based on content in those languages) then the Double Metaphone
algorithm isn't going to be as effective - unless the algorithm is
*based* on Double Metaphone, but is more language aware?
Does your reply also imply that there is no basic fuzzy search mechanism
in MarkLogic based on Levenshtein distance?
Many thanks,
-Steve
Kelly Stirman wrote:
The spell correction functionality in MarkLogic employs the Double
Metaphone algorithm:
http://en.wikipedia.org/wiki/Double_Metaphone
This is a more modern and more sophisticated approach to phonetic
matches than soundex.
You can load one of the sample dictionaries on the developer site, your
own, or use the word lexicon of your database to generate a list of
terms that exist across your documents.
Kelly
-----Original Message-----
Hi folks,
I've been looking through the developer docs to try to find out if I can
do fuzzy searching or any type of phonetic searching in XQuery with
MarkLogic.
Does anyone know if there are any functions to determine similarity and
distance between strings - e.g. Soundex, Levenshtein, Metaphone?
Specifically, I'd like to be able to do Lucene-style fuzzy searches
based on Levenshtein distance (for example, in Lucene, a search for
"roam~" will find words like "foam" and "roams"). The spellcheck module
looks like it does something similar, but I'm not sure what the
implementation is based on. How does it find words from a dictionary
that are spelt similarly to the search term? Is there any developer
control over this?
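To make the kind of matching described above concrete, here is a minimal Python sketch of edit-distance fuzzy matching. It is an illustration only and makes no claim about how Lucene or the MarkLogic spell module work internally; the function names `levenshtein` and `fuzzy_matches`, the `max_distance` threshold and the sample vocabulary are all made up for the example.

```python
def levenshtein(a, b):
    """Return the Levenshtein (edit) distance between strings a and b."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(term, vocabulary, max_distance=1):
    """Return the words in vocabulary within max_distance edits of term."""
    return [w for w in vocabulary if levenshtein(term, w) <= max_distance]


# Hypothetical word list, standing in for terms drawn from the content.
words = ["foam", "roams", "room", "physics"]
matches = fuzzy_matches("roam", words)
```

With this sample list, "foam", "roams" and "room" are each one edit away from "roam", so they are returned, while "physics" is not. Comparing the term against every word is only workable for small vocabularies; a real engine would need some kind of index to avoid the full scan.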
I'd also like to be able to do phonetic searches, so that, for example,
a search for "fiziks" would match "physics" since they are phonetically
similar. A few relational databases support "soundex" searches, and
SOLR supports the use of various phonetic transcription algorithms. I
guess that I could create an index of phonetic transcriptions during
content load, and do lookups based on that, but it would be good if
there was something I could use 'out-of-the-box'.
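As a rough sketch of the "index of phonetic transcriptions" idea, the Python below builds a lookup table keyed on a phonetic code at load time. It assumes classic American Soundex as the key function purely because it is short enough to show in full; `soundex`, `build_phonetic_index` and `phonetic_lookup` are hypothetical names, and nothing here reflects an existing MarkLogic API.

```python
from collections import defaultdict

# Soundex digit for each consonant; vowels, h, w and y carry no digit.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character American Soundex key for an ASCII word."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (key + "000")[:4]


def build_phonetic_index(words):
    """Map each phonetic key to the set of words that share it."""
    index = defaultdict(set)
    for w in words:
        index[soundex(w)].add(w)
    return index


def phonetic_lookup(index, term):
    """Return the indexed words whose key equals the key of term."""
    return sorted(index.get(soundex(term), ()))


# Hypothetical content terms, indexed once during load.
index = build_phonetic_index(["Robert", "Rupert", "physics", "roam"])
hits = phonetic_lookup(index, "Robbert")
```

Because Soundex always keeps the first letter, "fiziks" (F220) and "physics" (P220) get different keys, so this particular key function would not produce the match in the example above. A Metaphone-style key function could be swapped in for `soundex` without changing the index code.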
Could anyone shed any light on this?
Many thanks,
-Steve