Getting docs that match a search phrase is easy (using the case-, 
punctuation-, and whitespace-insensitive options), and finding docs that have 
the highest frequency of the words in the search phrase is easy 
(cts:word-query and a sequence of terms), but I want to find docs that most 
closely match the search phrase.

For example, say I have a doc that contains this text: "Mary had a little 
lamb whose fleece was white as snow"

If I search for "mary had a little lamb whose fleece was white as SNOW!!!", a 
cts:word-query will match as long as I send the entire phrase and use all the 
"insensitive" options.
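To make the "insensitive" matching concrete, here is a rough Python sketch of what it amounts to: lower-case both strings, drop punctuation, collapse whitespace, then compare. This is only an illustration under that assumption; `normalize` is a hypothetical helper, not MarkLogic's tokenizer, which handles far more cases.

```python
import re

def normalize(text: str) -> str:
    """Hypothetical stand-in for case-, punctuation-, and whitespace-insensitive
    matching (NOT MarkLogic's actual tokenizer)."""
    text = text.lower()                     # case-insensitive
    text = re.sub(r"[^\w\s]", " ", text)    # punctuation-insensitive
    return " ".join(text.split())           # whitespace-insensitive

doc = "Mary had a little lamb whose fleece was white as snow"
query = "mary had a little lamb whose fleece was white as SNOW!!!"
print(normalize(doc) == normalize(query))  # True
```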

If I search by tokenizing the phrase into ("mary", "had", "little", "lamb", 
"fleece", "white", "snow"), I will get the doc that has the highest frequency of 
those words (weighted according to doc size), which may or may not be my 
"Mary had a little lamb" doc.
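The tokenizing step can be sketched in Python like this. The stop-word set is hypothetical, picked only so the output reproduces the token list above; it is not a MarkLogic default. The resulting list would then be the sequence of terms handed to cts:word-query.

```python
import re

# Hypothetical stop-word list chosen to reproduce the tokens in the post;
# not a MarkLogic setting.
STOP_WORDS = {"a", "as", "was", "whose"}

def tokenize(phrase: str) -> list[str]:
    """Lower-case the phrase, split it into words, and drop stop words."""
    words = re.findall(r"\w+", phrase.lower())
    return [w for w in words if w not in STOP_WORDS]

print(tokenize("mary had a little lamb whose fleece was white as SNOW!!!"))
# ['mary', 'had', 'little', 'lamb', 'fleece', 'white', 'snow']
```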

And if I search for "Jane had a little lamb whose fleece was white as snow", the 
Mary doc won't match because the phrase doesn't match, and a tokenized-word 
search probably won't surface it either, because some other doc with "Jane" and 
"snow" (or whatever) would rank higher. I could try a near query of all the 
words, but "Jane" isn't in the doc, so there'd be no match for my Mary doc.

What I want is the doc containing the phrase that most closely matches the 
search phrase, even if I drop, replace, or introduce an incorrect word. And I 
mean more than just misspellings.
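"Drop, replace, or introduce a word" is exactly what a word-level edit distance (Levenshtein distance computed over words instead of characters) counts. Here is a minimal Python sketch, assuming the candidate texts are already in hand. This is a swapped-in technique run client-side, not a MarkLogic feature, and `word_edit_distance` / `closest` are hypothetical helpers. In practice it would only be affordable over a shortlist of docs returned by a cheaper query.

```python
def word_edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance over words: each dropped, introduced, or replaced
    word costs 1."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr[j] = min(prev[j] + 1,         # drop a word
                          curr[j - 1] + 1,     # introduce a word
                          prev[j - 1] + cost)  # replace (or keep) a word
        prev = curr
    return prev[-1]

def closest(query: str, candidates: list[str]) -> str:
    """Return the candidate text with the smallest word edit distance."""
    q = query.lower().split()
    return min(candidates, key=lambda c: word_edit_distance(q, c.lower().split()))

docs = [
    "Mary had a little lamb whose fleece was white as snow",
    "Jane went out in the snow",
]
print(closest("Jane had a little lamb whose fleece was white as snow", docs))
# Mary had a little lamb whose fleece was white as snow
```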

You can see that "Jane had a little lamb whose fleece was white as snow" is 
really close to "Mary had a little lamb whose fleece was white as snow", but I 
can't quite figure out how to get MarkLogic to determine that quickly, since the 
phrase won't match and the tokenized words won't necessarily give me the best 
relevance. I can generate all the permutations of the phrase (every word 
combined with all the other words in all combinations) and OR them together, but 
search performance suffers after just a few permutations.
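A quick count shows why OR-ing the combinations gets expensive. Assuming "all combinations" means every non-empty subset of the tokens (order ignored), n tokens give 2**n - 1 sub-queries. `word_combinations` is a hypothetical helper for illustration.

```python
from itertools import combinations

def word_combinations(tokens: list[str]) -> list[tuple[str, ...]]:
    """All non-empty combinations of the tokens, ignoring order."""
    return [c for r in range(1, len(tokens) + 1) for c in combinations(tokens, r)]

tokens = ["mary", "had", "little", "lamb", "fleece", "white", "snow"]
print(len(word_combinations(tokens)))  # 127, i.e. 2**7 - 1

# The count doubles with every extra word in the search phrase.
for n in (5, 7, 10, 15):
    print(n, "words ->", 2**n - 1, "combinations")
```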

Anyone know how to do this?

thanks,
-Ryan

_______________________________________________
General mailing list
General@developer.marklogic.com
http://community.marklogic.com/mailman/listinfo/general
