I hadn't, but I tried it on your suggestion and it shows some promise. I want to use DB fields that draw on many different elements, so I'm not sure how I'll construct model nodes for cts:similar-query so that they can match against several different types of elements. Still, this is giving me a nice "fuzzy" match that I think I can work with. Thanks for the suggestion.
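For reference, here is a minimal sketch of the kind of query under discussion, using a plain text node as the model. The phrase, the 10-result cutoff, and the hit element are only illustrative, and the options element (in the cts:distinctive-terms namespace, with max-terms set to the 16-term default mentioned in the thread) is written from memory of the documentation, so check it against your server version before relying on it.

```xquery
xquery version "1.0-ml";

(: Sketch only: phrase, result cutoff, and <hit> element are illustrative. :)
let $phrase := "Jane had a little lamb whose fleece was white as snow"
let $query :=
  cts:similar-query(
    text { $phrase },
    (),  (: empty weight, so the default applies :)
    (: assumed options shape; 16 is the default per the thread :)
    <options xmlns="cts:distinctive-terms">
      <max-terms>16</max-terms>
    </options>)
(: cts:search returns results in relevance order by default :)
for $doc in cts:search(fn:doc(), $query)[1 to 10]
return <hit uri="{xdmp:node-uri($doc)}" score="{cts:score($doc)}"/>
```

Because the model here is a bare text node, it says nothing about which elements the terms should come from; matching against fields built from several element types would need a different model or query construction, which this sketch does not attempt.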
> From: m...@blakeley.com
> Date: Fri, 25 May 2012 11:17:39 -0700
> To: general@developer.marklogic.com
> Subject: Re: [MarkLogic Dev General] searching docs on how close the match the search phrase
>
> Have you tried similar-query?
>
> cts:similar-query(
>   text { 'Mary had a little lamb whose fleece was white as snow' })
>
> You could set the max-terms option to the count of words in the phrase, or
> leave it at the default 16 terms. I would leave it alone, at least at first.
> It should select the 16 "best" terms, which means it will tend to drop
> stopwords from the query if the text is long. You could also control whether
> or not the similar-query will use phrase search. I think it could be helpful,
> but you could try both ways.
>
> One potential downside is that if there are no good matches, you will
> probably still match on some stop-words.
>
> -- Mike
>
> On 25 May 2012, at 10:07 , seme...@hotmail.com wrote:
>
> > Getting docs that have match on a search phrase is easy (using case-,
> > punctuation-, white-space-, insensitive options), and finding docs that
> > have the highest frequency for the words in the search phrase is easy
> > (cts:word-query and a sequence of terms), but I want to find docs that most
> > closely match the search phrase.
> >
> > For example, if I have a doc that has this text in it: "Mary had a little
> > lamb whose fleece was white as snow"
> >
> > If I search using "mary had a little lamb whose fleece was white as
> > SNOW!!!" a cts:word-query would match if I sent the entire phrase and used
> > all the "insensitive" options.
> >
> > If I search by tokenizing the phrase into ("mary", "had", "little", "lamb",
> > "fleece", "white", "snow") I will get the doc that has the highest
> > frequency of those words (and weighted according to doc size), which may or
> > may not be my "Mary had a little lamb doc".
> >
> > And if I search for "Jane had a little lamb whose fleece was white as snow"
> > the Mary doc won't match because the phrase doesn't match, and a tokenized
> > words search probably won't match because some other doc with "Jane" and
> > "snow" or whatever would be higher priority. I can try to use a near query
> > of all the words except "Jane" isn't in the doc so there'd be no match for
> > my Mary doc.
> >
> > What I want is the doc that has a phrase that most closely matches the
> > search phrase, even if I drop, replace, or introduce an incorrect word. And
> > I mean more than just spelled wrong.
> >
> > You can see that "Jane had a little lamb whose fleece was white as snow" is
> > really close to "Mary had a little lamb whose fleece was white as snow" but
> > I can't quite figure out how to get MarkLogic to determine that quickly
> > since the phrase won't match and tokenized words won't necessarily give me
> > the best relevance. I can get all the permutations of the phrase (every
> > word with all the other words in all combinations) and OR them together but
> > search performance suffers after just a few permutations.
> >
> > Anyone know how to do this?
> >
> > thanks,
> > -Ryan
> >
> > _______________________________________________
> > General mailing list
> > General@developer.marklogic.com
> > http://community.marklogic.com/mailman/listinfo/general