I hadn't, but on your suggestion I tried it and it shows some promise. I want 
to use DB fields that cover several different elements, so I'm not sure how to 
construct model nodes for cts:similar-query so that they match against 
several different element types. Still, this is giving me a nice "fuzzy" match 
that I think I can work with. Thanks for the suggestion.
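
One way this might look, as a minimal sketch: cts:similar-query accepts a 
sequence of nodes, so the same phrase can be wrapped in one model element per 
element type. The element names "title" and "abstract" are hypothetical 
placeholders, and whether the generated terms end up scoped to those elements 
depends on the database index settings, which has not been verified here.

    let $phrase := 'Jane had a little lamb whose fleece was white as snow'
    (: hypothetical element names; substitute the ones behind the fields :)
    let $models := (
      element title { $phrase },
      element abstract { $phrase }
    )
    return cts:search(fn:doc(), cts:similar-query($models))[1 to 10]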

> From: m...@blakeley.com
> Date: Fri, 25 May 2012 11:17:39 -0700
> To: general@developer.marklogic.com
> Subject: Re: [MarkLogic Dev General] searching docs on how close the match    
> the search phrase
> 
> Have you tried similar-query?
> 
>     cts:similar-query(
>       text { 'Mary had a little lamb whose fleece was white as snow' })
> 
> You could set the max-terms option to the count of words in the phrase, or 
> leave it at the default 16 terms. I would leave it alone, at least at first. 
> It should select the 16 "best" terms, which means it will tend to drop 
> stopwords from the query if the text is long. You could also control whether 
> or not the similar-query will use phrase search. I think it could be helpful, 
> but you could try both ways.
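> 
> As a rough sketch of setting that option: this assumes the options element 
> belongs to the cts:distinctive-terms namespace, as the cts:similar-query 
> documentation describes. The value 11 is simply the word count of the sample 
> phrase, and 1.0 is the default weight written out explicitly.
> 
>     cts:search(
>       fn:doc(),
>       cts:similar-query(
>         text { 'Jane had a little lamb whose fleece was white as snow' },
>         1.0,  (: query weight, left at its default :)
>         <options xmlns="cts:distinctive-terms">
>           <max-terms>11</max-terms>
>         </options>))[1 to 10]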
> 
> One potential downside is that if there are no good matches, you will 
> probably still match on some stop-words.
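> 
> To see which terms are being picked, and whether stop-words dominate, 
> something like the following may help. It is only a sketch: 
> cts:distinctive-terms returns a cts:class element describing the selected 
> terms, and cts:score reports the relevance score of each hit. What counts as 
> a "good" score is an assumption you would have to settle by experiment.
> 
>     let $model :=
>       text { 'Jane had a little lamb whose fleece was white as snow' }
>     return (
>       (: the terms similar-query would be built from :)
>       cts:distinctive-terms($model),
>       (: relevance scores of the top five hits :)
>       for $doc in cts:search(fn:doc(), cts:similar-query($model))[1 to 5]
>       return cts:score($doc)
>     )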
> 
> -- Mike
> 
> On 25 May 2012, at 10:07 , seme...@hotmail.com wrote:
> 
> > Getting docs that match a search phrase is easy (using the case-, 
> > punctuation-, and whitespace-insensitive options), and finding docs that 
> > have the highest frequency for the words in the search phrase is easy 
> > (cts:word-query with a sequence of terms), but I want to find docs that 
> > most closely match the search phrase.
> > 
> > For example, if I have a doc that has this text in it: "Mary had a little 
> > lamb whose fleece was white as snow"
> > 
> > If I search using "mary had a little lamb whose fleece was white as 
> > SNOW!!!" a cts:word-query would match if I sent the entire phrase and used 
> > all the "insensitive" options.
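> > 
> > For reference, a sketch of that whole-phrase search. The three option 
> > names are standard cts:word-query options; searching fn:doc() rather than 
> > a narrower path is an assumption made for illustration.
> > 
> >     cts:search(
> >       fn:doc(),
> >       cts:word-query(
> >         'mary had a little lamb whose fleece was white as SNOW!!!',
> >         ('case-insensitive', 'punctuation-insensitive',
> >          'whitespace-insensitive')))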
> > 
> > If I search by tokenizing the phrase into ("mary", "had", "little", "lamb", 
> > "fleece", "white", "snow") I will get the doc that has the highest 
> > frequency of those words (and weighted according to doc size), which may or 
> > may not be my "Mary had a little lamb doc".
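> > 
> > A sketch of the tokenized version: passing a sequence of strings to 
> > cts:word-query matches any of the words, and cts:search returns hits in 
> > relevance order by default. The [1 to 10] cutoff is arbitrary.
> > 
> >     cts:search(
> >       fn:doc(),
> >       cts:word-query(
> >         ('mary', 'had', 'little', 'lamb', 'fleece', 'white', 'snow'),
> >         'case-insensitive'))[1 to 10]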
> > 
> > And if I search for "Jane had a little lamb whose fleece was white as snow" 
> > the Mary doc won't match because the phrase doesn't match, and a tokenized 
> > word search probably won't rank it first because some other doc with "Jane" 
> > and "snow" or whatever would score higher. I could try a near query of all 
> > the words, but "Jane" isn't in the doc, so there'd be no match for my Mary 
> > doc.
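> > 
> > Sketched out, the near query looks roughly like this. Every sub-query has 
> > to match, so the single word "jane" is enough to exclude the Mary doc. The 
> > distance of 10 words is an arbitrary value chosen for illustration.
> > 
> >     cts:search(
> >       fn:doc(),
> >       cts:near-query(
> >         for $w in ('jane', 'had', 'little', 'lamb',
> >                    'fleece', 'white', 'snow')
> >         return cts:word-query($w, 'case-insensitive'),
> >         10))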
> > 
> > What I want is the doc that has a phrase that most closely matches the 
> > search phrase, even if I drop, replace, or introduce an incorrect word. And 
> > I mean more than just spelled wrong.
> > 
> > You can see that "Jane had a little lamb whose fleece was white as snow" is 
> > really close to "Mary had a little lamb whose fleece was white as snow" but 
> > I can't quite figure out how to get MarkLogic to determine that quickly 
> > since the phrase won't match and tokenized words won't necessarily give me 
> > the best relevance. I can get all the permutations of the phrase (every 
> > word with all the other words in all combinations) and OR them together but 
> > search performance suffers after just a few permutations.
> > 
> > Anyone know how to do this?
> > 
> > thanks,
> > -Ryan
> > 
> > 
> > _______________________________________________
> > General mailing list
> > General@developer.marklogic.com
> > http://community.marklogic.com/mailman/listinfo/general