I haven't used cts:similar-query before, and the value of the max-terms option seems to greatly affect the results. I haven't changed any database settings, so everything is at the defaults, and I see that max-terms defaults to 16. I've been using cts:distinctive-terms to get a feel for which terms cts:similar-query will use as I change max-terms. My first thought was to set max-terms to the number of terms in $node (i.e., tokenizing on whitespace); then I thought I should perhaps double that to account for pairs of terms. Is there any rule of thumb here? (BTW, I'm doing this against three different databases, with fragment counts of 24M, 131M and 287M, so there are plenty of fragments for similar-query to work on.)
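A minimal sketch of that comparison, for concreteness. The document URI and the max-terms value of 32 are placeholders, not taken from my actual setup; the options element is in the cts:distinctive-terms namespace, which as far as I understand is what both functions expect.

```xquery
xquery version "1.0-ml";

(: Placeholder URI: substitute a document from your own database. :)
let $node := fn:doc("/example/doc.xml")

(: Same options element passed to both calls so the term lists line up.
   32 is an arbitrary value chosen only for illustration. :)
let $options :=
  <options xmlns="cts:distinctive-terms">
    <max-terms>32</max-terms>
  </options>

return (
  (: The terms judged most distinctive for $node under these options. :)
  cts:distinctive-terms($node, $options),

  (: URIs of the first ten documents matched by the similar query
     built with the same options (weight left at its default of 1.0). :)
  for $d in cts:search(fn:doc(), cts:similar-query($node, 1.0, $options))[1 to 10]
  return xdmp:node-uri($d)
)
```

Re-running this with different max-terms values shows how the distinctive-terms list and the top matches change together.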
A second question concerns the output of cts:distinctive-terms: what does an empty cts:term mean? For example:

    <cts:term id="4083217226504034818" val="504" score="1032192" confidence="0.453548" fitness="0" xmlns:cts="http://marklogic.com/cts"></cts:term>

It would be nice to know what this "term" is, since it's the highest-scoring term in the list.

Thanks,
David