Hi Mike,
The calculations are based on the number of fragments in the database, so yes, your search scores will be affected by duplicate content in /create and /preview. Relevance is calculated (by default) using the log-tf*idf formula (log of term frequency times the inverse document frequency for the search matches). Relevance will only affect the order in which things are returned from a search; it will not affect what documents match a search.

Whether these extra copies of documents will really change the relevance order depends on the number of documents you have in the database. If you have a relatively small number of documents in the /create and /preview directories compared to the number of documents in the database, then it is not likely to change the relevance order. If the proportion of duplicate documents is statistically significant, then it might have a material impact. For most databases and most real-world searches, I don't think it will affect your results very much.

If you find it is affecting the relevance order, then putting them in separate databases is the right approach. My feeling is this will not be necessary, but it will depend on your database size, your content, and your searches.

Hope that helps,
-Danny

From: [email protected] [mailto:[email protected]] On Behalf Of Mike Bowers
Sent: Monday, January 26, 2009 5:32 PM
To: [email protected]
Subject: [MarkLogic Dev General] What is the Scope of Search Relevance in MarkLogic?

Is search relevance based on all documents in a database or only the documents included in the scope of a search?

For example, assume I have three folders in the same database: /create, /preview and /publish. /create contains multiple versions of each document. /preview contains copies of documents in /create that a user is considering displaying on a website. /publish contains copies of documents in /preview that a user wants to display on a website.
Thus, /publish contains a copy of some of the documents in /preview and /preview contains a copy of some of the documents in /create and /create contains multiple versions of each document.

If I use XPath to limit a search to include only those documents in /publish, will search relevance be affected by duplicate documents in /create and /preview?

Similarly, if a user does a search for documents in all three folders but the user only has permissions to see documents in /publish, will search relevance be affected by duplicate documents in /create and /preview?

Should we create separate databases for /create, /preview, and /publish to ensure that duplicate documents do not affect search relevance?
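
To make the log-tf*idf point in the reply concrete, here is a toy sketch in Python. It is not MarkLogic's actual scoring code: the formula log(1 + tf) * log(N / df), the document URIs, and all counts are assumptions chosen for illustration. It suggests that, under this simplified formula, copying every document the same number of times leaves the order unchanged, while copies that are skewed toward documents containing one term can reorder the same set of matches.

```python
import math

def idf(total_docs, docs_with_term):
    # Inverse document frequency: terms found in fewer documents weigh more.
    return math.log(total_docs / docs_with_term)

def score(term_freqs, doc_freqs, total_docs):
    # Toy log-tf*idf: sum log(1 + tf) * idf over query terms present in the document.
    return sum(
        math.log(1 + tf) * idf(total_docs, doc_freqs[term])
        for term, tf in term_freqs.items()
        if tf > 0
    )

def rank(docs, doc_freqs, total_docs):
    # Return document names ordered from highest to lowest score.
    return sorted(
        docs,
        key=lambda name: score(docs[name], doc_freqs, total_docs),
        reverse=True,
    )

# Two hypothetical /publish documents and their term counts for a two-term query.
published = {
    "/publish/a.xml": {"xquery": 3, "index": 0},
    "/publish/b.xml": {"xquery": 0, "index": 3},
}

# Database-wide statistics if only /publish existed (made-up numbers).
publish_only = rank(published, {"xquery": 10, "index": 20}, total_docs=100)

# Every document copied three times: N and df scale together, so idf is unchanged.
uniform_copies = rank(published, {"xquery": 30, "index": 60}, total_docs=300)

# Documents containing "xquery" have far more versions than the rest.
skewed_copies = rank(published, {"xquery": 150, "index": 40}, total_docs=400)

print(publish_only)
print(uniform_copies)
print(skewed_copies)
```

In all three cases the same two documents match; only their order differs in the skewed case, which is consistent with the statement that relevance affects ordering and not which documents match.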
