Varun,

cts:cluster groups similar documents based on the important terms in them, including words, element/word pairs, and other indexed terms. If you build a separate document containing only the people, you may be able to group them using cts:cluster. However, cluster is intended for moderate-sized sets of returned results rather than entire databases. You can also look at cts:similar-query(), again using a document with only the people in it.
The cluster and similar functions use the same scores that search uses (tf-idf scores for terms). That is why, if you want them to focus on people, you need to put the person elements in a separate document.

If you want a more direct count of how often other people occur in the same document as a given person, you can use cts:element-value-co-occurrences(). Alternatively, use cts:element-values() with a query constraint on the particular person you are checking, then call cts:frequency() on each returned value to get the number of documents mentioning each other person. cts:element-value-co-occurrences() is especially useful if you want to focus on the most commonly paired people.

Yours,
Damon

From: [email protected] [mailto:[email protected]] On Behalf Of Varun Varunesh
Sent: Monday, May 20, 2013 3:41 PM
To: MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] cts:cluster

Hi All,

I need some quick help. I have not yet explored MarkLogic's cts:cluster, but my problem sounds like a clustering problem.

My problem is this: I have lots of documents in the database, and each document contains one or more person names. I need to build a relationship graph of these people. If a set of people appears together in more than a threshold number of documents, those names should be connected with edges.

I am using MarkLogic 5.0. Please suggest how to solve this problem using MarkLogic.

Thanks,
Varun Varunesh
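The edge rule Varun describes is a document-frequency count over pairs of people. In MarkLogic, cts:element-value-co-occurrences() does this counting server-side, but it requires range indexes on the person element. The Python sketch below is not MarkLogic code. It only illustrates the pair-counting and threshold logic on the client side. It assumes the person names have already been extracted into one list per document, and the function name `cooccurrence_edges` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(docs, threshold):
    """Return {(a, b): n} for person pairs that co-occur in more than `threshold` documents.

    `docs` is an iterable of name lists, one per document (a hypothetical stand-in
    for the values of each document's person elements). A pair is counted at most
    once per document, so n is a document count, not a count of total mentions.
    """
    pair_counts = Counter()
    for names in docs:
        # Deduplicate and sort so that (a, b) and (b, a) are the same key.
        unique = sorted(set(names))
        pair_counts.update(combinations(unique, 2))
    # Keep only pairs above the threshold; each surviving key is one graph edge.
    return {pair: n for pair, n in pair_counts.items() if n > threshold}


if __name__ == "__main__":
    docs = [
        ["Alice", "Bob", "Carol"],
        ["Bob", "Alice"],
        ["Alice", "Bob", "Bob"],
        ["Carol", "Dave"],
    ]
    print(cooccurrence_edges(docs, threshold=2))  # {('Alice', 'Bob'): 3}
```

Counting each pair once per document matches the "more than a threshold number of documents" rule. A repeated name inside a single document does not inflate the edge weight.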
