Varun,

cts:cluster groups similar documents based on the important terms in them,
including words, element/word pairs, and the like. If you build a separate
document containing only the people, you may be able to group them using
cts:cluster, but it is intended for moderate-sized result sets rather than
entire databases. You can also look at cts:similar-query, again using a
document with only the people in it.

The cluster and similar functions use the same scores that searching uses
(tf-idf scores for terms), which is why you need to put the person elements in
a separate document if you want them to focus on people. If you want a more
straightforward count of how often other people occur in the same document as
a given person, you can use cts:element-value-co-occurrences, or call
cts:element-values with a query constraining it to the person you are
checking and then use cts:frequency on each returned value to count the
documents mentioning each other person.
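
A minimal sketch of the cts:element-values approach, under some assumptions
that are not in your description: the people are marked up as <person>
elements (a hypothetical name), there is a string element range index on that
element (the lexicon functions require one), and "Jane Doe" is just a
placeholder value.

```xquery
xquery version "1.0-ml";

(: Assumed: a <person> element with an xs:string element range index.
   Both the element name and the sample value are hypothetical. :)
let $person := "Jane Doe"
for $other in cts:element-values(
    xs:QName("person"), (),
    ("frequency-order", "fragment-frequency"),
    (: restrict the lexicon to fragments that mention $person :)
    cts:element-value-query(xs:QName("person"), $person))
(: skip the person we started from :)
where $other ne $person
return
  <co-mention name="{$other}" documents="{cts:frequency($other)}"/>
```

With "fragment-frequency", cts:frequency should give the number of fragments
(documents, if you have not configured fragmentation) in which the returned
name appears alongside the chosen person.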

cts:element-value-co-occurrences is the one to look at first if you want to
focus on the most commonly paired people.
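
For the relationship graph itself, something along these lines might work as
a starting point. It is only a sketch: the <person> element name, the
threshold of 5, and the <edge> output format are all assumptions, and it again
needs a string element range index on the person element.

```xquery
xquery version "1.0-ml";

(: Hypothetical threshold: minimum number of shared documents for an edge. :)
let $threshold := 5
for $pair in cts:element-value-co-occurrences(
    xs:QName("person"), xs:QName("person"),
    ("frequency-order", "fragment-frequency"))
(: each cts:co-occurrence element holds two cts:value children :)
let $names := $pair/cts:value/fn:string(.)
let $count := cts:frequency($pair)
where $count ge $threshold
  (: drop a name paired with itself :)
  and $names[1] ne $names[2]
return
  <edge from="{$names[1]}" to="{$names[2]}" weight="{$count}"/>
```

If the same pair comes back in both orders, you would want to deduplicate the
edges before drawing the graph.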

Yours,
Damon

From: [email protected] 
[mailto:[email protected]] On Behalf Of Varun Varunesh
Sent: Monday, May 20, 2013 3:41 PM
To: MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] cts:cluster

Hi All,

I need some quick help. I have not yet explored MarkLogic's cts:cluster, but my
problem sounds like a clustering problem.

I have a lot of documents in the database. Each document contains one or more
person names. I need to create a relationship graph of these persons, i.e. if
a set of persons appears together in more than a threshold number of
documents, connect those names with edges.

I am using MarkLogic 5.0.

Please suggest your way to solve this problem using MarkLogic.

Thanks,
Varun  Varunesh
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
