Re: [MarkLogic Dev General] Getting pairs or triples of words that appear frequently together ?

Alan Darnell Thu, 04 Feb 2016 14:12:07 -0800

Putting in a supporting plug for N-gram support in ML.  This would be a great 
feature for text-mining applications.
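
To make concrete what n-gram counting over a blob of text would give you, here
is a minimal sketch in Python, run outside MarkLogic on text already pulled out
of the element. It is only an illustration: the function name ngram_counts, the
tokenizing regex, and the sample sentence are assumptions, not anything
MarkLogic provides, and it does no stemming or stop-word filtering.

```python
import re
from collections import Counter


def ngram_counts(text, n):
    """Count contiguous n-word sequences in text, ignoring case."""
    # Naive tokenizer (an assumption): runs of letters, digits and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Slide a window of n words across the token list.
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Hypothetical sample text standing in for the element's content.
sample = "the quick brown fox and the quick brown dog"
pairs = ngram_counts(sample, 2)
triples = ngram_counts(sample, 3)

# Show the most frequent pairs and triples with their counts.
print(pairs.most_common(3))
print(triples.most_common(3))
```

Filtering stop words before building the windows would keep pairs such as
"and the" from crowding out the more interesting ones.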


Alan

On Feb 4, 2016, at 4:28 PM, Geert Josten <[email protected]> wrote:

Hi Danny,

Word lexicons don’t expose frequency counts, and there are no word tuples 
either. Your best bet currently is to use cts:distinctive-terms and 
cts:highlight at ingest to mark important terms, and then put a range index on 
the marked terms, so you can get frequencies and tuples that way. One downside, 
though, is that you rule out relevance scoring, so stop words dominate.

Cheers,
Geert

From: <[email protected]> on behalf of Danny Sinang <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Thursday, February 4, 2016 at 9:48 PM
To: general <[email protected]>
Subject: [MarkLogic Dev General] Getting pairs or triples of words that appear 
frequently together ?

I've got one element with a paragraph of text.

I want to surface words that frequently appear together in the blob of text.

I can get the individual words easily using cts:element-words, but how do I get 
pairs or triples of words that appear frequently together?

Regards,
Danny

_______________________________________________
General mailing list
[email protected]
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general
