+1 for N-Gram support.  Definitely a 'must have' in my book.

Rob

________________________________________
From: [email protected] 
[[email protected]] On Behalf Of Alan Darnell 
[[email protected]]
Sent: Thursday, February 04, 2016 5:11 PM
To: General Developer Discussion
Subject: Re: [MarkLogic Dev General] Getting pairs or triples of words that 
appear frequently together ?

Putting in a supporting plug for N-gram support in ML.  This would be a great 
feature for text-mining applications.

Alan

On Feb 4, 2016, at 4:28 PM, Geert Josten <[email protected]> wrote:

Hi Danny,

Word lexicons don’t expose frequency counts, and there are no word tuples 
either. Your best bet currently is to use cts:distinctive-terms and 
cts:highlight at ingest to mark important terms, and then put a range index on 
that markup, so you can get frequencies and tuples that way. One downside, 
though, is that you rule out relevance scoring, so stop words dominate.
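
To make "frequencies and tuples" concrete, here is a minimal sketch of 
counting adjacent word pairs and triples outside the database, for example 
on text already fetched from the element. It is plain Python, not a MarkLogic 
API; the function names, the tokenizer regex, and the sample sentence are all 
assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters, digits, or apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngram_counts(text, n):
    """Count every run of n adjacent words in the text."""
    words = tokenize(text)
    # Zipping n staggered slices of the word list yields each n-word window.
    windows = zip(*(words[i:] for i in range(n)))
    return Counter(windows)


# Hypothetical sample text standing in for the paragraph element.
sample = "the quick brown fox jumps over the quick brown dog"

pairs = ngram_counts(sample, 2)
triples = ngram_counts(sample, 3)

# Show the most frequent pairs and triples with their counts.
for gram, count in pairs.most_common(3):
    print(" ".join(gram), count)
for gram, count in triples.most_common(3):
    print(" ".join(gram), count)
```

On real text, raw counts like these tend to be led by stop-word pairs, so 
filtering stop words before counting would probably be needed.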

Cheers,
Geert

From: <[email protected]> on behalf of Danny Sinang <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Thursday, February 4, 2016 at 9:48 PM
To: general <[email protected]>
Subject: [MarkLogic Dev General] Getting pairs or triples of words that appear 
frequently together ?

I've got one element with a paragraph of text.

I want to surface words that frequently appear together in the blob of text.

I can get the individual words easily using cts:element-words, but how do I get 
pairs or triples of words that appear frequently together?

Regards,
Danny

_______________________________________________
General mailing list
[email protected]<mailto:[email protected]>
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general

