Putting in a supporting plug for N-gram support in ML. This would be a great feature for text-mining applications.
Alan

On Feb 4, 2016, at 4:28 PM, Geert Josten <[email protected]> wrote:

Hi Danny,

Word lexicons don’t expose frequency counts, and there are no word tuples either. Your best bet currently is to use cts:distinctive-terms and cts:highlight at ingest to mark important terms, and then put a range index on that, so you can get frequencies and tuples that way. One downside, though, is that you rule out relevance scoring, so stop words dominate.

Cheers,
Geert

From: <[email protected]> on behalf of Danny Sinang <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Thursday, February 4, 2016 at 9:48 PM
To: general <[email protected]>
Subject: [MarkLogic Dev General] Getting pairs or triples of words that appear frequently together ?

I've got one element with a paragraph of text. I want to surface words that frequently appear together in the blob of text. I can get the individual words easily using cts:element-words, but how do I get pairs or triples of words that appear frequently together?

Regards,
Danny

_______________________________________________
General mailing list
[email protected]
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general
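For anyone landing on this thread later: until something like N-gram support exists in the server, one workaround is to count the pairs and triples client-side. Below is a minimal sketch in Python, assuming the element's text has already been fetched into a plain string. The function name ngram_counts and the sample sentence are made up for illustration; none of this is a MarkLogic API. Stop words are not filtered, so on real text they will likely dominate the counts, the same downside Geert mentions.

```python
import re
from collections import Counter


def ngram_counts(text, n):
    """Count every run of n consecutive words in text, ignoring case.

    Hypothetical helper for illustration; not part of any MarkLogic API.
    """
    # Crude tokenizer (an assumption): lowercase letters, digits, apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Zipping n staggered views of the word list yields each n-word window.
    return Counter(zip(*(words[i:] for i in range(n))))


# Made-up sample text standing in for the element's paragraph.
sample = "the quick brown fox and the quick brown dog saw the quick red fox"

top_pairs = ngram_counts(sample, 2).most_common(3)
top_triples = ngram_counts(sample, 3).most_common(1)
print(top_pairs)
print(top_triples)
```

Counter.most_common(k) returns the k most frequent tuples with their counts, so the same helper covers both pairs (n=2) and triples (n=3). For a large corpus the counting would have to be done per document and merged, which this sketch does not attempt.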
