+1 for N-gram support. Definitely a 'must have' in my book.

Rob
________________________________________
From: [email protected] [[email protected]] On Behalf Of Alan Darnell [[email protected]]
Sent: Thursday, February 04, 2016 5:11 PM
To: General Developer Discussion
Subject: Re: [MarkLogic Dev General] Getting pairs or triples of words that appear frequently together ?

Putting in a supporting plug for N-gram support in ML. This would be a great feature for text-mining applications.

Alan

On Feb 4, 2016, at 4:28 PM, Geert Josten <[email protected]> wrote:

Hi Danny,

Word lexicons don't expose frequency counts, and there are no word tuples either. Your best bet currently is to use cts:distinctive-terms and cts:highlight at ingest to mark important terms, and then put a range index on those marked terms, so you can get frequencies and tuples that way. One downside, though, is that you rule out relevance scoring, so stop words dominate.

Cheers,
Geert

From: <[email protected]> on behalf of Danny Sinang <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Thursday, February 4, 2016 at 9:48 PM
To: general <[email protected]>
Subject: [MarkLogic Dev General] Getting pairs or triples of words that appear frequently together ?

I've got one element with a paragraph of text. I want to surface words that frequently appear together in the blob of text.

I can get the individual words easily using cts:element-words, but how do I get pairs or triples of words that appear frequently together?

Regards,
Danny

_______________________________________________
General mailing list
[email protected]
Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
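For reference, the "pairs or triples of words" Danny asks about are word bigrams and trigrams (N-grams). A minimal sketch of counting them client-side, in plain Python rather than in MarkLogic, assuming the paragraph text has already been pulled out of the element. `top_ngrams` is a hypothetical helper, not a MarkLogic API, and the regex tokenizer is a naive assumption that differs from MarkLogic's own tokenization. It scans the text on every call and uses no lexicon or range index.

```python
import re
from collections import Counter


def top_ngrams(text, n=2, k=10, stopwords=frozenset()):
    """Return the k most frequent runs of n adjacent words in text (hypothetical helper)."""
    # Naive tokenizer (assumption): lowercase, keep runs of letters, digits, apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Slide a window of length n over the token list to form candidate n-grams.
    grams = zip(*(words[i:] for i in range(n)))
    # Skip any n-gram containing a stop word, so stop words don't dominate the counts
    # while the remaining pairs/triples stay genuinely adjacent in the source text.
    counts = Counter(
        " ".join(g) for g in grams if not any(w in stopwords for w in g)
    )
    # most_common sorts by count, descending; ties keep first-seen order.
    return counts.most_common(k)


if __name__ == "__main__":
    text = "the cat sat on the mat. the cat sat still."
    print(top_ngrams(text, n=2, k=2))
    print(top_ngrams(text, n=3, k=1))
    print(top_ngrams(text, n=2, k=2, stopwords={"the", "on"}))
```

On the sample sentence this yields `[('the cat', 2), ('cat sat', 2)]` for pairs and `[('the cat sat', 2)]` for triples. Note that the window crosses sentence boundaries (it counts "mat the"), so splitting on sentences first may be worth doing for real text.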
