1.3 sec running over 150k words, not bad at all!
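
For context, here is a minimal sketch of the corrected query discussed in the quoted thread below. It is not necessarily the exact query behind the timing above. It assumes an element word lexicon exists on the "text" element with the Dutch collation shown, and it passes the empty sequence as the second ($start) argument so that the collation string ends up in $options. As an addition that is not in the thread, it appends each word to its stem's map entry, so all words sharing a stem are kept instead of only the last one.

```xquery
xquery version "1.0-ml";

(: Sketch only: assumes an element word lexicon on {http://grtjn.nl/twitter/utils}text
   configured with the collation below. :)
let $map := map:map()
let $words :=
  cts:element-words(
    fn:QName("http://grtjn.nl/twitter/utils", "text"),
    (),  (: $start: empty sequence, so the next argument is read as $options :)
    "collation=http://marklogic.com/collation/nl/S1/AS/T00BB")
let $_ :=
  for $x in $words
  (: cts:stem can return more than one stem, so handle each one separately :)
  for $stem in cts:stem($x)
  (: append the word to whatever is already stored under this stem :)
  return map:put($map, $stem, (map:get($map, $stem), $x))
return (
  fn:concat(fn:count(map:keys($map)), " unique stems in the element lexicon"),
  fn:concat(fn:count($words), " unique words in the element lexicon"),
  (: one line per stem, listing every word that shares it :)
  for $stem in map:keys($map)
  return fn:concat($stem, ": ", fn:string-join(map:get($map, $stem), ", "))
)
```

For non-English content it may also be worth passing a language as the second argument to cts:stem; whether that matters depends on the database's language settings.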
> -----Original Message-----
> From: Geert Josten [mailto:[email protected]]
> Sent: Sunday, 13 May 2012 10:32
> To: MarkLogic Developer Discussion
> Subject: RE: [MarkLogic Dev General] Bug in cts:element-words? (was: Term with same stem)
>
> Duh.. It just had to be something that obvious..
>
> Thnx Danny!
>
> > -----Original Message-----
> > From: [email protected] [mailto:general-[email protected]] On Behalf Of Danny Sokolsky
> > Sent: Sunday, 13 May 2012 0:42
> > To: MarkLogic Developer Discussion
> > Subject: Re: [MarkLogic Dev General] Bug in cts:element-words? (was: Term with same stem)
> >
> > I hadn't had enough coffee yet when I made my last comment. The example in the doc is correct; it just puts a start value in. Geert, your example would use the "collation=..." string as the start value, and would pick up whatever is the default collation in your environment (and you probably do not have an element word lexicon on the default collation, so it probably throws an exception).
> >
> > -Danny
> > ________________________________________
> > From: [email protected] [general-[email protected]] On Behalf Of Danny Sokolsky [[email protected]]
> > Sent: Saturday, May 12, 2012 10:38 AM
> > To: MarkLogic Developer Discussion
> > Subject: Re: [MarkLogic Dev General] Bug in cts:element-words? (was: Term with same stem)
> >
> > I think your call to element-words is missing the second parameter; $options is the 3rd parameter.
> > So I think it should be:
> >
> > cts:element-words(fn:QName("http://grtjn.nl/twitter/utils", "text"), (),
> >   "collation=http://marklogic.com/collation/nl/S1/AS/T00BB")
> >
> > It looks like the example in the doc is missing that second arg too--I'll see if I can get that fixed ;)
> >
> > -Danny
> >
> > ________________________________________
> > From: [email protected] [general-[email protected]] On Behalf Of Geert Josten [[email protected]]
> > Sent: Saturday, May 12, 2012 8:52 AM
> > To: MarkLogic Developer Discussion
> > Subject: [MarkLogic Dev General] Bug in cts:element-words? (was: Term with same stem)
> >
> > Curious how well the idea of Danny would perform, I thought to apply it to one of my test databases with a fair number of tweets (roughly 400K last time I checked). I had to rewrite cts:words to cts:element-words since I have no words lexicon. But it breaks with me. Did I hit a bug?
> >
> > let $map := map:map()
> > let $all :=
> >   for $x in cts:element-words(fn:QName("http://grtjn.nl/twitter/utils", "text"),
> >     "collation=http://marklogic.com/collation/nl/S1/AS/T00BB")
> >   return map:put($map, cts:stem($x), $x)
> > return (
> >   fn:concat(xs:string(fn:count(map:keys($map))), " unique stems in the database"),
> >   fn:concat(fn:count(cts:words()), " unique words in the database
> > "),
> >   map:keys($map) )
> >
> > Note that I specify a specific collation, but that seems to get ignored. Can anyone confirm this behavior?
> >
> > Kind regards,
> > Geert
> >
> > From: [email protected] [mailto:general-[email protected]] On Behalf Of Danny Sokolsky
> > Sent: Saturday, 12 May 2012 0:13
> > To: MarkLogic Developer Discussion
> > Subject: Re: [MarkLogic Dev General] Term with same stem
> >
> > If you have a word lexicon you can do something like this to get information about your words and stems:
> >
> > let $map := map:map()
> > let $all :=
> >   for $x in cts:words()
> >   return map:put($map, cts:stem($x), $x)
> > return (
> >   fn:concat(xs:string(fn:count(map:keys($map))), " unique stems in the database"),
> >   fn:concat(fn:count(cts:words()), " unique words in the database
> > "),
> >   map:keys($map) )
> >
> > -Danny
> >
> > From: [email protected] [mailto:general-[email protected]] On Behalf Of Michael Blakeley
> > Sent: Friday, May 11, 2012 2:02 PM
> > To: MarkLogic Developer Discussion
> > Cc: MarkLogic Developer Discussion
> > Subject: Re: [MarkLogic Dev General] Term with same stem
> >
> > If stemming=advanced I think cts:stem will do that. With basic the best you can do is to pass terms to cts:stem and see if they have the same stem.
> > -- Mike
> >
> > On May 11, 2012, at 13:39, Abhishek53 S <[email protected]> wrote:
> > Hi Folks,
> >
> > Is it possible to get all the terms that have the same stem from a MarkLogic database? I want to get all terms that belong to the same stem.
> >
> > Thanks & Regards
> > Abhishek Srivastav
> > Systems Engineer
> > Tata Consultancy Services
> > Cell:- +91-9883389968
> > Mailto: [email protected]
> > Website: http://www.tcs.com
> > ____________________________________________
> > Experience certainty.
> > IT Services
> > Business Solutions
> > Outsourcing
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
