Hi Ravinder,

OK, so a steep curve at the high end, and a long tail with low frequencies. That is not the same as the distribution of element a though, as the random function will give a relatively even distribution. E.g. if you put all values and frequencies in a graph, the overall line will be horizontal, not a descending curve.
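To make the difference concrete, here is a minimal sketch of a generator that produces a skewed distribution rather than an even one. The helper local:skewed-id, the /event-skewed/ URI prefix, and the choice of cubing a uniform draw are all assumptions made for illustration; this is not the generator used earlier in the thread, and real data remains the better test.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: turns a uniform draw into a skewed value id.
   Cubing a number in [0, 1) pushes most results towards 0, so low ids
   occur far more often than high ids. :)
declare function local:skewed-id($max as xs:integer) as xs:integer
{
  let $u := xdmp:random(999999) div 1000000.0  (: uniform in [0, 1) :)
  return xs:integer(fn:floor($max * $u * $u * $u))
};

(: Small batch only; scale up in separate transactions for a real test. :)
for $i in (1 to 1000)
return xdmp:document-insert(
  "/event-skewed/" || $i,
  <event><a>AB.CDEF/2001XX000729-{local:skewed-id(2000000)}</a></event>
)
```

With a generator like this, a frequency graph of element a should show a descending curve instead of a horizontal line.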
In that sense, make sure you test with realistic data. Real data if possible.

One other thing you can test is adding more forests. MarkLogic parallelizes the calculation across all stands in all forests. More forests should mean more parallel workers. I had in mind to run a test with this myself, but got interrupted.

Cheers,
Geert

From: <[email protected]> on behalf of RAVINDER MAAN <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Wednesday, February 24, 2016 at 1:13 PM
To: "[email protected]" <[email protected]>
Subject: Re: [MarkLogic Dev General] General Digest, Vol 140, Issue 64

Hi Geert,

The distribution is pretty much what the XQuery generates for element "a" in my previous reply. Another example of this distribution is YouTube's most popular videos. Say that each video play generates an event, which is then stored in the database. Some very popular videos will have lots of events. As there are a lot of videos, the index will be huge as well.

I did some further investigation by changing the forest assignment policy to range, and I can see that performance is better. But this means we will need to choose one key for the range. The purpose of this exercise is to compare search performance in MarkLogic and Elasticsearch for monitoring events stored in the database from various applications.
Thanks & regards,
Ravinder Singh Maan

On Tue, Feb 23, 2016 at 8:43 PM, <[email protected]> wrote:

Send General mailing list submissions to [email protected]

To subscribe or unsubscribe via the World Wide Web, visit http://developer.marklogic.com/mailman/listinfo/general or, via email, send a message with subject or body 'help' to [email protected]

You can reach the person managing the list at [email protected]

When replying, please edit your Subject line so it is more specific than "Re: Contents of General digest..."

Today's Topics:

1. Re: Best way to find most occuring word or sort by frequency (Geert Josten)

----------------------------------------------------------------------

Message: 1
Date: Tue, 23 Feb 2016 20:43:45 +0000
From: Geert Josten <[email protected]>
Subject: Re: [MarkLogic Dev General] Best way to find most occuring word or sort by frequency
To: MarkLogic Developer Discussion <[email protected]>
Message-ID: <d2f28280.c63f5%[email protected]>
Content-Type: text/plain; charset="us-ascii"

Hi Ravinder,

Talking with people internally about this, hoping to get back soon with ideas. In the meantime, can you tell whether the distribution is pretty even with just a few spikes at the top end, or whether there is a smooth decay in values? If the top 10 contains a lot of equal frequency counts, MarkLogic will pick the first in item order, which actually means extra work.
Cheers,
Geert

From: <[email protected]> on behalf of RAVINDER MAAN <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Monday, February 22, 2016 at 2:17 PM
To: "[email protected]" <[email protected]>
Subject: Re: [MarkLogic Dev General] General Digest, Vol 140, Issue 56

Hi Geert,

Thanks for the reply. I think I have found the reason for it, and what I found is very interesting. To remove all the factors that can affect performance, I created a one-forest database on a separate machine and inserted 9 million documents into it using the code below.

    xquery version "1.0-ml";
    (: each document is inserted in its own xdmp:eval transaction :)
    for $i in (1 to 9000000)
    return xdmp:eval('
      xdmp:document-insert("/event/'||$i||'",
        <event>
          <a>AB.CDEF/2001XX000729-{xdmp:random(2000000)}</a>
          <b>AB.CDEF/2001XX000729-{xdmp:random(40)}</b>
        </event>)
    ')

I created 2 range indexes, one on element "a" and another on element "b". The above query generates documents in which element "a" has a huge range of values (2 million possible values). It also generates element "b", but its range of values is only 40.

Now if I run an element-values query against element "a" ordered by frequency, it is very slow in comparison to the same query run on element "b". For element "a", even if I run the query again and again, I see a response time of 3 seconds, whereas for element "b" the response time is 220 milliseconds.

Based on this I looked into how indexes in Elasticsearch work, and it is interesting that in Elasticsearch indexes are sharded based on range.
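For anyone repeating this comparison, a minimal sketch of the timing run follows. It assumes string range indexes on elements "a" and "b" as described above. The $name variable is only there for illustration, and the elapsed time should be read as approximate, because MarkLogic may evaluate parts of a query lazily.

```xquery
xquery version "1.0-ml";

(: Change to "b" to time the low-cardinality index. :)
let $name := "a"
let $top :=
  for $v in cts:element-values(xs:QName($name), (), ("frequency-order", "limit=10"))
  return <word count="{cts:frequency($v)}">{$v}</word>
return (
  $top,
  (: time since the start of this request, after the top 10 was produced :)
  <elapsed>{xdmp:elapsed-time()}</elapsed>
)
```

Running it a few times per element, and discarding the first cold run, gives numbers that can be compared with the 3 seconds and 220 milliseconds reported above.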
I think the next test for this will be to try the range-based assignment policy, so that each forest contains a small subset of the range index.

Thanks & regards,
Ravinder Singh Maan

On Sun, Feb 21, 2016 at 3:39 PM, <[email protected]> wrote:

Today's Topics:

1. Re: General Digest, Vol 140, Issue 54 (Geert Josten)
2. Re: General Digest, Vol 140, Issue 54 (Rob Szkutak)

----------------------------------------------------------------------

Message: 1
Date: Sun, 21 Feb 2016 11:42:24 +0000
From: Geert Josten <[email protected]>
Subject: Re: [MarkLogic Dev General] General Digest, Vol 140, Issue 54
To: MarkLogic Developer Discussion <[email protected]>
Message-ID: <d2ef5eaa.c456f%[email protected]>
Content-Type: text/plain; charset="windows-1252"

Hi Ravinder,

Thanks for the info. So you have 12 physical cores in total, and an equal number of forests. That should mean you have roughly 12 mln docs per forest. That should be a nice number for fast faceting, and getting value frequencies.
I am rather surprised about the 30 seconds though, especially because the above sounds right. I ran a little comparison on an average demo server over here, with a single forest containing 16 mln docs. I restarted the server to make sure the caches were cold, and then ran the same code as you, only for a slightly different element index. It returned in 0.06 sec, which is the order of magnitude I'd typically expect from MarkLogic. Using a cluster shouldn't add much more, regardless of the number of nodes or forests.

Are the numbers consistent if you rerun your test? You should always be able to get sub-second results for this. And because that is clearly not happening, something else must be causing issues here. High network latency for instance, or maybe your indexes are taking more memory than MarkLogic is getting, meaning it could be swapping or such. How much free memory is available on the three nodes, and how fast is the network connection between them? Also, is anything else competing for CPU, memory, or network bandwidth perhaps?
Cheers,
Geert

From: <[email protected]> on behalf of RAVINDER MAAN <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Saturday, February 20, 2016 at 11:08 PM
To: "[email protected]" <[email protected]>
Subject: Re: [MarkLogic Dev General] General Digest, Vol 140, Issue 54

Hi Geert,

Thanks for the reply. In ML it takes about 30 seconds and in Elasticsearch it takes 4 seconds. It is a cluster of 3 nodes. Each node has 16GB RAM and "ls /proc/cpuinfo" shows 8 cores (I think that is because of hyper-threading; the actual number of cores is 4). I have configured 4 forests per node. Do you think increasing or decreasing the number of forests will help?

As this is a range index query, I guess the entire index is in memory, so other cache settings should not affect this query. If I run the query with query meters I see only the cache counters below; all other cache hits/misses are 0.
<qm:value-cache-misses>194</qm:value-cache-misses>
<qm:regexp-cache-hits>181</qm:regexp-cache-hits>
<qm:regexp-cache-misses>5</qm:regexp-cache-misses>

Thanks & regards,
Ravinder Singh Maan

On Sat, Feb 20, 2016 at 7:33 PM, <[email protected]> wrote:

Today's Topics:

1. Re: Best way to find most occuring word or sort by frequency (Geert Josten)
2. Re: [1.0-ml] XDMP-TRPLIDXNOTFOUND: cts:triples() -- Triple index not enabled (Geert Josten)

----------------------------------------------------------------------

Message: 1
Date: Sat, 20 Feb 2016 18:44:50 +0000
From: Geert Josten <[email protected]>
Subject: Re: [MarkLogic Dev General] Best way to find most occuring word or sort by frequency
To: MarkLogic Developer Discussion <[email protected]>
Message-ID: <d2ee7209.c3d73%[email protected]>
Content-Type: text/plain; charset="us-ascii"

Hi,

I think this is the right approach. When you say it is slow, how slow is that exactly? And how did you configure MarkLogic? More specifically, how many forests do you have? Also, how much memory and how many CPU cores do you have?
Kind regards,
Geert

From: <[email protected]> on behalf of RAVINDER MAAN <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Saturday, February 20, 2016 at 11:34 AM
To: "[email protected]" <[email protected]>
Subject: [MarkLogic Dev General] Best way to find most occuring word or sort by frequency

Hello all,

I want to sort element values by frequency. I have tried the following:

    for $word in cts:element-values(xs:QName("ELEMENT_NAME"), (), ("frequency-order", "limit=10"))
    return <word count="{cts:frequency($word)}">{$word}</word>

But for a very large index this is slow in comparison to Elasticsearch. I did this comparison on the same machine with the same data, and of course only one of them was running when I did the comparison. There are about 250 million documents, and the frequencies range from 1 million down to hundreds, i.e. if I run the above query, the word at the top has a count of 1000000. Is there any other way of doing the same?

Thanks & regards,
Ravinder Singh Maan

------------------------------

Message: 2
Date: Sat, 20 Feb 2016 19:33:41 +0000
From: Geert Josten <[email protected]>
Subject: Re: [MarkLogic Dev General] [1.0-ml] XDMP-TRPLIDXNOTFOUND: cts:triples() -- Triple index not enabled
To: MarkLogic Developer Discussion <[email protected]>
Message-ID: <d2ee7d69.c3ddc%[email protected]>
Content-Type: text/plain; charset="iso-8859-1"

Hi Gaël,

You need to enable the triple index. You can do that by going to the Admin UI of your MarkLogic installation, navigating to the relevant content database, and toggling the triple index from false to true there. It should be around the 10th edit option, so close to the top. Confirm the change by clicking OK at the top or bottom of the page, and then wait for the reindex to complete. You can follow the progress on the Status tab of that database. Refresh it once in a while to get it updated.
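The same setting can also be changed from a script rather than through the Admin UI. A minimal sketch using the Admin API, assuming the content database is named "Documents" (a placeholder, not a name taken from this thread) and that the query runs as a user with admin privileges:

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "Documents" is an assumed database name; replace it with your own. :)
let $config := admin:get-configuration()
let $db-id  := xdmp:database("Documents")
let $config := admin:database-set-triple-index($config, $db-id, fn:true())
return admin:save-configuration($config)
```

Saving the configuration starts the same reindex as the UI toggle, so cts:triples() may not see existing triples until that reindex has finished.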
Kind regards,
Geert

From: <[email protected]> on behalf of Gaël YIMEN YIMGA <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Saturday, February 20, 2016 at 5:46 PM
To: MarkLogic Developer Discussion <[email protected]>
Subject: [MarkLogic Dev General] [1.0-ml] XDMP-TRPLIDXNOTFOUND: cts:triples() -- Triple index not enabled

Hello All,

I'm facing an issue in MarkLogic.
I ran successfully the following query =================== import module namespace sem = "http://marklogic.com/semantics" at "/MarkLogic/semantics.xqy"; sem:rdf-insert( ( sem:triple( sem:iri("http://example.org/marklogic/people/John_Smith"), sem:iri("http://example.org/marklogic/predicate/livesIn"), "London" ) , sem:triple( sem:iri("http://example.org/marklogic/people/Jane_Smith"), sem:iri("http://example.org/marklogic/predicate/livesIn"), "London" ) , sem:triple( sem:iri("http://example.org/marklogic/people/Jack_Smith"), sem:iri("http://example.org/marklogic/predicate/livesIn"), "Glasgow" ) ) ) =================== But in a secnond plan, I rand the following to count the number of triples ======= xquery version "1.0-ml"; declare namespace html = "http://www.w3.org/1999/xhtml"; fn:count(cts:triples()); ======= I got the following error in the image below [Images int?gr?es 1] Your help to fix this will be greatfull. Thanks in advance !!! Ga?l. -- -------------- next part -------------- An HTML attachment was scrubbed... URL: http://developer.marklogic.com/pipermail/general/attachments/20160220/a4c6b935/attachment.html -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 17507 bytes Desc: image.png Url : http://developer.marklogic.com/pipermail/general/attachments/20160220/a4c6b935/attachment.png ------------------------------ _______________________________________________ General mailing list [email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>><mailto:[email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>>> Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general End of General Digest, Vol 140, Issue 54 **************************************** -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://developer.marklogic.com/pipermail/general/attachments/20160221/1e2a94a8/attachment-0001.html ------------------------------ Message: 2 Date: Sun, 21 Feb 2016 15:39:40 +0000 From: Rob Szkutak <[email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>>> Subject: Re: [MarkLogic Dev General] General Digest, Vol 140, Issue 54 To: MarkLogic Developer Discussion <[email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>>> Message-ID: <6e8e665d710d394a853b6eec145fb7dc16570...@exchg10-be01.marklogic.com<mailto:6e8e665d710d394a853b6eec145fb7dc16570...@exchg10-be01.marklogic.com><mailto:6e8e665d710d394a853b6eec145fb7dc16570...@exchg10-be01.marklogic.com<mailto:6e8e665d710d394a853b6eec145fb7dc16570...@exchg10-be01.marklogic.com>>> Content-Type: text/plain; charset="windows-1252" Hi Ravinder, In addition to Geert's excellent suggestions, you should also take a look to see if you've configured your swap space correctly: https://docs.marklogic.com/guide/installation/intro#id_11335 Best, Rob Rob Szkutak Senior Consultant MarkLogic Corporation [email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>> www.marklogic.com<http://www.marklogic.com><http://www.marklogic.com><http://www.marklogic.com> ________________________________ From: [email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>> [[email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>>] on behalf of Geert Josten [[email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>>] Sent: Sunday, February 21, 2016 5:42 AM To: MarkLogic Developer Discussion Subject: Re: [MarkLogic Dev General] General Digest, Vol 140, Issue 54 Hi Ravinder, Thanks for the info. So you have 12 physical cores in total, and an equal number of forests. That should mean you have roughly 12 mln docs per forest. 
That should be a nice number for fast faceting, and getting value frequencies. I am rather surprised about the 30 seconds though, and especially because the above sounds right. I ran a little comparison on an average demo server over here, with a single forest containing 16 mln docs. I restarted the server to make sure the caches are cold, and then ran the same code as you, only for a slightly different element index. It returned in 0.06 sec, which is kind of the order of magnitude I?d typically expect from MarkLogic. Using a cluster shouldn?t add much more, regardless of the number of nodes or forests. Are the number consistent if you rerun your test? You should always be able to get sub-sec results for this. And because that is clearly not happening, something else must be causing issues here. Low latency for instance, or maybe your indexes are taking more memory that MarkLogic is getting, meaning it could be swapping or such. How much free memory is available on the three nodes, and how fast is the network connection between them? Also, is anything else competing for cpu, memory, or network bandwidth perhaps? 
Cheers, Geert From: <[email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>><mailto:[email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>>>> on behalf of RAVINDER MAAN <[email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>><mailto:[email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>>>> Reply-To: MarkLogic Developer Discussion <[email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>><mailto:[email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>>>> Date: Saturday, February 20, 2016 at 11:08 PM To: "[email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>><mailto:[email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>>>" <[email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>><mailto:[email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>>>> Subject: Re: [MarkLogic Dev General] General Digest, Vol 140, Issue 54 Hi Geerat Thanks for reply. In ML it takes about 30 seconds and in elasticsearch it takes 4 seconds. It is cluster of 3 nodes. Each node has 16GB RAM and "ls /proc/cpuinfo" show 8 cores(I think it is because of hyper threading actual cores are 4). I have configured 4 forests per node. Do you think increasing/decreasing number of forests will help? As this is range index query so I guess entire index is in memory so other cache settings should not effect this query. If I run the query with query meters I just see below cache misses, all other caches hit/miss are 0. 
<qm:value-cache-misses>194</qm:value-cache-misses> <qm:regexp-cache-hits>181</qm:regexp-cache-hits> <qm:regexp-cache-misses>5</qm:regexp-cache-misses> Thanks & regards, Ravinder Singh Maan On Sat, Feb 20, 2016 at 7:33 PM, <[email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>><mailto:[email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>>>> wrote: Send General mailing list submissions to [email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>><mailto:[email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>>> To subscribe or unsubscribe via the World Wide Web, visit http://developer.marklogic.com/mailman/listinfo/general or, via email, send a message with subject or body 'help' to [email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>><mailto:[email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>>> You can reach the person managing the list at [email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>><mailto:[email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>>> When replying, please edit your Subject line so it is more specific than "Re: Contents of General digest..." Today's Topics: 1. Re: Best way to find most occuring word or sort by frequency (Geert Josten) 2. 
Re: [1.0-ml] XDMP-TRPLIDXNOTFOUND: cts:triples() -- Triple index not enabled (Geert Josten) ---------------------------------------------------------------------- Message: 1 Date: Sat, 20 Feb 2016 18:44:50 +0000 From: Geert Josten <[email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>><mailto:[email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>>>> Subject: Re: [MarkLogic Dev General] Best way to find most occuring word or sort by frequency To: MarkLogic Developer Discussion <[email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>><mailto:[email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>>>> Message-ID: <d2ee7209.c3d73%[email protected]<mailto:d2ee7209.c3d73%[email protected]><mailto:d2ee7209.c3d73%[email protected]<mailto:d2ee7209.c3d73%[email protected]>><mailto:d2ee7209.c3d73%[email protected]<mailto:d2ee7209.c3d73%[email protected]><mailto:d2ee7209.c3d73%[email protected]<mailto:d2ee7209.c3d73%[email protected]>>>> Content-Type: text/plain; charset="us-ascii" Hi, I think this is the right approach.. If you talk about it being slow, how slow is that exactly? And how did you configure MarkLogic? More specifically, how many forest do you have? Also, how much memory, and cpu cores do you have? 
Kind regards,
Geert

From: <[email protected]> on behalf of RAVINDER MAAN <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Saturday, February 20, 2016 at 11:34 AM
To: "[email protected]" <[email protected]>
Subject: [MarkLogic Dev General] Best way to find most occuring word or sort by frequency

Hello all,

I want to sort element values by frequency. I have tried the following:

for $word in cts:element-values(xs:QName("ELEMENT_NAME"), (), ("frequency-order", "limit=10"))
return <word count="{cts:frequency($word)}">{$word}</word>

But for a very large index this is slow in comparison to Elasticsearch. I ran the comparison on the same machine with the same data, and only one of the two was running at a time. There are about 250 million documents, and frequencies range from 1 million down to the hundreds, i.e. if I run the query above, the word at the top has a count of 1000000. Is there any other way of doing the same?

Thanks & regards,
Ravinder Singh Maan

------------------------------

Message: 2
Date: Sat, 20 Feb 2016 19:33:41 +0000
From: Geert Josten <[email protected]>
Subject: Re: [MarkLogic Dev General] [1.0-ml] XDMP-TRPLIDXNOTFOUND: cts:triples() -- Triple index not enabled
To: MarkLogic Developer Discussion <[email protected]>
Message-ID: <d2ee7d69.c3ddc%[email protected]>
Content-Type: text/plain; charset="iso-8859-1"

Hi Gaël,

You need to enable the triple index. You can do that by going to the Admin UI of your MarkLogic installation, navigating to the relevant content database, and toggling the triple index from false to true there. It should be around the 10th edit option, so close to the top. Confirm the change by clicking OK at the top or bottom of the page, and then wait for the reindex to complete. You can follow the progress on the Status tab of that database. Refresh it once in a while to see the current state.
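
If you prefer to script the change rather than click through the Admin UI, the same setting can be switched on through the Admin API. The following is only a minimal sketch: the database name "Documents" is an assumption and should be replaced with the name of your own content database, and the user running it needs admin privileges.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "Documents" is a placeholder; use the name of your content database. :)
let $db-id  := xdmp:database("Documents")
let $config := admin:get-configuration()
(: Turn the triple index on in the in-memory copy of the configuration. :)
let $config := admin:database-set-triple-index($config, $db-id, fn:true())
(: Persist the change; the database then reindexes its existing content. :)
return admin:save-configuration($config)
```

Saving the configuration has the same effect as clicking OK in the Admin UI, so the reindex still has to finish before fn:count(cts:triples()) reflects triples that were loaded earlier.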
Kind regards,
Geert

From: <[email protected]> on behalf of Gaël YIMEN YIMGA <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Saturday, February 20, 2016 at 5:46 PM
To: MarkLogic Developer Discussion <[email protected]>
Subject: [MarkLogic Dev General] [1.0-ml] XDMP-TRPLIDXNOTFOUND: cts:triples() -- Triple index not enabled

Hello All,

I'm facing an issue in MarkLogic.
I successfully ran the following query:

===================
import module namespace sem = "http://marklogic.com/semantics" at "/MarkLogic/semantics.xqy";

sem:rdf-insert(
  (
    sem:triple(
      sem:iri("http://example.org/marklogic/people/John_Smith"),
      sem:iri("http://example.org/marklogic/predicate/livesIn"),
      "London"
    ),
    sem:triple(
      sem:iri("http://example.org/marklogic/people/Jane_Smith"),
      sem:iri("http://example.org/marklogic/predicate/livesIn"),
      "London"
    ),
    sem:triple(
      sem:iri("http://example.org/marklogic/people/Jack_Smith"),
      sem:iri("http://example.org/marklogic/predicate/livesIn"),
      "Glasgow"
    )
  )
)
===================

But as a second step, I ran the following to count the number of triples:

=======
xquery version "1.0-ml";
declare namespace html = "http://www.w3.org/1999/xhtml";

fn:count(cts:triples());
=======

I got the error shown in the image below.

[Embedded image 1]

Any help fixing this would be greatly appreciated. Thanks in advance!

Gaël.

--
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 17507 bytes
Desc: image.png
Url : http://developer.marklogic.com/pipermail/general/attachments/20160220/a4c6b935/attachment.png

------------------------------

_______________________________________________
General mailing list
[email protected]
Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general

End of General Digest, Vol 140, Issue 54
****************************************

End of General Digest, Vol 140, Issue 56
****************************************

End of General Digest, Vol 140, Issue 64
****************************************
