Hi Geert

The distribution is pretty much what the XQuery in my previous reply
generates for element "a". Another example of this kind of distribution is
YouTube's most popular videos: suppose each video play generates an event
that is stored in the database. Some very popular videos will have lots of
events, and because there are a lot of videos, the index will be huge as
well.

I investigated further by changing the forest assignment policy to range,
and I can see that performance is better. However, this means we will need
to choose a single key for the range.

The purpose of this exercise is to compare search performance in MarkLogic
and Elasticsearch for monitoring events from various applications that are
stored in a database.


Thanks & regards,
Ravinder Singh Maan

On Tue, Feb 23, 2016 at 8:43 PM, <[email protected]>
wrote:

> Send General mailing list submissions to
>         [email protected]
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         http://developer.marklogic.com/mailman/listinfo/general
> or, via email, send a message with subject or body 'help' to
>         [email protected]
>
> You can reach the person managing the list at
>         [email protected]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of General digest..."
>
>
> Today's Topics:
>
>    1. Re: Best way to find most occurring word or sort by frequency
>       (Geert Josten)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 23 Feb 2016 20:43:45 +0000
> From: Geert Josten <[email protected]>
> Subject: Re: [MarkLogic Dev General] Best way to find most occurring
>         word or sort by frequency
> To: MarkLogic Developer Discussion <[email protected]>
> Message-ID: <d2f28280.c63f5%[email protected]>
> Content-Type: text/plain; charset="us-ascii"
>
> Hi Ravinder,
>
> I'm talking with people internally about this and hope to get back to you
> soon with ideas.
>
> In the meantime, can you tell whether the distribution is fairly even with
> just a few spikes at the top end, or whether there is a smooth decay in
> values? If the top 10 contains a lot of equal frequency counts, MarkLogic
> will pick the first ones in item order, which actually means extra work.
>
> Cheers,
> Geert
>
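A quick way to answer Geert's question about the shape of the distribution is to look at the raw counts near the top of the index. The sketch below is an illustration, not something posted in this thread: it reuses the "ELEMENT_NAME" placeholder and the options from the original query, takes an arbitrary sample of 100 values, and flags whether the 10th and 11th most frequent values share the same count (a tie at the top-10 cut-off).

```xquery
xquery version "1.0-ml";

(: Sketch: inspect the frequency counts at the top of a range index.
   "ELEMENT_NAME" is the placeholder from the original question;
   limit=100 is an arbitrary sample size. :)
let $values :=
  cts:element-values(xs:QName("ELEMENT_NAME"), (),
    ("frequency-order", "limit=100"))
(: cts:frequency returns the count of an item that came from a lexicon lookup :)
let $counts := for $v in $values return cts:frequency($v)
return
  <distribution
      tie-at-top-10="{fn:count($counts) gt 10 and $counts[10] eq $counts[11]}">
    {fn:string-join(for $c in $counts return fn:string($c), " ")}
  </distribution>
```

A steep drop over the first few counts suggests a few spikes at the top; a long run of similar numbers suggests a smooth decay or a fairly flat distribution.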
> From: <[email protected]> on behalf of RAVINDER MAAN <[email protected]>
> Reply-To: MarkLogic Developer Discussion <[email protected]>
> Date: Monday, February 22, 2016 at 2:17 PM
> To: "[email protected]" <[email protected]>
> Subject: Re: [MarkLogic Dev General] General Digest, Vol 140, Issue 56
>
> Hi Geert
>
> Thanks for the reply. I think I have found the reason, and what I found is
> very interesting. To rule out all the other factors that can affect
> performance, I created a single-forest database on a separate machine and
> inserted 9 million documents into it using the code below.
>
> xquery version "1.0-ml";
>
> for $i in (1 to 9000000)
> return
> xdmp:eval('
> xdmp:document-insert("/event/'||$i||'",
> <event><a>AB.CDEF/2001XX000729-{xdmp:random(2000000)}</a><b>AB.CDEF/2001XX000729-{xdmp:random(40)}</b></event>)
> ')
>
> I created two range indexes, one on element "a" and another on element "b".
> The query above generates documents in which element "a" has a huge range
> of values (2 million possible values). It also generates element "b", but
> with a range of only 40 values. Now, if I run an element-values query
> ordered by frequency against element "a", it is very slow compared with
> the same query run on element "b". For element "a", even if I run the
> query again and again, I see a response time of 3 seconds, whereas for
> element "b" the response time is 220 milliseconds. Based on this, I looked
> into how indexes work in Elasticsearch, and it is interesting that in
> Elasticsearch indexes are sharded based on range.
>
> I think the next test will be to try the range assignment policy, so that
> each forest contains a small subset of the range index.
>
>
>
>
>
> Thanks & regards,
> Ravinder Singh Maan
>
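For anyone reproducing the single-forest test above, the two range indexes on "a" and "b" can also be created from Query Console through the Admin API. This is a sketch under assumptions that do not come from the thread: the database is called "Documents", both indexes are string indexes in no namespace, and they use the root collation (assumed to match the App Server's default collation, so the unmodified cts:element-values query finds them).

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Sketch: add string element range indexes on <a> and <b>.
   Assumed: database "Documents", no namespace, root collation. :)
let $config := admin:get-configuration()
let $db := xdmp:database("Documents")
let $indexes :=
  for $name in ("a", "b")
  return
    admin:database-range-element-index(
      "string", "", $name,
      "http://marklogic.com/collation/",
      fn:false())  (: no range value positions :)
let $config := admin:database-add-range-element-index($config, $db, $indexes)
return admin:save-configuration($config)
```

Once reindexing has finished, the frequency-ordered cts:element-values query from the original question can be pointed at xs:QName("a") and xs:QName("b") in turn to compare the two response times.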
> On Sun, Feb 21, 2016 at 3:39 PM, <[email protected]> wrote:
>
> Today's Topics:
>
>    1. Re: General Digest, Vol 140, Issue 54 (Geert Josten)
>    2. Re: General Digest, Vol 140, Issue 54 (Rob Szkutak)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sun, 21 Feb 2016 11:42:24 +0000
> From: Geert Josten <[email protected]>
> Subject: Re: [MarkLogic Dev General] General Digest, Vol 140, Issue 54
> To: MarkLogic Developer Discussion <[email protected]>
> Message-ID: <d2ef5eaa.c456f%[email protected]>
> Content-Type: text/plain; charset="windows-1252"
>
> Hi Ravinder,
>
> Thanks for the info. So you have 12 physical cores in total, and an equal
> number of forests. With about 250 mln documents, that means you have
> roughly 20 mln docs per forest. That should be a nice number for fast
> faceting and getting value frequencies.
>
> I am rather surprised about the 30 seconds though, especially because the
> above sounds right. I ran a little comparison on an average demo server
> over here, with a single forest containing 16 mln docs. I restarted the
> server to make sure the caches were cold, and then ran the same code as
> you, only against a slightly different element index. It returned in
> 0.06 sec, which is roughly the order of magnitude I'd typically expect
> from MarkLogic. Using a cluster shouldn't add much more, regardless of
> the number of nodes or forests. Are the numbers consistent if you rerun
> your test?
>
> You should always be able to get sub-second results for this. Because that
> is clearly not happening, something else must be causing issues here:
> network latency, for instance, or maybe your indexes need more memory than
> MarkLogic has available, which could mean the system is swapping. How much
> free memory is available on the three nodes, and how fast is the network
> connection between them? Also, is anything else competing for CPU, memory,
> or network bandwidth?
>
> Cheers,
> Geert
>
> From: <[email protected]> on behalf of RAVINDER MAAN <[email protected]>
> Reply-To: MarkLogic Developer Discussion <[email protected]>
> Date: Saturday, February 20, 2016 at 11:08 PM
> To: "[email protected]" <[email protected]>
> Subject: Re: [MarkLogic Dev General] General Digest, Vol 140, Issue 54
>
> Hi Geert
>
> Thanks for the reply. In ML it takes about 30 seconds, and in Elasticsearch
> it takes 4 seconds. It is a cluster of 3 nodes. Each node has 16GB of RAM,
> and "cat /proc/cpuinfo" shows 8 cores (I think that is because of
> hyper-threading; the actual number of cores is 4). I have configured 4
> forests per node. Do you think increasing or decreasing the number of
> forests will help? As this is a range index query, I guess the entire
> index is in memory, so other cache settings should not affect this query.
>
> When I run the query with query meters, I only see the cache counters
> below; all other cache hits/misses are 0.
>
> <qm:value-cache-misses>194</qm:value-cache-misses>
> <qm:regexp-cache-hits>181</qm:regexp-cache-hits>
> <qm:regexp-cache-misses>5</qm:regexp-cache-misses>
>
>
> Thanks & regards,
> Ravinder Singh Maan
>
> On Sat, Feb 20, 2016 at 7:33 PM, <[email protected]> wrote:
>
> Today's Topics:
>
>    1. Re: Best way to find most occurring word or sort by frequency
>       (Geert Josten)
>    2. Re: [1.0-ml] XDMP-TRPLIDXNOTFOUND: cts:triples() -- Triple
>       index not enabled (Geert Josten)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sat, 20 Feb 2016 18:44:50 +0000
> From: Geert Josten <[email protected]>
> Subject: Re: [MarkLogic Dev General] Best way to find most occurring
>         word or sort by frequency
> To: MarkLogic Developer Discussion <[email protected]>
> Message-ID: <d2ee7209.c3d73%[email protected]>
> Content-Type: text/plain; charset="us-ascii"
>
> Hi,
>
> I think this is the right approach.
>
> When you say it is slow, how slow is it exactly? And how did you configure
> MarkLogic? More specifically, how many forests do you have? Also, how much
> memory and how many CPU cores do you have?
>
> Kind regards,
> Geert
>
>
> From: <[email protected]> on behalf of RAVINDER MAAN <[email protected]>
> Reply-To: MarkLogic Developer Discussion <[email protected]>
> Date: Saturday, February 20, 2016 at 11:34 AM
> To: "[email protected]" <[email protected]>
> Subject: [MarkLogic Dev General] Best way to find most occurring word or
> sort by frequency
>
> Hello all
>
> I want to sort element values by frequency. I have tried the following:
>
> for $word in cts:element-values(xs:QName("ELEMENT_NAME"),  (),
> ("frequency-order", "limit=10"))
> return <word count="{cts:frequency($word)}">{$word}</word>
>
>
> But for a very large index this is slow compared with Elasticsearch. I did
> the comparison on the same machine with the same data, and of course only
> one of them was running at the time. There are about 250 million documents,
> and the frequencies range from hundreds up to 1 million; that is, when I run
> the query above, the word at the top has a count of 1,000,000.
>
> Is there any other way of doing the same?
>
>
> Thanks & regards,
> Ravinder Singh Maan
>
> ------------------------------
>
> Message: 2
> Date: Sat, 20 Feb 2016 19:33:41 +0000
> From: Geert Josten <[email protected]>
> Subject: Re: [MarkLogic Dev General] [1.0-ml] XDMP-TRPLIDXNOTFOUND:
>         cts:triples() -- Triple index not enabled
> To: MarkLogic Developer Discussion <[email protected]>
> Message-ID: <d2ee7d69.c3ddc%[email protected]>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi Gaël,
>
> You need to enable the triple-index. You can do that by going to the Admin
> UI of your MarkLogic installation, navigating to the relevant content
> database, and toggling the triple index from false to true there. It should
> be around the 10th edit option, so close to the top. Confirm the change by
> clicking OK at the top or bottom of the page, and then wait for the reindex
> to complete. You can follow the progress on the Status tab of that
> database. Refresh it once in a while to get it updated.
>
> Kind regards,
> Geert
>
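The same setting can also be changed from Query Console through the Admin API, which is handy when scripting the setup. A minimal sketch, assuming the triples were inserted into a database named "Documents" (substitute the content database used by your App Server):

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Sketch: enable the triple index on the content database.
   "Documents" is an assumed database name. :)
let $config := admin:get-configuration()
let $config :=
  admin:database-set-triple-index($config, xdmp:database("Documents"), fn:true())
return admin:save-configuration($config)
```

As with the Admin UI route, wait for reindexing to finish before running fn:count(cts:triples()) again.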
> From: <[email protected]> on behalf of Gaël YIMEN YIMGA
> <[email protected]>
> Reply-To: MarkLogic Developer Discussion <[email protected]>
> Date: Saturday, February 20, 2016 at 5:46 PM
> To: MarkLogic Developer Discussion <[email protected]>
> Subject: [MarkLogic Dev General] [1.0-ml] XDMP-TRPLIDXNOTFOUND:
> cts:triples() -- Triple index not enabled
>
> Hello All,
>
> I'm facing an issue in MarkLogic.
> I successfully ran the following query:
> ===================
> import module namespace sem = "http://marklogic.com/semantics"
>       at "/MarkLogic/semantics.xqy";
>
> sem:rdf-insert(
>   (
>   sem:triple(
>     sem:iri("http://example.org/marklogic/people/John_Smith"),
>     sem:iri("http://example.org/marklogic/predicate/livesIn"),
>     "London"
>     )
>   ,
>   sem:triple(
>     sem:iri("http://example.org/marklogic/people/Jane_Smith"),
>     sem:iri("http://example.org/marklogic/predicate/livesIn"),
>     "London"
>     )
>   ,
>   sem:triple(
>     sem:iri("http://example.org/marklogic/people/Jack_Smith"),
>     sem:iri("http://example.org/marklogic/predicate/livesIn"),
>     "Glasgow"
>     )
>   )
> )
> ===================
>
> But when I then ran the following to count the number of triples:
> =======
> xquery version "1.0-ml";
> declare namespace html = "http://www.w3.org/1999/xhtml";
> fn:count(cts:triples())
> =======
> I got the error shown in the image below:
>
> [Inline image 1]
>
> I would be grateful for your help in fixing this.
>
> Thanks in advance!!!
>
> Gaël.
> --
>
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: image.png
> Type: image/png
> Size: 17507 bytes
> Desc: image.png
> Url :
> http://developer.marklogic.com/pipermail/general/attachments/20160220/a4c6b935/attachment.png
>
> ------------------------------
>
> _______________________________________________
> General mailing list
> [email protected]
> Manage your subscription at:
> http://developer.marklogic.com/mailman/listinfo/general
>
>
> End of General Digest, Vol 140, Issue 54
> ****************************************
>
> ------------------------------
>
> Message: 2
> Date: Sun, 21 Feb 2016 15:39:40 +0000
> From: Rob Szkutak <[email protected]>
> Subject: Re: [MarkLogic Dev General] General Digest, Vol 140, Issue 54
> To: MarkLogic Developer Discussion <[email protected]>
> Message-ID:
>         <6e8e665d710d394a853b6eec145fb7dc16570...@exchg10-be01.marklogic.com>
> Content-Type: text/plain; charset="windows-1252"
>
> Hi Ravinder,
>
> In addition to Geert's excellent suggestions, you should also check
> whether you have configured your swap space correctly:
> https://docs.marklogic.com/guide/installation/intro#id_11335
>
> Best,
> Rob
>
> Rob Szkutak
> Senior Consultant
> MarkLogic Corporation
> [email protected]
> www.marklogic.com
>
> ------------------------------
>
>
> End of General Digest, Vol 140, Issue 56
> ****************************************
>
> ------------------------------
>
>
> End of General Digest, Vol 140, Issue 64
> ****************************************
>