Re: [MarkLogic Dev General] Issue with Mark Logic Query (Michael Blakeley)

Geert Josten Sat, 03 Dec 2011 03:20:17 -0800

I think there is more to it. count() forces actual data to be retrieved
from the database nodes, while xdmp:estimate uses memory-based indexes, so
it can save a lot of latency as well.


Kind regards,
Geert

-----Original Message-----
From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Paul M
Sent: Friday, 2 December 2011 17:28
To: general@developer.marklogic.com
Subject: Re: [MarkLogic Dev General] Issue with Mark Logic Query
(Michael Blakeley)

So if count is O(n), is xdmp:estimate O(log n) or some such? Just curious.



----- Original Message -----
From: "general-requ...@developer.marklogic.com"
<general-requ...@developer.marklogic.com>
To: general@developer.marklogic.com
Cc:
Sent: Thursday, December 1, 2011 3:00 PM
Subject: General Digest, Vol 90, Issue 3



Today's Topics:

   1. Re: Issue with Mark Logic Query (Michael Blakeley)
   2. large (?) number of range indexes (Mike Sokolov)


----------------------------------------------------------------------

Message: 1
Date: Thu, 1 Dec 2011 09:05:50 -0800
From: Michael Blakeley <m...@blakeley.com>
Subject: Re: [MarkLogic Dev General] Issue with Mark Logic Query
To: General MarkLogic Developer Discussion
    <general@developer.marklogic.com>
Cc: rakesh.yadav12...@gmail.com
Message-ID: <efb6938d-b47f-46d3-9a5b-0c7a35cc9...@blakeley.com>
Content-Type: text/plain; charset=us-ascii

To query the value of an element, use an element-value-query term like
this:

  cts:element-value-query(xs:QName('meta:DateLoaded'), '2011*')

But since that uses a wildcard glob, it won't resolve from indexes unless
you also have appropriate wildcards enabled. If you have an element range
index on meta:DateLoaded with type=date, it would probably be better to
specify a range instead of a wildcard:

  cts:element-range-query(
    xs:QName('meta:DateLoaded'), '>=', xs:date('2011-01-01')),
  cts:element-range-query(
    xs:QName('meta:DateLoaded'), '<', xs:date('2012-01-01'))

Finally, it may be faster to evaluate the entire cts:query using
xdmp:estimate(cts:search($query)) rather than count(cts:uris($query)).
Using count() will be O(n) with the number of results. Note that both
count and estimate support an optional limit argument, which might be
useful for your '1 to 1000000' limit.
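
For illustration, a minimal sketch of how the range terms and the estimate
might fit together. It is not taken from the original query: the namespace
URI is a placeholder, a date range index on meta:DateLoaded is assumed, and
the remaining terms of the original and-query are left out for brevity.

```xquery
xquery version "1.0-ml";

(: Placeholder URI (an assumption); use whatever URI the meta prefix
   is bound to in your documents. :)
declare namespace meta = "http://example.com/meta";

(: Assumes an element range index of type date on meta:DateLoaded. :)
let $query :=
  cts:and-query((
    cts:directory-query('/content/', 'infinity'),
    cts:element-range-query(
      xs:QName('meta:DateLoaded'), '>=', xs:date('2011-01-01')),
    cts:element-range-query(
      xs:QName('meta:DateLoaded'), '<', xs:date('2012-01-01'))
  ))
return (
  (: Index-only count; the second argument caps the estimate. :)
  xdmp:estimate(cts:search(fn:doc(), $query), 1000000),
  (: The matching URIs, with the same options as the original query. :)
  cts:uris('', ('document', 'limit=1000000'), $query)
)
```

Because xdmp:estimate works from the indexes without filtering, its result
can differ from a filtered count when the indexes cannot fully resolve the
query.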

-- Mike

On 1 Dec 2011, at 01:46 , amit gope wrote:

> Hi All,
>
> I have a database where the element range index is on the element date,
> and now I am executing a query where I have used an element value query
> on one of the elements, but the results fetched do not adhere to the
> query. Please suggest the changes that I need to make.
>
> let $uri :=(cts:uris('', ('document','limit=1000000'),
>     (cts:and-query((
>         cts:directory-query('/content/', 'infinity'),
>         cts:element-query((xs:QName('meta:DateLoaded')),'2011*'),
>         cts:element-query((xs:QName('meta:PubName')),'Springer'),
>         cts:element-query(xs:QName('Affiliation'),
>             cts:and-query((), ())),
>         cts:element-query(xs:QName('meta:Institution'),
>             cts:and-query((),())),
>         cts:not-query(cts:element-query(xs:QName("meta:GeoOrg"),
>             cts:and-query((), ())))
>       ), ())), (), ()))[1 to 1000000]
> return (count($uri),$uri)
>
>
> In the above query it is fetching me URIs of those articles where the
> meta:DateLoaded is 2010. Please suggest.
>
> --
> Regards
> Amit
>
>
> _______________________________________________
> General mailing list
> General@developer.marklogic.com
> http://developer.marklogic.com/mailman/listinfo/general



------------------------------

Message: 2
Date: Thu, 01 Dec 2011 14:23:07 -0500
From: Mike Sokolov <soko...@ifactory.com>
Subject: [MarkLogic Dev General] large (?) number of range indexes
To: general@developer.marklogic.com
Message-ID: <4ed7d41b.9000...@ifactory.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

I've found that cts:element-values() is *much* faster when you don't use
a query to filter.  For example,

cts:element-values (xs:QName("foo"), "a")

is 25x faster than

cts:element-values (xs:QName("foo"), "a", (),
cts:element-value-query(xs:QName("bar"), "baz"))

when every document indexed by foo in fact has bar=baz, i.e. when the
query is essentially a no-op.
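
One way to reproduce this kind of comparison, as a rough sketch: run each
variant as its own request and read xdmp:elapsed-time() after forcing the
lexicon call with count(). The element names foo and bar are the
placeholders used above, a range index on foo is assumed, and the empty
sequence fills the options argument that precedes the query.

```xquery
xquery version "1.0-ml";

(: Variant with the constraining query; for the unconstrained variant,
   drop the last two arguments and run it as a separate request. :)
let $values :=
  cts:element-values(xs:QName("foo"), "a", (),
    cts:element-value-query(xs:QName("bar"), "baz"))
return (
  (: count() forces the values to be materialised before timing. :)
  fn:count($values),
  (: Elapsed time since this request started processing. :)
  xdmp:elapsed-time()
)
```

Timings from a single run are noisy, so repeating each request a few
times, with caches both warm and cold, gives a fairer picture.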

Consequently, we're taking what used to be a bunch of large range
indexes and breaking them up into a lot of smaller range indexes, each
of which we can query independently (faster).

What I'm wondering is if anybody would care to speculate on whether
having a large number of small(er) indexes will pose some other
performance problem.  Presumably at least some of the keys will be
shared across these indexes, but the values (the fragment/document
references) should not, so overall storage should be only slightly larger?

--
Michael Sokolov
Engineering Director
www.ifactory.com



------------------------------


End of General Digest, Vol 90, Issue 3
**************************************
