Mike,

Indexing is complete. The search element and sort element should be in the same fragment. An abbreviated form of my XML structure is:
<PATENT>
        <PATNUM>
        <ASSS>
                several layers down <ASSC_AENSC>
        <DESC>
        other stuff
All tags are unique at all levels of the XML hierarchy (e.g., PATNUM appears only at the top level and not within <DESC>).

I have set <DESC> as a fragment parent. My understanding is that <DESC> and below will be one fragment and everything else will be in a second fragment. In this case, ASSC_AENSC and PATNUM should be in the same fragment, correct? Do I also need to set <PATENT> as a fragment root?

In some cases, <DESC> does not exist, and my guess is that the ordering constraint helps only in these cases, where the document has a single fragment (only two documents in the example that I gave).
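
The fragment reasoning above can be sketched in a few lines of Python. This only models the understanding described in this message (that <DESC> and everything below it forms one fragment and the rest of the document forms another); it is not a statement of how the server actually fragments documents. The PATNUM value, the LAYER and P elements, and the helper name fragment_of are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated document shape from this message; LAYER, P and all
# text values are placeholders invented for the sketch.
doc = ET.fromstring(
    "<PATENT>"
    "<PATNUM>1234567</PATNUM>"
    "<ASSS><LAYER><ASSC_AENSC>test corp</ASSC_AENSC></LAYER></ASSS>"
    "<DESC><P>description text</P></DESC>"
    "</PATENT>"
)

def fragment_of(root, split_name):
    """Label each tag 'split' if it sits at or below split_name, else 'main'.

    Keying by tag is only safe because all tags are unique in this structure.
    """
    assignment = {}

    def walk(elem, inside):
        inside = inside or elem.tag == split_name
        assignment[elem.tag] = "split" if inside else "main"
        for child in elem:
            walk(child, inside)

    walk(root, False)
    return assignment

frags = fragment_of(doc, "DESC")
same_fragment = frags["ASSC_AENSC"] == frags["PATNUM"]
print(frags)
print("search element and sort key share a fragment:", same_fragment)
```

Under that assumed model, the search element and the sort key end up with the same label, which is the expectation stated in the question.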

Also, I am using the default namespace. However, all documents in the system have the same XML structure, so I didn't think that there would be a problem.

What am I missing here? Thanks.

                         -Dave


Message: 1
Date: Mon, 27 Apr 2009 08:52:49 -0700
From: Michael Blakeley <[email protected]>
Subject: Re: [MarkLogic Dev General] Questions about results ordering
	with	element range indices
To: General Mark Logic Developer Discussion
	<[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset=UTF-8; format=flowed

David,

The answer to question (1) is that positions aren't needed, and (2) is 
"yes".

You've gathered the right information, and your deductions are correct. 
The '4357 unordered' trace appears in two situations that I know of: 
either the range index is incomplete, or the cts:search searchable 
expression (arg1) is in a different fragment than the sort key. It looks 
like the latter applies here. If so, the search can't use the range 
index effectively because most of the fragments to be sorted do not 
contain the sort key directly.

To sort this result set efficiently, you will need to have the 
range-indexed values at the same fragment level as your search. 
Depending on your application and content, that might involve tweaking 
your query to avoid the fragment link, or reconsidering your 
fragmentation policy, or possibly copying or moving an element or two 
from one fragmentation level to another.

As you suggested, the bottleneck is probably I/O. From the query-meters 
output there could be up to 8385 read operations (some of the blocks 
might already be in disk cache or buffer cache). That implies up to 490 
logical reads/sec, which isn't bad for a RAID-5. The 10k disks can 
probably sustain 200-300 reads/sec each, so you might expect better. But 
in my experience RAID-5 random read performance usually isn't much 
better than single disk performance. I prefer RAID-10, or RAID-50 with 
small parity groups (for example, you could use a RAID-0 stripe across 
two 3+1 RAID-5 volumes), and a small stripe size (8-32 kB). The average 
document size on disk may play a role here too, if it's larger than the 
disk's physical read size.
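
The read-rate estimate above can be reproduced from the query-meters figures quoted further down in this thread. This is a minimal Python sketch, assuming every expanded-tree-cache miss costs one read operation (an upper bound, since some blocks may already be cached); the variable names are illustrative only.

```python
# Figures taken from the query-meters output in this thread.
expanded_tree_cache_misses = 8385   # upper bound on read operations
elapsed_seconds = 17.261083         # from <qm:elapsed-time>PT17.261083S

# Upper bound on logical reads per second, assuming one read per miss.
reads_per_second = expanded_tree_cache_misses / elapsed_seconds
print(f"up to {reads_per_second:.0f} logical reads/sec")
```

The result comes out slightly below the rounded figure of 490 given above.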

-- Mike

On 2009-04-26 22:22, Dave Feldmeier wrote:
  
I am sorting results and the performance is less than I would like. A search without ordering takes 0.2 seconds and a search with ordering takes 17 seconds.

First some questions:

  1.  For improving sort performance, is it sufficient to set up an element range index, or must I also set the "range value positions" radio button to "True"?
  2.  Does an element range index improve performance if all the element values are unique?

For the following example, I am doing a word query and sorting based on the value of an element (each document has a unique value of this element and the type is "string"). I have set up an element range index for the element and set the "range value positions" radio button to "True".

Each document has two fragments and the fragment that I'm retrieving for each probably is something like 20K.

In the following trace (see below), the path is searchable (good), the word query contributes a constraint (good), the order by clause contributed a constraint (good), but then I get the line:
Selected 4359 fragments to filter (2 ordered, 4357 unordered)
If I understand this line, it appears that the element range index is not helping very much and that I'm fetching 4357 fragments (even so, does 17 seconds seem reasonable for this many fragments?).
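
To compare ordered and unordered counts across several runs, the trace line can be parsed mechanically. This is a small Python sketch that assumes the line keeps exactly the wording shown above; the helper name parse_selected is hypothetical and not part of any MarkLogic tooling.

```python
import re

# Trace line as it appears in the log in this message.
line = "Selected 4359 fragments to filter (2 ordered, 4357 unordered)."

# Assumed wording of the trace line; adjust if the server output differs.
pattern = re.compile(
    r"Selected (\d+) fragments to filter \((\d+) ordered, (\d+) unordered\)"
)

def parse_selected(trace_line):
    """Return the total, ordered and unordered counts, or None if no match."""
    match = pattern.search(trace_line)
    if match is None:
        return None
    total, ordered, unordered = map(int, match.groups())
    return {"total": total, "ordered": ordered, "unordered": unordered}

counts = parse_selected(line)
unordered_share = counts["unordered"] / counts["total"]
print(counts)
print(f"share of fragments not ordered by the index: {unordered_share:.4f}")
```

A share close to 1, as here, means nearly every selected fragment is being handled without help from the range index.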

The MarkLogic version is 4.0-3 running on Red Hat Enterprise Linux version 4. The hardware is eight 300G 10K RPM disks in a RAID5 array with dual quad core 64-bit Intel 5405 processors, and I'd think that this would be plenty fast.

                                               -Dave

David Feldmeier
Twin Dolphin Software, Inc.
303 Twin Dolphin Drive, Suite 600
Redwood City, CA 94065


2009-04-26 00:40:34.000 Info: gazelle2-8007: /lib/patent.xqy line 335: this:search_sort(cts:element-query(expanded-QName("", "ASSC_AENSC"), cts:word-query("test", ("lang=en"), 1), ()), "PATNUM", "ascending", 1, 30, "US")
2009-04-26 00:40:34.000 Info: gazelle2-8007: /lib/patent.xqy line 335: Analyzing path for search: collection("US")
2009-04-26 00:40:34.000 Info: gazelle2-8007: /lib/patent.xqy line 335: Step 1 is searchable: collection("US")
2009-04-26 00:40:34.000 Info: gazelle2-8007: /lib/patent.xqy line 335: Path is fully searchable.
2009-04-26 00:40:34.000 Info: gazelle2-8007: /lib/patent.xqy line 335: Gathering constraints.
2009-04-26 00:40:34.001 Info: gazelle2-8007: /lib/patent.xqy line 335: Search query contributed 1 constraint: cts:element-query(expanded-QName("", "ASSC_AENSC"), cts:word-query("test", ("lang=en"), 1), ())
2009-04-26 00:40:34.001 Info: gazelle2-8007: /lib/patent.xqy line 335: Order by clause contributed 1 range ordering constraint for $i: order by $i/child::PATENT/child::PATNUM ascending
2009-04-26 00:40:34.001 Info: gazelle2-8007: /lib/patent.xqy line 335: Executing search.
2009-04-26 00:40:34.091 Info: gazelle2-8007: /lib/patent.xqy line 335: Selected 4359 fragments to filter (2 ordered, 4357 unordered).
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:query-meters xsi:schemaLocation="http://marklogic.com/xdmp/query-meters query-meters.xsd" xmlns:qm="http://marklogic.com/xdmp/query-meters" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:elapsed-time>PT17.261083S</qm:elapsed-time>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:requests>1</qm:requests>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:list-cache-hits>722</qm:list-cache-hits>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:list-cache-misses>0</qm:list-cache-misses>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:in-memory-list-hits>0</qm:in-memory-list-hits>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:expanded-tree-cache-hits>4688</qm:expanded-tree-cache-hits>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:expanded-tree-cache-misses>8385</qm:expanded-tree-cache-misses>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:compressed-tree-cache-hits>3948</qm:compressed-tree-cache-hits>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:compressed-tree-cache-misses>4437</qm:compressed-tree-cache-misses>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:in-memory-compressed-tree-hits>0</qm:in-memory-compressed-tree-hits>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:value-cache-hits>1</qm:value-cache-hits>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:value-cache-misses>4545</qm:value-cache-misses>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:regexp-cache-hits>1</qm:regexp-cache-hits>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:regexp-cache-misses>4</qm:regexp-cache-misses>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:link-cache-hits>0</qm:link-cache-hits>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:link-cache-misses>8714</qm:link-cache-misses>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:fragments-added>0</qm:fragments-added>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:fragments-deleted>0</qm:fragments-deleted>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:fs-program-cache-hits>1</qm:fs-program-cache-hits>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:fs-program-cache-misses>0</qm:fs-program-cache-misses>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:db-program-cache-hits>0</qm:db-program-cache-hits>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:db-program-cache-misses>0</qm:db-program-cache-misses>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:fs-main-module-sequence-cache-hits>0</qm:fs-main-module-sequence-cache-hits>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:fs-main-module-sequence-cache-misses>0</qm:fs-main-module-sequence-cache-misses>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:db-main-module-sequence-cache-hits>0</qm:db-main-module-sequence-cache-hits>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:db-main-module-sequence-cache-misses>0</qm:db-main-module-sequence-cache-misses>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:fs-library-module-cache-hits>0</qm:fs-library-module-cache-hits>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:fs-library-module-cache-misses>0</qm:fs-library-module-cache-misses>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:db-library-module-cache-hits>0</qm:db-library-module-cache-hits>
2009-04-26 00:40:51.263 Info: gazelle2-8007:<qm:db-library-module-cache-misses>0</qm:db-library-module-cache-misses>


    



------------------------------

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general


End of General Digest, Vol 58, Issue 27
***************************************

  
