Re: [MarkLogic Dev General] Speeding up access to large XML file

Christopher Hamlin Thu, 23 Jun 2016 05:45:40 -0700

Hi,

From what I understand, there really isn't an issue with querying in what
you are doing, since you are calling for part of a particular document, at
least in your given example.  If you are querying more generally, it's a
different question, and then query optimization comes into play.

The Universal Index helps with lookups for queries, but doesn't contain
data to be returned from the documents.  For that you need a range index.

To get the information in your example, the system is pulling the document
into memory from disk, expanding it (into the expanded tree cache) and
extracting per your xpath.  It actually seems pretty fast considering all
that.  But it is possible to speed things up.
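
One way to see where the time goes is to run the lookup on its own and
look at the elapsed time and the cache counters.  A minimal sketch,
assuming the file2.xml path from your example; the key value "some-key"
is a placeholder I made up, not something from your data:

```xquery
xquery version "1.0-ml";

(: "some-key" is a hypothetical placeholder value. :)
let $hits := doc("file2.xml")/a/b[c/d = "some-key"]/e/f/text()
return (
  count($hits),          (: forces evaluation of the path expression :)
  xdmp:elapsed-time(),   (: time elapsed so far in this request :)
  xdmp:query-meters()    (: includes expanded tree cache hit/miss counters :)
)
```

Running it twice should show whether the second call is served from the
expanded tree cache rather than from disk.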

Since you are using paths deep into large documents, you could try path
range indexes, which would only index what is in that path:

    http://docs.marklogic.com/guide/admin/range_index#id_40666

This way, you are just returning what is in that index, and the document
doesn't need to be opened at all.  Just call for the values and constrain
by the document-query as, for example,

    cts:values (cts:path-reference ('/x/y/z'), (), (),
                cts:document-query ("file1.xml"))

The document query takes care of the constraint to a particular fragment,
so no other queries are needed.
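
Applied to the second path in your example, that might look like the
sketch below.  It assumes you have already configured a path range index
on /a/b/e/f; the string scalar type mentioned in the comment is my
assumption, so adjust it to whatever your data needs:

```xquery
xquery version "1.0-ml";

(: Assumes a path range index (e.g. scalar type string) already exists
   on /a/b/e/f.  That index is a hypothetical configuration, not
   something stated in the original question. :)
cts:values(
  cts:path-reference("/a/b/e/f"),
  (),                                (: no start value :)
  (),                                (: no options :)
  cts:document-query("file2.xml")    (: restrict to this document :)
)
```

Note that this returns every indexed value at that path in the document.
As far as I know it does not apply the [c/d = $key] predicate from your
XPath, since the constraint works per fragment, so treat it as a starting
point and not a drop-in replacement.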

Usual disclaimer:  adding indexes costs disk space and indexing time, so
everything's a tradeoff.

Also, a little info regarding the use of text():
https://blakeley.com/blogofile/archives/518/

=Chris

On Thu, Jun 23, 2016 at 12:53 AM, Hans Hübner <[email protected]>
wrote:

> Hi,
>
> I have a beginner's question regarding indexing:  As far as I understand,
> MarkLogic indexes all data according to global indexing rules.  Thus, in
> general, all accesses to data should be "fast" when the universal index can
> be used.
>
> I am trying to combine data from two XML files into a report, and
> throughput is not sufficient.  I'm now looking for ways to improve the
> performance of my XQuery, either by adding indexes to MarkLogic or by
> improving the query so that it runs faster.  Here's what I have:
>
> for $key in doc("file1.xml")/a/b/c/d/e/f/g/h/i[j = 123]/k/text()
> let $data-from-file-2 := doc("file2.xml")/a/b[c/d = $key]/e/f/text()
> return
>   <pre>
>    $key => $data-from-file-2
>   </pre>
>
> Thus, I'm iterating over some subset of the nodes in file1.xml, selecting
> data from file2.xml for each of the nodes selected.  Both file1.xml and
> file2.xml are files residing in MarkLogic.  They have a pre-determined
> format and contain roughly 100,000 elements matching each of the two paths.
>
> It seems that the performance of the second of the two XPaths
> (doc("file2.xml")/a/b[c/d = $key]/e/f/text()) is most important.  Ideally,
> I would like this lookup to complete in under a millisecond, but it seems
> to need 75ms right now.
>
> Any help speeding this up would be greatly appreciated!
>
> Thanks,
> Hans
>
> --
> LambdaWerk GmbH
> Oranienburger Straße 87/89
> 10178 Berlin
> Phone: +49 30 555 7335 0
> Fax: +49 30 555 7335 99
>
> HRB 169991 B Amtsgericht Charlottenburg
> USt-ID: DE301399951
> Geschäftsführer:  Hans Hübner
>
> http://lambdawerk.com/
>
>
>
> _______________________________________________
> General mailing list
> [email protected]
> Manage your subscription at:
> http://developer.marklogic.com/mailman/listinfo/general
>
>

