The xdmp:query-trace() function is handy for figuring out whether or not
the query terms are indexed (aka 'searchable constraints').
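
A minimal sketch of how that might look, assuming the same "doc.xml" and FOO/BAR names used elsewhere in this thread. The call enables tracing for the remainder of the request, and the trace output goes to the server error log rather than to the query result:

```xquery
xquery version "1.0-ml";

(: Enable query tracing for the rest of this request.
   The trace is written to the server's ErrorLog.txt. :)
xdmp:query-trace(fn:true()),

(: The expression to inspect; "doc.xml", FOO and BAR are the
   placeholder names from this thread, not a real schema. :)
doc("doc.xml")/FOO/BAR[. eq 'text']
```

After running it, look in the log for the steps and predicates reported as searchable or unsearchable.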
For most applications XPath ought to be about the same speed as
cts:search(). But if doc("doc.xml")/FOO[BAR eq 'text'] is slow, then I'd
guess that BAR is a fragment root, or FOO is a fragment parent. If so,
the slowdown is because the query must match one fragment, while the
expression returns another fragment. To speed it up, I would write it as:
doc("doc.xml")/FOO/BAR[. eq 'text']
Note that this expression is not exactly equivalent to
cts:search(doc("doc.xml")/FOO/BAR, "text")
though. The XPath specifies a value match with op:eq, while the
cts:search specifies an xs:string, which will be treated as a
cts:word-query. So one matches values and the other matches words.
The equivalent XPath would be:
doc("doc.xml")/FOO/BAR[cts:contains(., 'text')]
That's almost exactly the same as the cts:search() expression. The main
difference is that the cts:search() will return results in
relevance-ranked order, while the XPath will return results in document
order.
-- Mike
On 2009-11-23 11:36, Lee, David wrote:
I think what Karl is expressing is frustration that basic xpath
expressions appear not to use indexes.
I too am 'in the dark' about that ... and would love some advice.
Why, for example, does
cts:search( doc("doc.xml")/FOO/BAR , "text")
use an index ("instant results")
while
doc("doc.xml")/FOO[BAR eq 'text']
apparently iterates through the list without using indexes (painfully
slow results)?
I'm sure this is a misunderstanding. But I've hit it myself when
trying to port RDBMS-like structures over to ML ... random-access
lookups of 'record-like things' using key values and XPath just don't
seem to be using indexes. There Must Be A Way!
-David
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Geert
Josten
Sent: Monday, November 23, 2009 2:29 PM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] XML structure/schema design for MLS
Hi Karl,
Personally, I would choose the shortest way to make things work. ;-) And
MarkLogic Server doesn't require you to choose between the three. You
can intermingle if you like as well.
If your current data is following a certain standard, then it is likely
that it is so for a certain reason. Perhaps it is necessary to be able
to exchange data with other parties or applications. This is a very
strong reason to preserve the content in its original format, whether
MarkLogic Server can handle that well or not. But thanks to namespaces
and document properties in MarkLogic Server, it is quite easy to add
information that is optimized for searching or user presentation, to
make less optimally structured content work better in MarkLogic Server.
You can always store calculated data in document properties, add
namespaced attributes to specific nodes to optimize certain things and
filter them out when exchanging data with other systems, add meta
information in a separate xml structure that is inserted in the existing
data structure, or wrap the contents in a new root element which allows
additional information at root level. Document properties keep the added
data from mingling with the content, and with the wrapper element it is
very easy to separate the two again.
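
As a rough illustration of the document-properties option, the sketch below attaches one calculated value to an existing document without touching its content. The URI, the namespace, and the element name are all hypothetical and chosen only for this example:

```xquery
xquery version "1.0-ml";

(: Hypothetical namespace for search-oriented metadata. :)
declare namespace meta = "http://example.com/search-meta";

(: Add a calculated value as a property of an existing document.
   "/records/record-1.xml" is an assumed URI; the document itself
   is left unchanged, so it can still be exchanged as-is. :)
xdmp:document-add-properties(
  "/records/record-1.xml",
  <meta:normalized-date>2009-11-23</meta:normalized-date>
)
```

In a later request, xdmp:document-properties("/records/record-1.xml") returns the properties document, so the added value can be read back or filtered out separately from the original content.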
But apart from that, it is just as likely that MarkLogic Server
could perform really well with the existing structure, if indices and
search expressions are chosen carefully. Unfortunately, you leave
us in the dark as to why you think solution #2 should dominate entirely
over the others. Perhaps you could elaborate on that first? And while at it,
give us some hints on the big picture. What are you trying to achieve in
general with MarkLogic Server?
Kind regards,
Geert
Drs. G.P.H. Josten
Consultant
http://www.daidalos.nl/
Daidalos BV
From: [email protected]
[mailto:[email protected]] On Behalf Of
Karl Erisman
Sent: Monday, 23 November 2009 3:14
To: [email protected]
Subject: [MarkLogic Dev General] XML structure/schema design for MLS
I have a general question about choosing an XML structure
(schema design if using schemas) for use with MarkLogic. My
particular situation involves storing clinical data. There
are multiple opposing forces that could motivate choosing one
schema structure over another.
The main ones are:
(1) standards compliance: it would be nice if the internal
storage format is compatible with existing standard schemas
for clinical data in XML (to take advantage of existing tools
that work against the standard schemas and to allow exchange
with external systems without requiring transformation)
(2) ease of handling in MLS, specifically *indexing* and *searching*
(3) "clean" XML (structure that makes sense semantically to a
human viewer)
The more I experiment with cts:query and search:search, the
more I tend to think that #2 should dominate entirely, to the
point of ignoring the others. As it turns out, some standard
data formats are really awkward to work with in MLS.
So, do others just organize their content specifically for
MLS and run transformations when needed? What does Mark
Logic recommend? What have your experiences been?
Thank you,
Karl
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general