Re: [MarkLogic Dev General] cts:search question

Michael Blakeley Thu, 16 Jul 2009 15:04:12 -0700

Jakob,

I'm fairly confused about what you're trying to do with this latest query. I don't see the relationship between your latest query and anything else in this thread.

There is very little performance difference between cts:search() and an equivalent XPath expression. Both make use of the same indexes. The main difference is that XPath results are in document order, while cts:search orders by relevance.
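
As a rough sketch of that equivalence, using the names from Jakob's earlier message (the "test" collection, result/query, @status): the two expressions below should be resolved against the same indexes. The choice of cts:element-attribute-value-query is an assumption about what the search is meant to express, not something stated in this thread. Note that the XPath returns the matching query elements, while the cts:search returns the documents that contain them.

```xquery
(: XPath form: results come back in document order :)
fn:collection("test")/result/query[@status = "unresolved"],

(: cts:search form: results come back ordered by relevance :)
cts:search(
  fn:collection("test"),
  cts:element-attribute-value-query(
    xs:QName("query"), xs:QName("status"), "unresolved"))
```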

I'm not sure why you're seeing XDMP-EXPNTREECACHEFULL with your cts:search, but there are only two possibilities. Either your cts:search() returns too many matches to fit into the expanded-tree cache, or the rather odd arg2 (starting with collection... was that intentional?) returns too many matches for the expanded-tree cache. Generally, the solution to either would be to limit the number of matches.
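
One way to limit the matches, sketched under the assumption that the intent was to find documents in the collection that contain a resolved cr:query: pass a cts:query as the second argument and keep only one page of results. The page size of 10, the variable names, and the use of cts:element-attribute-value-query are illustrative choices, not taken from this thread.

```xquery
declare namespace cr = "http://www.crossref.org/qrschema/2.0";

(: hypothetical paging values, chosen for illustration :)
let $start := 1
let $page-size := 10
return
  for $doc in fn:subsequence(
    cts:search(
      fn:doc(),
      cts:and-query((
        (: restrict to the collection instead of using it as arg2 :)
        cts:collection-query("collection-2009-7-11"),
        cts:element-attribute-value-query(
          xs:QName("cr:query"), xs:QName("status"), "resolved")
      ))
    ),
    $start, $page-size)
  (: only the documents on this page are returned :)
  return xdmp:node-uri($doc)
```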

The tutorial at http://developer.marklogic.com/howto/tutorials/2006-09-paginated-search.xqy might be helpful to you - it discusses this issue and others. The Search Developer Guide (http://developer.marklogic.com/pubs/4.1/books/search-dev-guide.pdf) might be helpful, too.

If you are using 4.1, you might also want to look at the search:search() API as an alternative to cts:search. It also has a tutorial and is also covered in the search guide.


  http://developer.marklogic.com/pubs/4.1/apidocs/SearchAPI.html
  http://developer.marklogic.com/howto/tutorials/2009-07-search-api-walkthrough.xqy

BTW, it may be that you expect fn:collection("collection-2009-7-11")/cr:crossref_result/cr:query_result[1] to return at most one node. That isn't how XPath works: that expression could return any number of nodes. The positional predicate is evaluated separately for each context node, not for the sequence of all nodes. You might have meant something more like (//a)[1].
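
To see the difference on a small inline document (the element names here are made up for illustration):

```xquery
let $doc :=
  <root>
    <section><a>first in section 1</a><a>second in section 1</a></section>
    <section><a>first in section 2</a></section>
  </root>
return (
  (: [1] is applied per parent, so each section contributes its first a: 2 :)
  fn:count($doc//a[1]),
  (: parentheses make [1] apply to the whole sequence: 1 :)
  fn:count(($doc//a)[1])
)
```

By the same reasoning, (fn:collection("collection-2009-7-11")/cr:crossref_result/cr:query_result)[1] would return at most one node across the whole collection, while the unparenthesized form can return one per document.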


-- Mike

On 2009-07-16 12:40, Jakob Fix wrote:

Mike,

thanks for your reply, I agree with the points you make regarding namespaces.

Moving on, would it not be faster to use something like the cts:query
expressions shown in my original message instead of a simple xpath?  I
had the impression that it was recommended to me by one of the posters
(you?) to create an index in order to accelerate the search.


I'm also trying this (which gives a "expanded tree cache full" error :( ):

declare namespace cr = "http://www.crossref.org/qrschema/2.0";

(xdmp:query-trace(true()),

cts:search(
   fn:doc(),
   fn:collection("collection-2009-7-11")/cr:crossref_result/cr:query_result[1]/cr:body[1]/cr:query[@status='resolved']
),

xdmp:query-trace(false()))

[1.0-ml] XDMP-EXPNTREECACHEFULL: xdmp:eval("declare namespace cr =
&quot;http://www.crossref.org/qrschema/2....", (), <options
xmlns="xdmp:eval"><database>10374816636749856048</database><modules>10374816636749...</options>)
-- Expanded tree cache full on host

So, I don't even see the output of the trace ...

cheers,
Jakob.



On Thu, Jul 16, 2009 at 20:21, Michael Blakeley wrote:

Jakob,

Do you really want 'query' in *any* namespace? It looks to me like 'query'
is in the empty namespace, and is always a child of the root 'result', so I
would write '/result/query' or '//query' instead of '//*:query'. If you need
to find 'query' in multiple namespaces, I recommend enumerating all the
possibilities.

Expressions using '*:' are best avoided in production code. They tend to
introduce bugs into your application, and they can't be resolved using the
server's indexes. While '*:' expressions can be useful when debugging, they
should be removed as soon as possible. When doing code reviews, I treat them
as a red flag.

thanks,
-- Mike

On 2009-07-16 04:38, Jakob Fix wrote:

Here I am again ...

1) added a number of test items to the collection "test"
2) each document contains xml like this

<result>
   ....
   <query key="555-555" status="resolved" fl_count="0">
     <doi type="journal_title">10.1787/1684341x</doi>
     <issn type="print">16095316</issn>
     <journal_title>Documents de l OCDE</journal_title>
   </query>
</result>

3) I am interested in all documents in the "test" collection where the
xpath //*:query[@status="resolved"] on the one side, and
[@status="unresolved"] on the other side - using the xpath directly
works, but is too slow over many thousand documents.

4) I've created an attribute range index for xs:string, "query",
"status" (no namespaces defined; btw, I also created an
element-attribute word index, but it seems that this is not necessary)

5) I was hoping the following query would return the expected results,
but it doesn't:

for $x in cts:search(fn:doc(),
    cts:and-query((
      cts:collection-query(("test")),
      cts:element-attribute-word-match(xs:QName("query"),
        xs:QName("status"), "unresolved")
    ))
)
return xdmp:node-uri($x)

6) three of the four test documents have a @status="resolved" and one
"unresolved" so I expected one uri for the above query.  However, the
result is this:
/data/2009/07/16/1684341X.xml
/data/2009/07/16/16097513.xml
/data/2009/07/16/16812328.xml
/data/2009/07/16/16097408.xml

I do get an empty sequence when asking for @status="resolved" ...  Is
this just a configuration problem, or is my query wrongly constructed?

Thanks, as usual, for your help,
Jakob.
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
