[MarkLogic Dev General] questions about optimize the query for multiple DTDs

Helen Chen Thu, 04 Oct 2007 08:52:59 -0700

Hello,

We have a huge amount of legacy data that follows our DTD specification,
versions 1 through 4. We loaded all of it into MarkLogic and use the "format"
attribute of the root element "article" to distinguish the versions. Each
version differs somewhat in its tree structure.


Now we want to search this data, for example to find all the authors with
surname = 'Smith'.

If I search just one version, the query takes 0.7 seconds in my tests, which is
OK, but I hope to improve it further.

The main problem is that we want to search the whole database, not just one
version of the data. We copied the query used for one version once per version
and combined the copies with cts:or-query() into one big query. We also
paginate the results to save time, but the query still takes about 2.5 seconds
to return, which is not acceptable to us.

The following is the query we used. Since surname also appears under other
paths, we have to use cts:element-query() to specify the unique path for the
surname we want, so inside each and-query you can see the path for surname in
that version. The root element is called "article"; we start the path from
subnodes of article to reduce the time. These subnodes (front, spin and meta)
are unique in each article, so this way we make sure we do not add extra
overhead for finding the root element from the subnodes.

(cts:search(
  fn:doc(),
  cts:or-query((
    cts:and-query((
      cts:element-query(xs:QName("front"),
        cts:element-query(xs:QName("surname"),
          cts:word-query("smith",
            ("case-insensitive", "diacritic-insensitive",
             "punctuation-sensitive", "lang=en"),
            1))),
      cts:element-attribute-word-query(
        xs:QName("article"), xs:QName("format"), "V1")
    )),
    cts:and-query((
      cts:element-query(xs:QName("front"),
        cts:element-query(xs:QName("surname"),
          cts:word-query("smith",
            ("case-insensitive", "diacritic-insensitive",
             "punctuation-sensitive", "lang=en"),
            1))),
      cts:element-attribute-word-query(
        xs:QName("article"), xs:QName("format"), "V2")
    )),
    cts:and-query((
      cts:element-query(xs:QName("spin"),
        cts:element-query(xs:QName("body"),
          cts:element-query(xs:QName("surname"),
            cts:word-query("smith",
              ("case-insensitive", "diacritic-insensitive",
               "punctuation-sensitive", "lang=en"),
              1)))),
      cts:element-attribute-word-query(
        xs:QName("article"), xs:QName("format"), "V3")
    )),
    cts:and-query((
      cts:element-query(xs:QName("meta"),
        cts:element-query(xs:QName("surname"),
          cts:word-query("smith",
            ("case-insensitive", "diacritic-insensitive",
             "punctuation-sensitive", "lang=en"),
            1))),
      cts:element-attribute-word-query(
        xs:QName("article"), xs:QName("format"), "V4")
    ))
  ))
))[50 to 60]
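
Since the four branches differ only in the format value and the element path,
the same query can probably be written more compactly. The following is only a
sketch: the helper functions local:path-query and local:version-query are
hypothetical names made up for illustration, and the "1.0-ml" syntax assumes a
server release that supports it (as far as we know, older releases use "define
function" rather than "declare function"). It is meant to build the same
cts:or-query as the query above, not to make it any faster.

xquery version "1.0-ml";

(: Hypothetical helper: wraps $q in nested cts:element-query() calls,
   outermost element first. :)
declare function local:path-query(
  $path as xs:QName+,
  $q as cts:query
) as cts:query
{
  if (fn:count($path) eq 1)
  then cts:element-query($path, $q)
  else cts:element-query(
         $path[1],
         local:path-query(fn:subsequence($path, 2), $q))
};

(: Hypothetical helper: one branch of the or-query, i.e. the surname path
   for one DTD version plus the "format" attribute test on the root. :)
declare function local:version-query(
  $format as xs:string,
  $path as xs:QName+,
  $word as xs:string
) as cts:query
{
  cts:and-query((
    local:path-query($path,
      cts:word-query($word,
        ("case-insensitive", "diacritic-insensitive",
         "punctuation-sensitive", "lang=en"),
        1)),
    cts:element-attribute-word-query(
      xs:QName("article"), xs:QName("format"), $format)
  ))
};

(cts:search(
  fn:doc(),
  cts:or-query((
    local:version-query("V1",
      (xs:QName("front"), xs:QName("surname")), "smith"),
    local:version-query("V2",
      (xs:QName("front"), xs:QName("surname")), "smith"),
    local:version-query("V3",
      (xs:QName("spin"), xs:QName("body"), xs:QName("surname")), "smith"),
    local:version-query("V4",
      (xs:QName("meta"), xs:QName("surname")), "smith")
  ))
))[50 to 60]

Written this way, supporting another DTD version would only need one more
local:version-query() line with its own path.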



Can someone give us any suggestions on how to improve the query? And can
MarkLogic handle searches across multiple DTD structures in the same database
with acceptable speed?

Thanks, Helen
