Hi Yves,

To do this efficiently, it is very helpful to have a URI lexicon.  A URI 
lexicon gives you very fast access to the URI of every document in the database.  
You enable the URI lexicon in the Admin Interface database config page for your 
database.  

Once you have the URI lexicon created (and reindexing has completed), you can 
do something like this to get what you want:

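for $x in cts:uris() 
where fn:ends-with($x, ".docx") and
      (: Date holds a dateTime (e.g. 2008-11-15T00:00:00), so cast it explicitly :)
      xs:dateTime(xdmp:zip-get(doc($x), "customXml/item1.xml")/Customer/Date)
          gt xs:dateTime("2008-01-01T00:00:00")
return
$x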

If you do it without the URI lexicon, you will probably need to do it in 
batches, because to get the URIs you need to first fetch each document and then 
call xdmp:node-uri on it to find its URI.  This can effectively attempt to put 
the entire database in memory.  

Even with the URI lexicon, if you have a lot of .docx files in your database, 
you probably still want to do this in batches.
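
As a rough sketch of that batching (an illustration only: $start, $size and 
$docx-uris are hypothetical names, and 500 is an arbitrary batch size), you 
could narrow the lexicon down to the .docx URIs and process one slice per run, 
increasing $start by $size each time.  It assumes the Date value is a dateTime 
like the one in your sample:

(: hypothetical batch window; rerun with $start = 501, 1001, ... :)
let $start := 1
let $size := 500
(: keep only the .docx URIs from the URI lexicon :)
let $docx-uris := cts:uris()[fn:ends-with(., ".docx")]
for $x in fn:subsequence($docx-uris, $start, $size)
where xs:dateTime(xdmp:zip-get(doc($x), "customXml/item1.xml")/Customer/Date)
          gt xs:dateTime("2008-01-01T00:00:00")
return
$x

Only the documents in the current slice are opened, so each run touches at 
most $size .docx files.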

Is this what you were looking for?

-Danny

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yves Dolce
Sent: Monday, December 10, 2007 2:27 PM
To: [email protected]
Subject: [MarkLogic Dev General] using XQuery on Word documents

This is a question that will have a simple answer. If only I knew more about 
XQuery...
 
If I run the following line in CQ:
xdmp:zip-get(doc("Contract.docx"), "customXml/item1.xml")

I get:
<Customer>
     <Date>2008-11-15T00:00:00</Date> 
     <CompanyName>Bebop Corporation</CompanyName> 
     <FirstName>Erick</FirstName> 
     <LastName>Trojan</LastName> 
     <SSN>1111-22-3333</SSN> 
     <Address>Av. Revolucion 841, DF, CP 03910, Mexico</Address> 
     <ContactTitle>Test Manager</ContactTitle> 
     <Phone>+52 (55) 6666-66666</Phone> 
</Customer>
 
How should I express a query that essentially says: for each docx file in the 
DB, get me its customXml/item1.xml part, if it has one, and the <Date> element 
in it is greater than 1/1/2008.
 
Does my question make sense? Thanks!