Thanks, Danny. Just from looking at the syntax, I'm pretty sure this is what I want. I'll try it tomorrow and confirm. Thanks again.

--------------------------------------------------
From: "Danny Sokolsky" <[EMAIL PROTECTED]>
Sent: Monday, December 10, 2007 4:06 PM
To: "General Mark Logic Developer Discussion" <[email protected]>
Subject: RE: [MarkLogic Dev General] using XQuery on Word documents

Hi Yves,

To do this efficiently, it is very helpful to have a URI lexicon. A URI lexicon gives you very fast access to the URI of every document in the database. You enable the URI lexicon on the Admin Interface database configuration page for your database.
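
For anyone who prefers a script to the Admin Interface, here is a minimal sketch of the same setting. It assumes a MarkLogic release that ships the Admin API library module (/MarkLogic/admin.xqy) and the "1.0-ml" dialect; the database name "Documents" is only a placeholder.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "Documents" is a placeholder; substitute the name of your own database. :)
let $config := admin:get-configuration()
let $db-id  := xdmp:database("Documents")
(: Turn the URI lexicon on in the in-memory configuration. :)
let $config := admin:database-set-uri-lexicon($config, $db-id, fn:true())
(: Persist the change; existing content is only covered once reindexing finishes. :)
return admin:save-configuration($config)
```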

Once you have the URI lexicon created (and reindexing has completed), you can do something like this to get what you want:

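(: Date holds a dateTime value such as 2008-11-15T00:00:00, so compare as xs:dateTime :)
for $x in cts:uris()
where fn:ends-with($x, ".docx") and
     xs:dateTime(xdmp:zip-get(doc($x), "customXml/item1.xml")/Customer/Date)
         gt xs:dateTime("2008-01-01T00:00:00")
return
$x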

If you do it without the URI lexicon, you first have to fetch each document and then call xdmp:node-uri to find its URI. That can effectively attempt to put the entire database in memory, so you would probably need to do it in batches.

If you have a lot of .docx files in your database, you will probably still want to do this in batches, even with the URI lexicon.
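
One possible shape for a batched run is sketched below. It is not a tested implementation: $batch-start, $batch-size, and $cutoff are hypothetical names introduced for illustration, the "1.0-ml" dialect is assumed, and the try/catch assumes that xdmp:zip-get raises an error when a package has no customXml/item1.xml part.

```xquery
xquery version "1.0-ml";

(: Hypothetical batch parameters; change $batch-start on each run. :)
declare variable $batch-start as xs:integer := 1;
declare variable $batch-size  as xs:integer := 500;
declare variable $cutoff as xs:dateTime := xs:dateTime("2008-01-01T00:00:00");

(: All .docx URIs, read from the URI lexicon (the lexicon must be enabled). :)
let $docx-uris := cts:uris()[fn:ends-with(., ".docx")]
for $uri in fn:subsequence($docx-uris, $batch-start, $batch-size)
(: Assumption: a missing part makes xdmp:zip-get raise an error,
   which is treated here as "this document has no custom XML part". :)
let $part :=
  try { xdmp:zip-get(fn:doc($uri), "customXml/item1.xml") }
  catch ($e) { () }
let $date := ($part/Customer/Date)[1]
(: Skip documents without a Date; compare the rest as xs:dateTime. :)
where fn:exists($date) and xs:dateTime($date) gt $cutoff
return $uri
```

Each run still reads the full URI list from the lexicon, but only the documents in the current batch are fetched and unzipped, which is where the memory goes.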

Is this what you were looking for?

-Danny

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Yves Dolce
Sent: Monday, December 10, 2007 2:27 PM
To: [email protected]
Subject: [MarkLogic Dev General] using XQuery on Word documents

This is a question that will have a simple answer. If only I knew more about XQuery...

If I run the following line in CQ:
xdmp:zip-get(doc("Contract.docx"), "customXml/item1.xml")

I get:
<Customer>
<Date>2008-11-15T00:00:00</Date>
<CompanyName>Bebop Corporation</CompanyName>
<FirstName>Erick</FirstName>
<LastName>Trojan</LastName>
<SSN>1111-22-3333</SSN>
<Address>Av. Revolucion 841, DF, CP 03910, Mexico</Address>
<ContactTitle>Test Manager</ContactTitle>
<Phone>+52 (55) 6666-66666</Phone>
</Customer>

How should I express a query that essentially says: for each docx file in the DB, get me its customXml/item1.xml part, if it has one and the <Date> element in it is later than 1/1/2008?

Does my question make sense? Thanks!
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
