Thanks Danny. Just by looking at the syntax, I'm pretty sure this is what I want.
I'll try this tomorrow and will confirm. Thanks again.
--------------------------------------------------
From: "Danny Sokolsky" <[EMAIL PROTECTED]>
Sent: Monday, December 10, 2007 4:06 PM
To: "General Mark Logic Developer Discussion"
<[email protected]>
Subject: RE: [MarkLogic Dev General] using XQuery on Word documents
Hi Yves,
To do this efficiently, it is very helpful to have a URI lexicon. A URI
lexicon gives you very fast access to the URI of every document in the
database. You enable the URI lexicon in the Admin Interface database
config page for your database.
Once you have the URI lexicon created (and reindexing has completed), you
can do something like this to get what you want:
for $x in cts:uris()
where fn:ends-with($x, ".docx") and
  (: Date holds a dateTime value, so compare it as xs:dateTime :)
  xs:dateTime(xdmp:zip-get(fn:doc($x), "customXml/item1.xml")/Customer/Date)
    gt xs:dateTime("2008-01-01T00:00:00")
return
  $x
If you do it without the URI lexicon, you will probably need to do it in
batches, because to get a URI you first need to fetch the document and
then call xdmp:node-uri on it. That can effectively attempt to put the
entire database in memory.
If you have a lot of docx files in your database, you probably still want
to do this in batches, even with the URI lexicon.
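A rough sketch of one such batch, assuming the URI lexicon is enabled. The
$start and $size variables are hypothetical names used only for
illustration; you would advance $start by $size on each run. Note that the
batch is taken over all URIs, not just the docx ones, so a run may return
fewer matches than $size, or none at all:

(: Sketch only: $start and $size are assumed batch parameters :)
let $start := 1
let $size := 500
for $uri in fn:subsequence(cts:uris(), $start, $size)
where fn:ends-with($uri, ".docx")
return
  (: an absent Date yields the empty sequence, which the test treats as false :)
  if (xs:dateTime(xdmp:zip-get(fn:doc($uri), "customXml/item1.xml")/Customer/Date)
        gt xs:dateTime("2008-01-01T00:00:00"))
  then $uri
  else ()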
Is this what you were looking for?
-Danny
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Yves Dolce
Sent: Monday, December 10, 2007 2:27 PM
To: [email protected]
Subject: [MarkLogic Dev General] using XQuery on Word documents
This is a question that will have a simple answer. If only I knew more
about XQuery...
If I run the following line in CQ:
xdmp:zip-get(doc("Contract.docx"), "customXml/item1.xml")
I get:
<Customer>
<Date>2008-11-15T00:00:00</Date>
<CompanyName>Bebop Corporation</CompanyName>
<FirstName>Erick</FirstName>
<LastName>Trojan</LastName>
<SSN>1111-22-3333</SSN>
<Address>Av. Revolucion 841, DF, CP 03910, Mexico</Address>
<ContactTitle>Test Manager</ContactTitle>
<Phone>+52 (55) 6666-66666</Phone>
</Customer>
How should I express a query that essentially says: for each docx file in
the DB, get me its customXml/item1.xml part, if it has one and the <Date>
element in it is later than 1/1/2008?
Does my question make sense? Thanks!
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general