Geert,

The task is to go through a list of string values and perform a simple 
operation on each of them. More precisely: I have about 2,000,000 URIs, which I 
received as a plain text document and then converted to XML with Perl. 
Each of them has the following structure:

content/repository001/data/store001/location001/file.dat

and represents a path to a binary resource which is located in some remote data 
repository (nothing to do with MarkLogic).

At the same time, /data/store001/location001/ is a directory on my MarkLogic 
server where a resource.xml file can be found. That file contains a 
<binary-resource> node which must hold the binary resource URI, so its value 
looks like the one described above:

content/repository001/data/store001/location001/file.dat

What I need is to go over all 2,000,000 URIs in my list and find those that 
are not referenced in the corresponding XML instances on MarkLogic. For each 
URI, analyze.xqy does the following:

define variable $uri as xs:string external
(: $uri = "content/repository001/data/store001/location001/file.dat" :)

let $path :=
        fn:concat(
                "/",
                fn:string-join(
                        fn:tokenize($uri, "/")[fn:position() = (3 to fn:last() - 1)],
                        "/"
                ),
                "/"
        )
(: $path = "/data/store001/location001/" :)

return
        if (xdmp:directory($path, "1")//binary-resource[1] = $uri) then
                (: Checking reference :)
                <result path="{$path}">Check OK</result>
        else
                <result path="{$path}">WARNING: Resource not bound</result>

Apologies for the long message, I just wanted to make things clear.

Thanks,
Van

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Geert Josten
Sent: Monday, August 17, 2009 6:26 PM
To: General Mark Logic Developer Discussion
Subject: [MarkLogic Dev General] RE: Processing huge sequences

Hi Ivan,

Can you describe in more functional terms what you are trying to do? I have the 
impression that there should be smarter ways of tackling your problem. Do you 
really need this items.xml for instance? Wouldn't it be possible to use a 
cts:search in MarkLogic Server to compose this XML dynamically?

And analyze.xqy taking about 400 seconds to run: if it involves only lookups 
and not too much calculation work, that sounds like a lot as well.

Have you considered taking an asynchronous approach? You can use xdmp:spawn for 
that or use the Content Processing Framework.
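
A minimal sketch of the xdmp:spawn idea, not a tested solution: the driver 
below puts one task on the Task Server queue per batch of items instead of 
invoking everything inside a single request. The module path 
"/analyze-batch.xqy", the variable names "start" and "size", and the batch 
size of 1000 are all hypothetical; the batch module would have to declare 
$start and $size as external variables and process the items in that range.

```xquery
(: spawn-batches.xqy: hypothetical driver module.
   Module path, variable names and batch size are illustrative assumptions. :)
let $batch-size := 1000
let $count := fn:count(fn:doc("/items.xml")//item)
(: first position of every batch: 1, 1001, 2001, ... :)
for $start in (1 to $count)[. mod $batch-size = 1]
return
        xdmp:spawn(
                "/analyze-batch.xqy",
                (xs:QName("start"), $start, xs:QName("size"), $batch-size)
        )
```

Since a spawned task does not hand its result back to the caller, the batch 
module would need to store its findings itself, for example by inserting a 
result document per batch.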

Kind regards,
Geert

Drs. G.P.H. Josten
Consultant


http://www.daidalos.nl/
Daidalos BV
Source of Innovation
Hoekeindsehof 1-4
2665 JZ Bleiswijk
Tel.: +31 (0) 10 850 1200
Fax: +31 (0) 10 850 1199
KvK 27164984
The information sent in or with this email message originates from 
Daidalos BV and is intended solely for the addressee. If you have received 
this message unintentionally, we ask that you delete it. No rights can be 
derived from this message.


> From: [email protected]
> [mailto:[email protected]] On Behalf Of
> Baranov, Ivan - Moscow
> Sent: Monday, 17 August 2009 15:51
> To: General Mark Logic Developer Discussion
> Subject: [MarkLogic Dev General] Processing huge sequences
>
> Hi All,
>
> I'm experiencing problems when processing long sequences.
> E.g. there is one XML file which has the following structure:
>
> items.xml
> ---------
>
> <root>
>       <item id="/data/store001/location001/"/>
>       <item id="/data/store001/location012/"/>
>       <item id="/data/store003/location006/"/>
>       .
>       .
>       .
>       <item id="/data/store115/location322/"/>
> </root>
>
> Where fn:count(//item) = 15,000. For each of them I must
> perform a simple operation involving an xdmp:directory(@id, "1")
> call, say, a check on some node. So what I do next is write two
> XQuery modules, one invoking the other via xdmp:invoke().
>
> main.xqy
> --------
>
> let $items := fn:doc("/items.xml")
> return
>       <results>
>       {
>               for $i in $items//item
>               return
>                       try {
>                               (: pass the item's directory path as external variable "item" :)
>                               xdmp:invoke(
>                                       "/analyze.xqy",
>                                       (xs:QName("item"), fn:string($i/@id)),
>                                       <options xmlns="xdmp:eval">
>                                               <isolation>different-transaction</isolation>
>                                               <prevent-deadlocks>true</prevent-deadlocks>
>                                       </options>
>                               )
>                       }
>                       catch ($ex) {
>                               $ex
>                       }
>       }
>       </results>
>
> analyze.xqy does some xdmp:directory() work for each item.
> It takes approx. 400 s for this pair of scripts to
> complete the task, which is a good result. Cool.
>
> BUT when I tried to go through a larger list of
> 2,000,000 items, I could not even upload it via
> WebDAV. After splitting it into pieces of 100,000 items each, I
> managed to upload them but then failed to get the results.
> After two hours of waiting, ML threw an exception saying that
> the timeout limit was exceeded.
>
> I would be very thankful if someone could help me out with
> this or give me some advice.
>
> Thanks,
> Van
> _______________________________________________
> General mailing list
> [email protected]
> http://xqzone.com/mailman/listinfo/general
>
