Geert,
Have you tried xdmp:estimate() instead of count()? The difference is
that count() generally drives I/O, while xdmp:estimate() does not. For
this purpose, I believe that both will return the same results using the
default indexes. I don't think any special indexes are needed.
thanks,
-- Mike
On 2009-07-14 07:55, Geert Josten wrote:
Hi Jakob,
I am, rather bluntly, doing things like this:
declare namespace cpf = "http://marklogic.com/cpf";
let $total-count := count(
    xdmp:document-properties()/prop:properties/cpf:processing-status)
let $done-count := count(
    xdmp:document-properties()/prop:properties[
      cpf:processing-status/text() = 'done'
      and not(cpf:state/text() = 'http://marklogic.com/states/error')])
let $error-count := count(
    xdmp:document-properties()/prop:properties[
      cpf:state/text() = 'http://marklogic.com/states/error'])
let $active-count := $total-count - $error-count - $done-count
return ($total-count, $done-count, $error-count, $active-count)
No looping, just XPath with predicates wrapped in count(). No special indexes
(yet).
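The arithmetic in Geert's query can be mirrored over in-memory records. This is a neutral Python illustration, not MarkLogic code. `Props` is a hypothetical stand-in for one prop:properties fragment. The progress formula treats both done and error documents as "finished", and that choice is an assumption, not something stated in the thread.

```python
from dataclasses import dataclass
from typing import Optional

ERROR_STATE = "http://marklogic.com/states/error"


@dataclass
class Props:
    # Hypothetical stand-in for one prop:properties fragment.
    processing_status: Optional[str]  # cpf:processing-status, None if absent
    state: Optional[str] = None       # cpf:state, None if absent


def cpf_counts(all_props):
    # total: fragments that carry a cpf:processing-status at all
    total = sum(1 for p in all_props if p.processing_status is not None)
    # done: status 'done' and not in the error state (absent state counts as not-error,
    # just as not(cpf:state/text() = ...) is true when cpf:state is missing)
    done = sum(1 for p in all_props
               if p.processing_status == "done" and p.state != ERROR_STATE)
    error = sum(1 for p in all_props if p.state == ERROR_STATE)
    active = total - error - done
    return {"total": total, "done": done, "error": error, "active": active}


def progress_percent(counts):
    # Assumed definition: share of documents CPF has finished with, successfully or not.
    if counts["total"] == 0:
        return 0.0
    return 100.0 * (counts["done"] + counts["error"]) / counts["total"]
```

Each of the four counts is one query, so the cost of drawing the progress bar grows with the number of counts. It does not grow with any per-document loop.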
Kind regards,
Geert
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Jakob Fix
Sent: Tuesday, 14 July 2009 16:44
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] triggering after spawning
Geert,
Good question about storing this info at all. A plain XPath clearly
takes too long (five seconds or so), so yes, you're right; I will test
an index on the attribute value.
cheers,
Jakob.
On Tue, Jul 14, 2009 at 16:36, Geert Josten <[email protected]> wrote:
I wonder why you store it in the database at all. Why not calculate it
on demand? Putting an index on the boolean element should keep it fast
even after you have processed many, many documents.
You might even try it without adding any particular index; it may
already be covered by the word index.
I did something similar to keep track of all documents being processed
by CPF, using counts of all documents with specific property values to
show a progress bar. I haven't tried it with many documents yet, but
showing the progress bar, based on about four counts, takes only a few
tenths of a second. I didn't need any special indexes at all.
Kind regards,
Geert
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Jakob Fix
Sent: Tuesday, 14 July 2009 16:27
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] triggering after spawning
Geert,
Thanks for the quick reply. Here is some more information that explains
the logic behind what I'm doing:
Each day I get an input document containing an (increasing) number of
URLs (currently around 23,000), each of which returns an XML document
containing, among other things, a boolean value.
Each day, I record the total number of documents actually retrieved,
the number of "true" values and the number of "false" values (the total
acting as a kind of checksum).
The summary document looks a bit like this:
<doi-stats>
  ...
  <doi-stat date="2009-07-14" recorded="{fn:current-dateTime()}"
            resolved="123" unresolved="456" total="579"/>
  ...
</doi-stats>
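One way to build such a <doi-stat> entry and check its checksum is sketched below. This is a neutral Python illustration using the standard-library ElementTree, not the MarkLogic code Jakob uses. The `retrieved` and `flags` inputs are hypothetical: the number of documents fetched that day and the boolean extracted from each one.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def add_doi_stat(stats, date, retrieved, flags, recorded=None):
    """Append one <doi-stat> child to the <doi-stats> element `stats`.

    retrieved: number of documents actually fetched that day (hypothetical input)
    flags: the boolean value extracted from each fetched document
    """
    resolved = sum(1 for f in flags if f)
    unresolved = sum(1 for f in flags if not f)
    if recorded is None:
        recorded = datetime.now(timezone.utc).isoformat()
    return ET.SubElement(stats, "doi-stat", {
        "date": date,
        "recorded": recorded,
        "resolved": str(resolved),
        "unresolved": str(unresolved),
        "total": str(retrieved),
    })


def checksum_ok(stat):
    # The total acts as a checksum: it should equal resolved + unresolved.
    return (int(stat.get("resolved")) + int(stat.get("unresolved"))
            == int(stat.get("total")))
```

Counting `retrieved` separately from the flags is what gives the total its value as a checksum. A fetched document with no boolean would make the totals disagree.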
Now, you're right that each spawned task could update this document;
however, wouldn't there be a serious performance impact?
First, I would have to reduce the number of concurrent tasks (currently
six) to maybe two (or even one?), so that not too much time is spent
waiting to update the document. Second, for each document I would need
to count all documents in the collection (or the directory), and third,
I'd need to run the two XPaths to retrieve the booleans...
The more I think about this approach, the less convinced I am that it's
scalable, but I'd be more than happy to be convinced otherwise!
thanks,
Jakob.
On Tue, Jul 14, 2009 at 16:02, Geert Josten <[email protected]> wrote:
Or just have each task update the summary document, each incrementing
the finished-documents counter by one (if there is one)?
Note: that effectively serializes all tasks.
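The serialization Geert warns about shows up with any shared counter. This neutral Python sketch uses threads and a lock as a stand-in for MarkLogic tasks and a document lock. `Summary`, `task` and `run_all` are hypothetical names, and six workers mirrors Jakob's six concurrent tasks. The fetches could run in parallel, but every increment must take the same lock.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Summary:
    """In-memory stand-in for the summary document."""

    def __init__(self):
        self.finished = 0
        self._lock = threading.Lock()  # plays the role of the document's write lock

    def increment(self):
        # Every task must take the same lock, so these updates run one at a time.
        with self._lock:
            self.finished += 1


def task(summary, url):
    # ... fetch and store the document for `url` here (omitted) ...
    summary.increment()


def run_all(urls, workers=6):
    summary = Summary()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for url in urls:
            pool.submit(task, summary, url)
    # Leaving the `with` block waits for every submitted task to finish.
    return summary
```

Here the locked section is a single addition. In a database, the lock on the updated document is typically held until the whole update transaction commits, so the waiting grows with whatever else each task does in that transaction.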
Kind regards,
Geert
Drs. G.P.H. Josten
Consultant
http://www.daidalos.nl/
Daidalos BV
Source of Innovation
Hoekeindsehof 1-4
2665 JZ Bleiswijk
Tel.: +31 (0) 10 850 1200
Fax: +31 (0) 10 850 1199
http://www.daidalos.nl/
KvK 27164984
From: [email protected]
[mailto:[email protected]] On Behalf Of Jakob Fix
Sent: Tuesday, 14 July 2009 15:55
To: General Mark Logic Developer Discussion
Subject: [MarkLogic Dev General] triggering after spawning
So I managed to spawn some twenty thousand tasks to retrieve documents
from a remote server and store them in MarkLogic. I've also created a
user interface with a progress bar to follow their progress (although
this won't be used in production).
Now, what I'd like to do is trigger an update of a summary document
once all spawned tasks have executed. With my limited experience of ML,
I can't seem to find a satisfying solution to this challenge...
My ideas:
- After the spawn, call a function recursively that sleeps for some
  time and checks the number of tasks in the task queue; once the queue
  is empty, assume "that's that" and update/create the document?
- Have each spawned task inspect the task queue and, if there is just
  one task in the queue (i.e. itself), trigger the document update?
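The second idea relies on the queue length, which may be unreliable. Two tasks that finish at almost the same moment can both see themselves as the last one. A task that is running may also no longer appear in the queue at all. A common alternative is a countdown latch: an explicit counter, decremented atomically, where exactly one task observes zero. The sketch below is neutral Python with threads, not MarkLogic code, and `CompletionLatch` and `spawn_all` are hypothetical names. In MarkLogic the counter would itself need shared, transactional storage. That brings back the serialization concern raised elsewhere in this thread. The first idea, polling until the queue drains, avoids a shared counter at the cost of a sleeping watcher.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class CompletionLatch:
    """Counts down once per task; the task that reaches zero runs `on_done`."""

    def __init__(self, n, on_done):
        self._remaining = n
        self._lock = threading.Lock()
        self._on_done = on_done

    def task_finished(self):
        with self._lock:
            self._remaining -= 1
            last = self._remaining == 0
        if last:
            # Exactly one caller sees zero, so the summary is written exactly once.
            self._on_done()


def spawn_all(urls, write_summary, workers=6):
    if not urls:
        write_summary()  # nothing to wait for
        return
    latch = CompletionLatch(len(urls), write_summary)

    def task(url):
        try:
            pass  # fetch and store the document for `url` (omitted)
        finally:
            latch.task_finished()  # count failures too, or the latch never opens

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for url in urls:
            pool.submit(task, url)
```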
Hmmm, any better ideas?
Thanks,
Jakob.
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general