Re: [MarkLogic Dev General] Prioritizing entries in the task server queue

Geert Josten Wed, 28 Jul 2010 01:14:39 -0700

Hi Tim,

How about using scheduled jobs? Create one job per priority: the high-priority 
job scans for tasks at a high frequency, the low-priority job at a low 
frequency. Each job scans a dedicated directory for documents to process. If 
there are multiple types of tasks, you will have to create separate jobs for 
those as well. You will also need some way to determine execution order, but 
that could be based on the last-modified timestamp (part of the document 
properties). You could even make the low-priority job also check the 
high-priority directory and suspend its own processing until that directory is 
empty.
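
To make the idea concrete, here is a minimal sketch in Python rather than 
XQuery. It only models the scheme: each directory is a plain in-memory map from 
document URI to last-modified time. The names TaskDirectory and run_job, the 
batch_size parameter, and the example URIs are all hypothetical; none of them 
is a MarkLogic API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class TaskDirectory:
    """Stand-in for a database directory: document URI -> last-modified time."""
    uri: str
    docs: Dict[str, float] = field(default_factory=dict)

    def oldest_first(self) -> List[str]:
        # Execution order: oldest last-modified timestamp first, URI breaks ties.
        return sorted(self.docs, key=lambda u: (self.docs[u], u))


def run_job(own: TaskDirectory,
            process: Callable[[str], None],
            wait_for: Sequence[TaskDirectory] = (),
            batch_size: int = 10) -> List[str]:
    """One scheduled run of the job that owns the directory `own`.

    The job does nothing while any directory in `wait_for` still holds
    documents; this is how the low-priority job yields to the
    high-priority one.
    """
    if any(d.docs for d in wait_for):
        return []
    handled = []
    for doc_uri in own.oldest_first()[:batch_size]:
        process(doc_uri)
        del own.docs[doc_uri]  # a processed task leaves the directory
        handled.append(doc_uri)
    return handled


# Hypothetical directories and documents, for illustration only.
high = TaskDirectory("/queue/high/", {"/queue/high/b.xml": 2.0,
                                      "/queue/high/a.xml": 1.0})
low = TaskDirectory("/queue/low/", {"/queue/low/x.xml": 0.5})
done: List[str] = []

run_job(low, done.append, wait_for=[high])   # skipped: high is not empty
run_job(high, done.append)                   # processes a.xml, then b.xml
run_job(low, done.append, wait_for=[high])   # now processes x.xml
```

In a real setup each run_job call would correspond to one firing of a scheduled 
job, and the different scan frequencies would come from the schedules 
themselves, not from the code.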


You would probably want to create a kind of controller to manage all this: a 
library module that can create directories and jobs, submit tasks to them, and 
coordinate processing a bit.
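
A rough sketch of what such a controller could look like, again as a Python 
model and not as MarkLogic code. The class QueueController and its methods are 
hypothetical names. The in-memory lock is an assumption that stands in for 
whatever mechanism the database offers to make claiming a task atomic, so that 
two dispatchers never pick up the same document.

```python
import threading
from collections import deque
from typing import Deque, Dict, Optional, Sequence


class QueueController:
    """Hypothetical controller: creates queues, accepts tasks, hands them out."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._queues: Dict[str, Deque[str]] = {}
        self._scan_minutes: Dict[str, int] = {}

    def create_queue(self, name: str, scan_minutes: int) -> None:
        # Corresponds to creating a directory plus the job that scans it.
        with self._lock:
            self._queues.setdefault(name, deque())
            self._scan_minutes[name] = scan_minutes

    def scan_minutes(self, name: str) -> int:
        with self._lock:
            return self._scan_minutes[name]

    def submit(self, name: str, task_uri: str) -> None:
        # Tasks keep their submission order within a queue.
        with self._lock:
            self._queues[name].append(task_uri)

    def claim_next(self, priority_order: Sequence[str]) -> Optional[str]:
        """Remove and return the next task, or None if all queues are empty.

        Queues are checked in `priority_order`. The removal happens under
        the lock, so concurrent dispatchers cannot claim the same task.
        """
        with self._lock:
            for name in priority_order:
                queue = self._queues.get(name)
                if queue:
                    return queue.popleft()
        return None


# Hypothetical queue names, intervals, and task URIs, for illustration only.
controller = QueueController()
controller.create_queue("high", scan_minutes=1)
controller.create_queue("low", scan_minutes=15)
controller.submit("low", "/tasks/low-1.xml")
controller.submit("high", "/tasks/high-1.xml")

first = controller.claim_next(["high", "low"])    # the high-priority task
second = controller.claim_next(["high", "low"])   # then the low-priority one
```

Because every dispatcher goes through claim_next, the number of dispatchers can 
be changed without touching the queueing logic.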

Kind regards,
Geert



drs. G.P.H. (Geert) Josten
Consultant

Daidalos BV
Hoekeindsehof 1-4
2665 JZ Bleiswijk

T +31 (0)10 850 1200
F +31 (0)10 850 1199

mailto:[email protected]
http://www.daidalos.nl/

KvK 27164984


The information sent in or with this e-mail message originates from 
Daidalos BV and is intended solely for the addressee. If you have received 
this message unintentionally, we ask you to delete it. No rights can be 
derived from this message.

> From: [email protected]
> [mailto:[email protected]] On Behalf Of
> Tim Meagher
> Sent: Wednesday, 28 July 2010 3:23
> To: 'General Mark Logic Developer Discussion'
> Subject: Re: [MarkLogic Dev General] Prioritizing entries in
> the task server queue
>
> Hi Danny,
>
>
>
> At this time I don't have the luxury of using another host.
> We're looking into clustering, but not quite there yet.  Even
> so the whole notion of prioritizing document processing
> across a variety of applications is going to be challenging.
>
>
>
> As you noted when things get to the CPF it's too late, but I
> think I'll change my CPF action queries so that instead of
> performing the document processing, they will merely supply
> the requested action to a set of prioritized queues.  The
> question then becomes, 1) What is the most efficient way to
> build a queue, to fill it with processing instructions, and
> to extract the instructions from it, and 2) how to set up and
> trigger a dispatcher that maximizes the use of the task
> server threads while extracting instructions from the queues
> in priority order?  (The assumption is that the CPF actions
> should be fairly quick in relation to the actual processing
> of each document and that processing does not starve the CPFs
> from filling the priority queues).
>
>
>
> I don't think it makes sense to use a single document for
> each queue, instead I think I need either a dedicated
> directory URI for each priority queue or a dedicated
> collection for each queue within a single directory URI. I
> like using directory URIs just because it's easier to access
> them via WebDAV if necessary.  I wonder if indexing the
> documents based on time of creation and priority would be
> useful for quickly identifying which document of the same
> priority should be processed next.  I figure I'll use tail
> recursion to find the next document to process.
>
>
>
> As for the dispatcher, the easiest thing to do is to have
> one dispatcher, which might be useful as each process consumes
> a single web resource; however, allowing multiple
> dispatchers, up to the number of task server threads (the
> number of dispatchers could be configurable to tune
> performance), will probably increase throughput. Using
> multiple dispatchers adds complexity because coordination
> is required to avoid redundant processing.
>
>
>
> I suppose you see where I'm going with this and you and/or
> others can provide further suggestions based on experience
> and a better understanding of the guts of MarkLogic.
>
>
>
> Regards,
>
>
>
> Tim
>
>
>
> ________________________________
>
> From: [email protected]
> [mailto:[email protected]] On Behalf Of
> Danny Sokolsky
> Sent: Tuesday, July 27, 2010 6:11 PM
> To: General Mark Logic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Prioritizing entries in
> the task server queue
>
>
>
> Hi Tim,
>
>
>
> I don't think there is any way to de-prioritize the order of
> something on the task server queue once it is already
> spawned.  If you wanted to do that, it would have to be
> before it is spawned.
>
>
>
> What you might be able to do (and I think you hinted at this
> in your question) is to use a different host to spawn the
> tasks to.  The host that a task is spawned to is the same
> host in which the query is evaluated (the e-node), so you can
> try to send higher priority tasks to a different (and less
> used) e-node.  I am not sure what the best way to do this is,
> and I would guess that would depend on your application.  It
> could be as simple as having some dispatcher code somewhere
> that looks at the priority (your application would have to
> supply this) and then redirects the query to another server.
> Or you could do it in a load balancer or proxy forwarder.  By
> the time it gets to CPF, however, it is probably too late, so
> this would have to come before the CPF event is triggered.
>
>
>
> I don't know of another way to do this, as there is no API to
> remove or reorder items in the task server queue.
>
>
>
> -Danny
>
>
>
> From: [email protected]
> [mailto:[email protected]] On Behalf Of
> Tim Meagher
> Sent: Monday, July 26, 2010 4:13 AM
> To: 'General Mark Logic Developer Discussion'
> Subject: [MarkLogic Dev General] Prioritizing entries in the
> task server queue
>
>
>
> Hi Folks,
>
>
>
> I have a workflow for processing documents of various
> priorities using the Content Processing Framework.  The
> problem I'm running into is that I might get 10,000 documents
> that need to be processed at a low priority which get
> submitted to the task server queue, but then maybe 10
> documents come in that are of a higher priority (I'm using
> these counts for purposes of discussion).  What I would like
> to be able to do is to insert the 10 high priority items in
> the queue so that they are processed before any outstanding
> low priority items in the task server queue, in other words I
> want to interrupt FIFO processing.  I'm not concerned about
> the high priority processing starving low priority processing
> as the volume of the high priority items is relatively low,
> but nonetheless an elegant solution would allow me to
> fine-tune the process so that low priority starvation does not occur.
>
>
>
> There was some previous discussion about using tail-recursion
> with xdmp:spawn.  That way I would hopefully be able to
> select the next document to process based on its relative
> priority.  In that case I would probably want to revise the
> CPF process to merely fill customized priority queues, e.g.
> high, mid, and low priority queues and to use tail recursion
> to examine the queues and decide which document to process next.
>
>
>
> I get the impression that clustering could be a useful way to
> create task servers that are dedicated to higher and lower
> priority processing for the needs of an entire organization,
> but it seems to me that allowing for pre-emption in a given
> task server could be a really useful feature.
>
>
>
> Perhaps there are some existing features that are provided to
> deal with just this problem.  There are times when I've
> submitted more docs to be processed by the task server and
> would like to be able to dequeue them - I suppose that a
> prioritization solution would also allow for dequeuing tasks.
>
>
>
> Thanks ahead of time for any help!
>
>
>
> Tim Meagher
>
>
>
>
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
