Hi Danny,
At this time I don't have the luxury of using another host. We're looking into clustering, but we're not quite there yet. Even so, the whole notion of prioritizing document processing across a variety of applications is going to be challenging. As you noted, by the time things get to the CPF it's too late, but I think I'll change my CPF action queries so that instead of performing the document processing, they merely add the requested action to a set of prioritized queues.

The questions then become: 1) what is the most efficient way to build a queue, fill it with processing instructions, and extract the instructions from it, and 2) how do I set up and trigger a dispatcher that maximizes the use of the task server threads while extracting instructions from the queues in priority order? (The assumption is that the CPF actions should be fairly quick relative to the actual processing of each document, and that processing does not starve the CPF actions of the chance to fill the priority queues.)

I don't think it makes sense to use a single document for each queue. Instead, I think I need either a dedicated directory URI for each priority queue or a dedicated collection for each queue within a single directory URI. I like using directory URIs just because it's easier to access them via WebDAV if necessary. I wonder whether indexing the documents by creation time and priority would be useful for quickly identifying which document of a given priority should be processed next. I figure I'll use tail recursion to find the next document to process.

As for the dispatcher, the easiest thing to do is to have a single dispatcher, which might be useful since each process consumes a single web resource; however, allowing multiple dispatchers, up to the number of task server threads (with the dispatcher count configurable to tune performance), will probably increase performance.
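
To make the queue-and-dispatcher idea concrete, here is a minimal sketch in Python rather than XQuery. It is only an illustration of the design, not MarkLogic code: the names InstructionQueue, run_dispatchers, and the HIGH/MID/LOW levels are hypothetical, the in-memory queue stands in for the queue documents, and the threads stand in for spawned tasks. Instructions are served by priority first and by arrival order within a priority, and several dispatchers can drain the same queue because each removal is atomic.

```python
import itertools
import queue
import threading

# Hypothetical priority levels; a lower number is served first.
HIGH, MID, LOW = 0, 1, 2


class InstructionQueue:
    """Prioritized queue of processing instructions, FIFO within a priority."""

    def __init__(self):
        self._pq = queue.PriorityQueue()   # thread-safe heap
        self._seq = itertools.count()      # arrival counter used as tie-breaker
        self._lock = threading.Lock()      # keeps counter and insert in step

    def put(self, priority, doc_uri):
        """Record that doc_uri needs processing at the given priority."""
        with self._lock:
            self._pq.put((priority, next(self._seq), doc_uri))

    def get_nowait(self):
        """Remove and return the next (priority, doc_uri); raises queue.Empty."""
        priority, _, doc_uri = self._pq.get_nowait()
        return priority, doc_uri


def run_dispatchers(instructions, process, n_dispatchers):
    """Drain the queue with n_dispatchers workers, each claiming one item at a time."""

    def dispatcher():
        while True:
            try:
                priority, doc_uri = instructions.get_nowait()
            except queue.Empty:
                return  # nothing left to claim, so this dispatcher stops
            process(priority, doc_uri)

    threads = [threading.Thread(target=dispatcher) for _ in range(n_dispatchers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With a single dispatcher the processing order is strictly high before low. With several dispatchers the order in which work finishes can interleave, but no instruction is handed out twice, which is the coordination property a multi-dispatcher setup would need.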
The use of multiple dispatchers adds complexity because coordination would be required to avoid redundant processing. I suppose you see where I'm going with this, and you and/or others can provide further suggestions based on experience and a better understanding of the guts of MarkLogic.

Regards,
Tim

_____
From: [email protected] [mailto:[email protected]] On Behalf Of Danny Sokolsky
Sent: Tuesday, July 27, 2010 6:11 PM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] Prioritizing entries in the taskserver queue

Hi Tim,

I don't think there is any way to de-prioritize something on the task server queue once it is already spawned. If you wanted to do that, it would have to happen before it is spawned.

What you might be able to do (and I think you hinted at this in your question) is to use a different host to spawn the tasks to. The host that a task is spawned to is the same host on which the query is evaluated (the e-node), so you can try to send higher priority tasks to a different (and less used) e-node. I am not sure what the best way to do this is, and I would guess that it depends on your application. It could be as simple as having some dispatcher code somewhere that looks at the priority (your application would have to supply this) and then redirects the query to another server. Or you could do it in a load balancer or proxy forwarder. By the time it gets to CPF, however, it is probably too late, so this would have to come before the CPF event is triggered.

I don't know of another way to do this, as there is no API to remove or reorder items in the task server queue.
-Danny

From: [email protected] [mailto:[email protected]] On Behalf Of Tim Meagher
Sent: Monday, July 26, 2010 4:13 AM
To: 'General Mark Logic Developer Discussion'
Subject: [MarkLogic Dev General] Prioritizing entries in the task server queue

Hi Folks,

I have a workflow for processing documents of various priorities using the Content Processing Framework. The problem I'm running into is that I might get 10,000 documents that need to be processed at a low priority and get submitted to the task server queue, but then maybe 10 documents come in that are of a higher priority (I'm using these counts for purposes of discussion). What I would like to be able to do is insert the 10 high priority items in the queue so that they are processed before any outstanding low priority items in the task server queue; in other words, I want to interrupt FIFO processing. I'm not concerned about the high priority processing starving low priority processing, as the volume of high priority items is relatively low, but nonetheless an elegant solution would allow me to fine-tune the process so that low priority starvation does not occur.

There was some previous discussion about using tail recursion with xdmp:spawn. That way I would hopefully be able to select the next document to process based on its relative priority. In that case I would probably want to revise the CPF process to merely fill customized priority queues (e.g. high, mid, and low priority queues) and use tail recursion to examine the queues and decide which document to process next.

I get the impression that clustering could be a useful way to create task servers that are dedicated to higher and lower priority processing for the needs of an entire organization, but it seems to me that allowing for pre-emption in a given task server could be a really useful feature. Perhaps there are some existing features that deal with just this problem.

There are times when I've submitted more docs to be processed by the task server than I intended and would like to be able to dequeue them. I suppose that a prioritization solution would also allow for dequeuing tasks.

Thanks ahead of time for any help!

Tim Meagher
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
