One possibility is to create a job list with priorities and store that list in
the DB. When the next job processor kicks off, it simply takes the
highest-priority job off the list. This means you'd need a place to park
unprocessed data while it waits to be processed. You may also have to break
some of the larger jobs into smaller ones so that no single run takes too
long. You may also be able to start a job processor that is always running (or
effectively always running, by restarting itself as it polls) and dedicate it
to just user jobs.
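
As a rough illustration of the job-list idea, here is a minimal Python sketch of
the ordering logic only. It is not MarkLogic code: the names (JobList, Job,
split_job, USER, FEED) are hypothetical, and the list is kept in memory here,
whereas in practice it would live in the DB as described above. Jobs with equal
priority are served in arrival order, and large jobs can be split into chunks
so that no single run takes too long.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Any, List, Optional

# Hypothetical priority levels: a lower number is served first.
USER = 0
FEED = 10


@dataclass(order=True)
class Job:
    priority: int
    seq: int                      # arrival order, breaks ties within a priority
    payload: Any = field(compare=False)


class JobList:
    """In-memory stand-in for a prioritized job list stored in the DB."""

    def __init__(self) -> None:
        self._heap: List[Job] = []
        self._counter = itertools.count()

    def add(self, priority: int, payload: Any) -> None:
        heapq.heappush(self._heap, Job(priority, next(self._counter), payload))

    def take_next(self) -> Optional[Any]:
        """Return the highest-priority payload, or None if the list is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap).payload


def split_job(items: List[Any], max_size: int) -> List[List[Any]]:
    """Break one large job into chunks of at most max_size items."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    return [items[i:i + max_size] for i in range(0, len(items), max_size)]
```

With this arrangement, a user upload added after a large feed job is still
handed out first, because each processor run only ever takes the top entry of
the list.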

Date: Mon, 23 Jan 2012 15:22:49 -0500
From: jwbu...@42six.com
To: general@developer.marklogic.com
Subject: [MarkLogic Dev General] Task server / priority strategy question

So I'm writing to get advice or technical pointers on managing the priority of 
task queue items, basically.  Some background on the challenge we're having 
with our system:

We run a MarkLogic 4.2-5 system holding a fair amount of data that has been 
processed from a 'raw' format into our own enriched 'processed' format. It lets 
users upload documents (which are then processed), processes automated feeds 
every day, and allows other data producers to upload directly to it.  Each 
time a file is uploaded, that creates work for the task server as it runs 
our custom code.


What we've found is that the system never has a problem with all the 
user-uploaded data.  Even with a multiple file uploader, people just can't 
really bog down the system.  However, now that we're processing feeds and 
allowing data producers to upload data directly to us, we've created the 
possibility that users will encounter an unresponsive system because of the 
automated feeds / producers.


What I am wondering is which, if any, of these strategies (or one I haven't 
listed) can best deal with this:

1. Is it possible to have two task servers?  For us, if we had one task server 
on user stuff and one task server for background stuff, that would neatly solve 
our problem.


2. Is it possible to assign a higher priority to either a pipeline or a task?  
So that we could prioritize tasks related to user uploaded documents and 
de-prioritize our background stuff.

3. Is my only option to set the maximum tasks for the task server to a low 
enough value that we never create a huge backlog, and then deal on some 
schedule with the error states that result from the 'too many tasks' error 
message?  This seems undesirable.  To give some sense of the numbers we're 
working with, our task server max is set at 200,000, and if the queue ever got 
that high, it would probably take 2 or 3 days to clear itself out.  We can tell 
users their data will not always be processed right away, but the expectation 
is that it should never take more than maybe 15 to 30 minutes.



In general, am I looking at this the wrong way?  If there is no MarkLogic way to 
deal with this, I can basically 'throttle' our input from feeds or from API 
access.  But I'd prefer not to have to get into that if I can help it.  Thanks 
in advance for any help.

-- 
Josh Warner-Burke

42SIX Solutions
(e): jwbu...@42six.com



_______________________________________________
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general                         
                  