So I'm writing to get advice or technical pointers on managing the priority
of task queue items.  Some background on the challenge we're having with
our system:

We have a MarkLogic 4.2-5 system holding a fair amount of data that has
been processed from a 'raw' format into our own enriched 'processed'
format.  It allows users to upload documents (which are then processed),
processes automated feeds every day, and lets other data producers upload
directly to it.  Each uploaded file means some work for the task server as
it runs through our custom code.

What we've found is that the system never has a problem with all the
user-uploaded data.  Even with a multiple file uploader, people just can't
really bog down the system.  However, now that we're processing feeds and
allowing data producers to upload data directly to us, we've created the
possibility that the automated feeds and producers will leave users facing
an unresponsive system.

What I'm wondering is which of these strategies, if any (or one I haven't
listed), can best deal with this:

1. Is it possible to have two task servers?  If we had one task server for
user work and one for background work, that would neatly solve our
problem.

2. Is it possible to assign a higher priority to either a pipeline or a
task?  Then we could prioritize tasks related to user-uploaded documents
and deprioritize our background work.

3. Is my only option to set the task server's maximum tasks low enough
that we never build up a huge backlog, and then handle, on some schedule,
the error states that the "too many tasks" error will produce?  This seems
undesirable.  To give some sense of the numbers we're working with, our
task server maximum is set at 200,000, and if the queue ever got that high,
it would probably take 2 or 3 days to clear.  We can tell users their data
will not always be processed right away, but I think the expectation is
that it should never take more than about 15-30 minutes.
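Option 2 essentially describes a two-lane priority queue.  A minimal sketch
follows, in plain Python rather than anything MarkLogic provides: it assumes
a hypothetical dispatcher sitting in front of the task server that decides
which queued item to hand over next.  User tasks always come out first (FIFO
within each lane), and the background lane has a fixed cap so feeds get
back-pressure instead of building a multi-day backlog.  `TwoLaneQueue`,
`USER`, and `BACKGROUND` are illustrative names only.

```python
from __future__ import annotations

import heapq
import itertools
from dataclasses import dataclass, field

USER = 0         # lower number = higher priority (hypothetical lane ids)
BACKGROUND = 1


@dataclass(order=True)
class Task:
    priority: int
    seq: int                          # tie-breaker: FIFO order within a lane
    name: str = field(compare=False)  # not used for ordering


class TwoLaneQueue:
    """Hypothetical dispatcher queue: user tasks before background tasks."""

    def __init__(self, max_background: int):
        self._heap: list[Task] = []
        self._seq = itertools.count()
        self._max_background = max_background
        self._background_count = 0

    def submit(self, name: str, priority: int) -> bool:
        """Queue a task; reject background work once its lane is full."""
        if priority == BACKGROUND:
            if self._background_count >= self._max_background:
                return False  # caller should retry later (back-pressure)
            self._background_count += 1
        heapq.heappush(self._heap, Task(priority, next(self._seq), name))
        return True

    def next_task(self) -> str | None:
        """Pop the highest-priority, oldest task, or None if empty."""
        if not self._heap:
            return None
        task = heapq.heappop(self._heap)
        if task.priority == BACKGROUND:
            self._background_count -= 1
        return task.name


q = TwoLaneQueue(max_background=2)
q.submit("feed-1", BACKGROUND)
q.submit("feed-2", BACKGROUND)
accepted = q.submit("feed-3", BACKGROUND)   # False: background lane is full
q.submit("upload-1", USER)                  # user work jumps the queue
print(accepted, [q.next_task() for _ in range(3)])
# prints: False ['upload-1', 'feed-1', 'feed-2']
```

The cap on the background lane is what keeps feeds from ever pushing the
queue anywhere near 200,000; the producer side has to handle the `False`
return, for example by retrying later.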


In general, am I looking at this the wrong way?  If there is no MarkLogic
way to deal with this, I can 'throttle' our input from feeds or from API
access, but I'd prefer not to get into that if I can help it.  Thanks in
advance for any help.
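If throttling does turn out to be necessary, the numbers above give a
starting point.  Draining 200,000 tasks in 2 to 3 days works out to roughly
46 to 69 tasks per minute, so a backlog that clears within 15 to 30 minutes
is on the order of 700 to 2,000 tasks, assuming throughput stays roughly
constant.  A token bucket is one common way to cap feed submissions below
that drain rate.  The sketch below is plain Python and independent of
MarkLogic; `TokenBucket`, the 50% share for feeds, and the burst size of 50
are illustrative assumptions, not recommendations.

```python
from __future__ import annotations

import time
from typing import Callable


def drain_rate_per_min(tasks: int, days: float) -> float:
    """Average tasks cleared per minute if `tasks` take `days` to drain."""
    return tasks / (days * 24 * 60)


def max_backlog(rate_per_min: float, target_minutes: float) -> int:
    """Largest queue that still clears within `target_minutes`."""
    return int(rate_per_min * target_minutes)


class TokenBucket:
    """Admit `rate` items/second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity      # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; False means 'hold this submission'."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Figures from the message: 200,000 queued tasks take 2-3 days to clear.
slow = drain_rate_per_min(200_000, 3)   # ~46 tasks/min
fast = drain_rate_per_min(200_000, 2)   # ~69 tasks/min
print(max_backlog(slow, 15), max_backlog(fast, 30))  # prints: 694 2083

# Hypothetical policy: feeds get half of the slow drain rate (converted to
# tasks/second), leaving the rest for user uploads; bursts up to 50 tasks.
feed_bucket = TokenBucket(rate=slow / 2 / 60, capacity=50)
```

Feed and API submissions would call `feed_bucket.try_acquire()` before
spawning work and defer when it returns `False`; user uploads bypass the
bucket entirely.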

-- 
Josh Warner-Burke

42SIX Solutions
(e): jwbu...@42six.com
_______________________________________________
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general
