Hi Josh,


I wrote a bit of code that partly does what Ryan describes below. It is
available on GitHub at https://github.com/grtjn/ml-queue, and should run
on ML 4.x.



It allows you to create task objects in your DB and have them run by a
kind of background cron task. It comes with a small GUI to follow progress.
It is configured to take half the number of available task server threads,
so it leaves plenty of room for other background processes. The task objects
can be prioritized.



It is still a bit crude and has had only limited testing, but I successfully
used it to upload 150k Atom feed docs without bogging down my entire
development laptop.



Kind regards,

Geert



*From:* general-boun...@developer.marklogic.com [mailto:
general-boun...@developer.marklogic.com] *On behalf of* seme...@hotmail.com
*Sent:* Monday, 23 January 2012 21:28
*To:* general@developer.marklogic.com
*Subject:* Re: [MarkLogic Dev General] Task server / priority strategy
question



One possibility is to create a job list with priorities and put that list
in the DB. Then, when the next job processor kicks off, it just takes the
highest-priority job off the list. This means you'd need a place to
park unprocessed data while it's waiting to be processed. You may also have
to break up the larger jobs into smaller ones so that any single run won't
take too long. You may also be able to start a job processor that is always
running (or effectively always running, by restarting itself as it polls)
and dedicate it to just user jobs.
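
The job-list idea above can be sketched in a few lines. This is a
language-neutral illustration in Python, not MarkLogic code: the list lives
in memory rather than in the DB, and the names (JobList, USER, FEED,
chunk_size) are hypothetical. It only shows the two mechanics described:
large jobs are split into smaller ones on submission, and the processor
always takes the highest-priority job next.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical priority levels; a lower number is served first.
USER, FEED = 0, 10


@dataclass(order=True)
class Job:
    priority: int
    seq: int                            # tie-breaker: FIFO within a priority
    items: list = field(compare=False)  # the parked, unprocessed data


class JobList:
    """In-memory stand-in for a prioritized job list stored in the DB."""

    def __init__(self, chunk_size=100):
        self._heap = []
        self._seq = itertools.count()
        self._chunk_size = chunk_size

    def submit(self, priority, items):
        # Break a large job into smaller ones so no single run takes too long.
        for start in range(0, len(items), self._chunk_size):
            chunk = items[start:start + self._chunk_size]
            heapq.heappush(self._heap, Job(priority, next(self._seq), chunk))

    def next_job(self):
        # Called each time a job processor kicks off.
        return heapq.heappop(self._heap) if self._heap else None

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    jobs = JobList(chunk_size=2)
    jobs.submit(FEED, ["f1", "f2", "f3", "f4", "f5"])  # big feed, queued first
    jobs.submit(USER, ["u1"])                          # user upload, queued later
    while (job := jobs.next_job()) is not None:
        print(job.priority, job.items)
```

Even though the feed was submitted first, the user job comes off the list
first, and the feed is processed afterwards in three small runs.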
------------------------------

Date: Mon, 23 Jan 2012 15:22:49 -0500
From: jwbu...@42six.com
To: general@developer.marklogic.com
Subject: [MarkLogic Dev General] Task server / priority strategy question

So I'm writing to get advice or technical pointers on managing the priority
of task queue items, basically.  Some background on the challenge we're
having with our system:

We have a MarkLogic 4.2-5 system holding a fair amount of data that has been
processed from a 'raw' format to our own enriched 'processed' format.  It
allows users to upload documents (which are processed), and every day it also
processes automated feeds and allows other data producers to upload
directly to it.  Each time a file is uploaded, that is some work for the
task server as it works through our custom code.

What we've found is that the system never has a problem with all the
user-uploaded data.  Even with a multiple file uploader, people just can't
really bog down the system.  However, now that we're processing feeds and
allowing data producers to upload data directly to us, we've created the
possibility that users will encounter an unresponsive system because of the
automated feeds / producers.

What I am wondering is which, if any, of these strategies (or one I haven't
listed) can best deal with this:

1. Is it possible to have two task servers?  For us, if we had one task
server on user stuff and one task server for background stuff, that would
neatly solve our problem.

2. Is it possible to assign a higher priority to either a pipeline or a
task?  So that we could prioritize tasks related to user uploaded documents
and de-prioritize our background stuff.

3. Is my only option to set the maximum tasks for the task server to a low
enough value that we never create a huge backlog?  And then deal, on some
schedule, with the error states that result from the 'too many tasks'
error?  This seems undesirable.  To give some sense of the numbers
we're working with, our task server max is set at 200,000 - and if it were
to ever get this high, it would take probably 2 or 3 days to clear itself
out.  We can tell users their data will not always be processed right away,
but the expectation is it should never take more than maybe 15 - 30 minutes
I think.
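
To put the numbers in item 3 side by side, here is a rough back-of-the-envelope
calculation in Python. It assumes a single first-in-first-out queue drained at
a constant rate, which is a simplification and not a statement about how the
task server actually schedules work; the function names are made up for this
sketch.

```python
SECONDS_PER_DAY = 86_400


def drain_rate(backlog_tasks, days_to_clear):
    """Average tasks per second implied by clearing a backlog in N days."""
    return backlog_tasks / (days_to_clear * SECONDS_PER_DAY)


def max_backlog_for_wait(rate_per_sec, max_wait_minutes):
    """Largest number of queued tasks that still drains within the wait."""
    return int(rate_per_sec * max_wait_minutes * 60)


if __name__ == "__main__":
    for days in (2, 3):
        rate = drain_rate(200_000, days)
        print(f"{days} days to clear: {rate:.2f} tasks/s, "
              f"30-minute budget = {max_backlog_for_wait(rate, 30)} tasks")
    # 2 days to clear: 1.16 tasks/s, 30-minute budget = 2083 tasks
    # 3 days to clear: 0.77 tasks/s, 30-minute budget = 1388 tasks
```

Under those assumptions, a user task stays within the 30-minute expectation
only if roughly 1,400 to 2,100 tasks or fewer are queued ahead of it, which
is about 1% of the 200,000 maximum.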


In general, am I looking at this the wrong way?  If there is no MarkLogic
way to deal with this, I can basically 'throttle' our input from feeds or
from API access.  But I'd prefer not to have to get into that if I can help
it.  Thanks in advance for any help.

-- 
Josh Warner-Burke

42SIX Solutions
(e): jwbu...@42six.com


_______________________________________________
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general
