One possibility is to create a job list with priorities and store that list in the DB. When the next job processor kicks off, it simply takes the highest-priority job off the list. This means you need a place to park unprocessed data while it waits to be processed. You may also have to break up the larger jobs into smaller ones so that no single run takes too long. You may also be able to start a job processor that is always running (or effectively always running, by restarting itself as it polls) and dedicate it to user jobs only.
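As a rough illustration of the job list idea above, here is a minimal Python sketch. It is not MarkLogic code: the list lives in memory rather than in the DB, and every name in it (JobList, Job, USER, BACKGROUND, max_chunk) is hypothetical. It only shows the two mechanics described: large submissions are split into smaller jobs, and the processor always takes the highest-priority job next.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical priority levels: a lower number is served first.
USER, BACKGROUND = 0, 1


@dataclass(order=True)
class Job:
    priority: int
    seq: int                             # submission order, breaks ties (FIFO)
    payload: list = field(compare=False)  # parked, not-yet-processed items


class JobList:
    """In-memory stand-in for a job list that would be stored in the DB."""

    def __init__(self, max_chunk=100):
        self._heap = []
        self._counter = itertools.count()
        self.max_chunk = max_chunk

    def submit(self, priority, items):
        # Break a large submission into smaller jobs so that no single
        # processor run takes too long.
        for start in range(0, len(items), self.max_chunk):
            chunk = items[start:start + self.max_chunk]
            heapq.heappush(self._heap, Job(priority, next(self._counter), chunk))

    def next_job(self):
        # Called each time a job processor kicks off; returns None when idle.
        return heapq.heappop(self._heap) if self._heap else None

    def __len__(self):
        return len(self._heap)
```

With this shape, a user upload submitted after a large feed still comes off the list before any remaining feed chunks, because each chunk is a separate job and the user job outranks them.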
Date: Mon, 23 Jan 2012 15:22:49 -0500
From: jwbu...@42six.com
To: general@developer.marklogic.com
Subject: [MarkLogic Dev General] Task server / priority strategy question

I'm writing to get advice or technical pointers on managing the priority of task queue items.

Some background on the challenge we're having with our system: we run a MarkLogic 4.2-5 system holding a fair amount of data that has been processed from a 'raw' format into our own enriched 'processed' format. It allows users to upload documents (which are then processed), processes automated feeds every day, and allows other data producers to upload directly to it. Each uploaded file creates some work for the task server as it runs through our custom code.

What we've found is that the system never has a problem with the user-uploaded data. Even with a multiple-file uploader, people just can't really bog down the system. However, now that we're processing feeds and allowing data producers to upload data directly to us, we've created the possibility that users will encounter an unresponsive system because of the automated feeds / producers.

What I am wondering is which, if any, of these strategies (or one I haven't listed) can best deal with this:

1. Is it possible to have two task servers? For us, if we had one task server for user stuff and one task server for background stuff, that would neatly solve our problem.
2. Is it possible to assign a higher priority to either a pipeline or a task? That way we could prioritize tasks related to user-uploaded documents and de-prioritize our background stuff.
3. Is my only option to set the maximum tasks for the task server to a low enough value that we never create a huge backlog, and then deal, on some schedule, with the error states that will result from the "too many tasks" error message? This seems undesirable.
To give some sense of the numbers we're working with, our task server max is set at 200,000, and if the queue were ever to get that high, it would probably take 2 or 3 days to clear itself out. We can tell users their data will not always be processed right away, but the expectation is that it should never take more than maybe 15 to 30 minutes, I think.

In general, am I looking at this the wrong way? If there is no MarkLogic way to deal with this, I can basically 'throttle' our input from feeds or from API access. But I'd prefer not to have to get into that if I can help it.

Thanks in advance for any help.

--
Josh Warner-Burke
42SIX Solutions
(e): jwbu...@42six.com

_______________________________________________
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general