[google-appengine] How can tasks be prioritized when using the task queue on google app engine?

Attila-Mihaly Balazs Mon, 25 Jul 2016 05:01:19 -0700

Hello,

I also posted this to stackoverflow, but I'm cross-posting it here in the 
hope of getting a larger number of eyes on it: 
http://stackoverflow.com/questions/38567153/how-can-tasks-be-prioritized-when-using-the-task-queue-on-google-app-engine

I'm trying to solve the following problem:

1. I have a series of "tasks" which I would like to execute
2. I have a fixed number of workers to execute these workers (since they
call an external API using urlfetch and the number of parallel calls to
this API is limited)
3. I would like for these "tasks" to be executed "as soon as possible"
(ie. minimum latency)
4. These tasks are parts of larger tasks and can be categorized based on
the size of the original task (ie. a small original task might generate 1
to 100 tasks, a medium one 100 to 1000 and a large one over 1000).

The tricky part: I would like to do all this efficiently (ie. minimum
latency and use as many parallel API calls as possible - without getting
over the limit), but at the same time try to prevent a large number of
tasks generated from "large" original tasks to delay the tasks generated
from "small" original tasks.

To put it an other way: I would like to have a "priority" assigned to each
task with "small" tasks having a higher priority and thus prevent
starvation from "large" tasks.

Some searching around doesn't seem to indicate that anything pre-made is
available, so I came up with the following:

- create three push queues: tasks-small, tasks-medium, tasks-large
- set a maximum number of concurrent request for each such that the
total is the maximum number of concurrent API calls (for example if the
max. no. concurrent API calls is 200, I could set up tasks-small to have
a max_concurrent_requests of 30, tasks-medium 60 and tasks-large 100)
- when enqueueing a task, check the no. pending task in each queue
(using something like the QueueStatistics class), and, if an other queue is
not 100% utilized, enqueue the task there, otherwise just enqueue the task
on the queue with the corresponding size.

For example, if we have task T1 which is part of a small task, first check
if tasks-small has free "slots" and enqueue it there. Otherwise check
tasks-medium and tasks-large. If none of them have free slots, enqueue it
on tasks-small anyway and it will be processed after the tasks added before
it are processed (note: this is not optimal because if "slots" free up on
the other queues, they still won't process pending tasks from the
tasks-small queue)

An other option would be to use PULL queue and have a central "coordinator"
pull from that queue based on priorities and dispatch them, however that
seems to add a little more latency.

However this seems a little bit hackish and I'm wondering if there are
better alternatives out there.

--
You received this message because you are subscribed to the Google Groups
"Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/google-appengine.
To view this discussion on the web visit
https://groups.google.com/d/msgid/google-appengine/a4193f6c-5ff1-4864-85a1-9d0af4488a3f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

[google-appengine] How can tasks be prioritized when using the task queue on google app engine?

Reply via email to