On Tue, 2011-01-04 at 15:29 +0000, Julian Edwards wrote:
> Dear all
>
> I've seen this problem pop up in similar ways a few times now, where we're
> processing a bunch of data in a cron job (whether externally on the API, or
> internally) and it needs to do a batch of work, remember where it left off
> (whether reaching a batch limit or the live data is paused), and continue
> later.
>
> Typically to solve this, the client processing the data stores some piece of
> context about the data it's processing, and uses that data to re-start from
> the right place next time.
>
> I think it would be a good idea to formalise a design around this in such a
> way that will also be beneficial to us when we eventually start using a
> message queuing application.
>
> In a previous life, the context data that I've used for this is a timestamp,
> and it worked very well in pretty much all cases I came across. The client
> application simply provides the same timestamp to a query/api call from the
> last item it processed, and the data continues to flow from where it left
> off.
> This ticked all the boxes for data integrity and polling or streaming usage.
>
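For illustration, the timestamp-cursor scheme quoted above might look roughly like this. It is a minimal sketch, not Launchpad code: `Item`, `fetch_batch` and `run_once` are hypothetical names, and the in-memory list stands in for whatever query or API call the cron job would really make. One assumption goes beyond the quoted description: the cursor pairs the timestamp with the item id, so that items sharing a timestamp are neither skipped nor repeated when a batch limit falls between them.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Item:
    # Hypothetical record; only the timestamp and id matter for resuming.
    id: int
    timestamp: datetime
    payload: str


# Cursor = (timestamp, id) of the last item processed; None means "start from the beginning".
Cursor = Optional[Tuple[datetime, int]]


def fetch_batch(items: Iterable[Item], cursor: Cursor, limit: int) -> List[Item]:
    """Stand-in for the query/API call: up to `limit` items strictly after `cursor`."""
    ordered = sorted(items, key=lambda i: (i.timestamp, i.id))
    if cursor is not None:
        # Assumption: the id breaks ties between items with equal timestamps.
        ordered = [i for i in ordered if (i.timestamp, i.id) > cursor]
    return ordered[:limit]


def run_once(items: Iterable[Item], cursor: Cursor, limit: int,
             process: Callable[[Item], None]) -> Cursor:
    """One cron run: process a batch and return the cursor to persist for next time."""
    for item in fetch_batch(items, cursor, limit):
        process(item)
        # Advance only after the item was handled, so a crash resumes at this item.
        cursor = (item.timestamp, item.id)
    return cursor
```

Each run persists the returned cursor and hands it back on the next run. When no new data has arrived, the cursor comes back unchanged, which is what makes the same call usable for both polling and batch-limited processing.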
I'm curious why one can't just start using message queues on the batch job only. Rather than a cron job that does all the work, the batch job could simply push all the work into a queue. Whenever the message queue is ready for frontend consumption, the batch jobs go away and the frontend starts feeding the backend directly. Trying to emulate the queue's robustness seems a noble but possibly unnecessary effort if queues are coming any time soon.

_______________________________________________
Mailing list: https://launchpad.net/~launchpad-dev
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~launchpad-dev
More help   : https://help.launchpad.net/ListHelp

