I do something similar, but it's based on when the job is run, not when it was queued. To make sure I process a given item at most once every five minutes, I store the item's ID in memcache with a five-minute expiration time once processing is done. When a worker picks up a job, it checks whether that item's ID is still in memcache before doing any processing, and skips the job if it is. Adding a delay to the beanstalk job would make it more likely that only the last change gets indexed.
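Here is a minimal sketch of that check in Python, in case it helps. It is not my actual worker code: `ExpiringStore` is a hypothetical in-memory stand-in for memcache (so the example runs without a server), and `handle_job`, the `processed:` key prefix, and the `process` callback are names made up for illustration. With a real memcache client you would replace the store with the client's own set-with-expiry and get calls.

```python
import time

COOLDOWN_SECONDS = 5 * 60  # the five-minute window described above


class ExpiringStore:
    """Hypothetical in-memory stand-in for memcache.

    set() stores a value with a time-to-live; get() returns None
    once the entry has expired or if it was never stored.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (value, expiry timestamp)

    def set(self, key, value, ttl):
        self._entries[key] = (value, self._clock() + ttl)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # lazily drop the expired entry
            return None
        return value


def handle_job(item_id, store, process, cooldown=COOLDOWN_SECONDS):
    """Process item_id unless it was processed within the cooldown window.

    Returns True if the item was processed, False if the job was skipped.
    """
    key = "processed:%s" % item_id
    if store.get(key) is not None:
        return False  # handled recently, so drop this duplicate job
    process(item_id)
    # Mark the item only after processing has finished.
    store.set(key, 1, cooldown)
    return True
```

A worker would call `handle_job(job_body, store, reindex)` for each job it reserves and delete the job either way. Note that a change arriving inside the five-minute window is skipped by this check, which is why putting the job with a delay helps: the worker is then more likely to see the document's latest state when it finally runs.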
On Mon, Oct 11, 2010 at 3:06 PM, Ron Mayer <[email protected]> wrote:

> I'm using beanstalkd to queue up jobs to re-index documents whenever they
> get updated; so my jobs in this case are all simple paths/urls to
> documents.
>
> For some documents that change faster than the queue is drained, I end up
> getting the same job in the queue dozens of times; and then doing extra
> work re-processing them unnecessarily.
>
> Is there a good way to say "put this in the queue if it's not already in
> there"?
>
> If not, does anyone have a good way of handling this outside of beanstalkd?
>
> I'm considering adding something to memcached saying
> "document file://whatever is in the queue = true"
> whenever I enqueue one; check for that flag before adding it again;
> and remove the flag when I process it;
> but was wondering if there's an easier/better/more conventional way.
>
> --
> You received this message because you are subscribed to the Google Groups
> "beanstalk-talk" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to [email protected].
> For more options, visit this group at
> http://groups.google.com/group/beanstalk-talk?hl=en.
