Guys and girls,

I've been tasked with tackling a problem at work that would be a great use of backgroundrb, but I wanted to make sure I did this in the most scalable way possible, so naturally I thought I'd ask the list.

The problem isn't that difficult. We need a backgroundrb task that picks up emails from an inbox on a regular basis. The number of emails in the inbox varies a lot: sometimes there are none, sometimes there are thousands. Each email needs some processing, and there are two types. Email Type A needs processing that is very time intensive. Email Type B is quicker to deal with and mostly just needs to be saved to the database. There should always be more of Type B than of Type A. The processing isn't human driven and would run from the built-in cron scheduling.

I've thought of a few ways to handle this so it'll scale well and wanted to run them by the group.

1. Have two workers that run separately, one to process Type A emails, the other to process Type B. Both workers would be scheduled using the built-in cron scheduler. I'm not sure I like this approach because we'd have to process some emails twice to find out what they were. I also suspect there may be issues with the two workers trying to access the same emails at the same time (although I'm not 100% sure about this).

2. Have one worker that just processes each email sequentially. This might take some time, and I'm not sure of the consequences of the worker not finishing before the next cycle starts (let's say it gets run every 30 seconds). I suspect that if a run takes 50 seconds, the runs will eventually start to back up on each other. Perhaps there's a different way of handling the scheduling that I'm not aware of.

3. Have one worker that identifies each email and then hands the processing off to a new thread running another method in the same worker. Something like:

    if email.type_a
      thread_pool.defer(:process_type_a, email)
    else
      thread_pool.defer(:process_type_b, email)
    end

   However, I'm not sure what the consequences are here. If we have thousands of emails to process, and processing an email takes longer than recognising its type, will we eventually run out of threads? Are there other consequences I'm unaware of? Are these threads async?

4. Have three workers. The first would read the emails and work out what type they are. The other two would each process one type, and would be fed by the first one adding jobs to the persistent queue. For example, the main worker would do something like:

    if email.type_a
      MiddleMan(:typea_worker).enq_process_email(:arg => email.body)
    else
      MiddleMan(:typeb_worker).enq_process_email(:arg => email.body)
    end

   Once again, having never done this before, I'm not sure what the consequences are. Are there problems with invoking backgroundrb workers from inside other workers? Is there an async way to handle this?

I'm sure there are also situations I'm missing, so I'm completely open to other suggestions. Any help would be greatly appreciated.

Dale Cook
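
P.S. To make the "will we run out of threads?" worry in option 3 more concrete, here is a rough sketch in plain Ruby of what I have in mind: classify each email once, then hand it to a fixed-size pool of threads for its type. This is not backgroundrb's API, just the standard library's Queue and Thread. The Email struct, the dispatch method, and the pool sizes are all made up for illustration, and the real Type A / Type B processing is replaced by a placeholder.

```ruby
require "thread"

# Hypothetical stand-in for a fetched email; not part of backgroundrb.
Email = Struct.new(:id, :type_a, :body)

# Classify each email once, then queue it for a fixed-size pool of threads
# dedicated to its type. Because the pool sizes are fixed, thousands of
# emails wait in the queues instead of spawning thousands of threads.
# The default pool sizes are arbitrary assumptions.
def dispatch(emails, type_a_threads: 2, type_b_threads: 4)
  queues  = { a: Queue.new, b: Queue.new }
  results = Queue.new

  pools = { a: type_a_threads, b: type_b_threads }.flat_map do |kind, size|
    Array.new(size) do
      Thread.new do
        # pop blocks until work arrives; :done tells this thread to exit.
        while (email = queues[kind].pop) != :done
          results << [kind, email.id] # placeholder for the real processing
        end
      end
    end
  end

  # The classification step: each email is inspected exactly once.
  emails.each { |email| queues[email.type_a ? :a : :b] << email }

  # One :done marker per thread so every worker thread shuts down.
  type_a_threads.times { queues[:a] << :done }
  type_b_threads.times { queues[:b] << :done }
  pools.each(&:join)

  # Drain the results; order across threads is not guaranteed.
  Array.new(results.size) { results.pop }
end
```

If something along these lines is roughly what thread_pool.defer does internally, then I'd guess a slow Type A backlog just means a longer queue rather than more threads, but I'd like to have that confirmed.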
