Guys and Girls

I've been tasked with tackling a problem at work that would be a great use
of backgroundrb, but I wanted to make sure I did this in the most scalable
way possible, so naturally I thought I'd ask the list.

The problem isn't that difficult. We need a backgroundrb task that picks up
emails from an inbox on a regular basis. The number of emails in the inbox
varies widely: sometimes there are none, sometimes there are thousands. We
need to do some processing on each of those emails, and there are two
different types. Email Type A needs very time-intensive processing. Email
Type B is quicker to deal with and mostly just needs to be saved to the
database. We expect there to always be more of Type B than of Type A. The
processing isn't human driven and would run from the built-in cron
scheduling.

I've thought of a couple of ways to handle this so it'll scale well and
wanted to run them by the group.

1. Have two workers that run separately, one to process Type A emails, the
other to process Type B. Both of the workers would be scheduled using the
built-in cron scheduler.
I'm not sure I like this approach because we'd have to process some emails
twice to find out what they were. I also suspect that there may be other
issues here with the two workers trying to access the same emails at the
same time (although I'm not 100% sure about this).

2. Have one worker that just processes each email sequentially. This might
take some time and I'm not sure of the consequences of the worker not
finishing before the next cycle starts (let's say they get run every 30
seconds). I suspect that if a run takes 50 seconds, the runs will eventually
start to back up on each other. Perhaps there's a different way of handling
the scheduling I'm not aware of.
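
To make the backing-up worry concrete, here is a minimal sketch in plain Ruby of one way a scheduled run could skip itself when the previous run is still busy. It is not backgroundrb API: `InboxPoller` and `tick` are made-up names, and I'm assuming the scheduler simply calls `tick` on every cycle.

```ruby
require "thread"

# Hypothetical stand-in for a scheduled worker method; not backgroundrb API.
class InboxPoller
  attr_reader :runs, :skipped

  def initialize
    @lock = Mutex.new
    @runs = 0      # counters are for illustration only
    @skipped = 0
  end

  # Called on every scheduler cycle. If the previous cycle is still
  # processing, skip this one instead of queueing up behind it.
  def tick
    unless @lock.try_lock
      @skipped += 1
      return false
    end
    begin
      @runs += 1
      yield if block_given?   # the actual email processing would go here
      true
    ensure
      @lock.unlock
    end
  end
end
```

With something like this, a 50-second run on a 30-second schedule would cause every other cycle to be skipped rather than stacked, at the cost of emails waiting until the next free cycle.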

3. Have one worker that processes the emails and then starts a new thread
for another method in the same worker. Something like:

if email.type_a
  thread_pool.defer(:process_type_a, email)
else
  thread_pool.defer(:process_type_b, email)
end

However, I'm not sure what the consequences are here. If we have thousands
of emails to process, and processing each email takes longer than
recognising its type, will we eventually run out of threads? Are there
other consequences I'm unaware of? Are these threads async?
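
As a point of comparison for the thread question, here is a rough plain-Ruby sketch of a pool with a fixed number of threads, where extra jobs wait in a queue instead of each getting a new thread. `BoundedPool` is a hypothetical helper written for illustration; I'm not claiming this is how backgroundrb's `thread_pool` behaves internally.

```ruby
require "thread"

# Hypothetical helper, not part of backgroundrb: runs jobs on a fixed number
# of threads so a burst of emails queues up instead of spawning more threads.
class BoundedPool
  def initialize(size)
    @queue = Queue.new
    @threads = Array.new(size) do
      Thread.new do
        # A nil job is the shutdown signal for this thread.
        while (job = @queue.pop)
          job.call
        end
      end
    end
  end

  # Returns immediately; the block runs later on one of the pool threads.
  def defer(&job)
    @queue << job
  end

  # Lets the threads drain everything already queued, then stops them.
  def shutdown
    @threads.size.times { @queue << nil }
    @threads.each(&:join)
  end
end
```

In a design like this the thread count stays constant no matter how many emails arrive; what grows is the in-memory queue, so thousands of slow Type A jobs would show up as a backlog rather than as thread exhaustion.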


4. Have three workers. The first would read the emails and work out what
type they were. The other two would process their respective types and
would be fed by the first one adding to the persistent queue. For example,
the main worker would do something like:

if email.type_a
  MiddleMan.worker(:typea_worker).enq_process_email(:arg => email.body)
else
  MiddleMan.worker(:typeb_worker).enq_process_email(:arg => email.body)
end

Once again, having never done this before I'm not sure what the consequences
are. Are there problems with running backgroundrb workers inside other
workers? Is there an async way to handle this?
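
The shape of option 4 can be sketched in plain Ruby, with in-memory queues and threads standing in for the persistent queue and the two type-specific workers. The `Email` struct, the sample emails, and the `processed` hash are all made up for illustration; none of this is backgroundrb API.

```ruby
require "thread"

# Hypothetical email record; the real emails would come from the inbox.
Email = Struct.new(:id, :type_a, :body)

# One queue per type, standing in for the persistent queues of option 4.
queues = { :a => Queue.new, :b => Queue.new }
processed = { :a => [], :b => [] }

# One consumer per type, so slow Type A work never blocks Type B.
consumers = queues.map do |type, queue|
  Thread.new do
    while (email = queue.pop)   # nil means "no more work"
      processed[type] << email.id
    end
  end
end

# The classifying worker only inspects the type and enqueues.
emails = [
  Email.new(1, true, "slow"),
  Email.new(2, false, "quick"),
  Email.new(3, false, "quick")
]
emails.each { |email| queues[email.type_a ? :a : :b] << email }

# Signal both consumers to stop once their queues are drained.
queues.each_value { |queue| queue << nil }
consumers.each(&:join)
```

The point of the split is that the classifier stays fast because it only routes, and each email is inspected once, which avoids the double processing I was worried about in option 1.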

I'm sure there are also situations I'm missing so I'm completely open to
other suggestions. Any help would be greatly appreciated.

Dale Cook
_______________________________________________
Backgroundrb-devel mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/backgroundrb-devel
