You should check out Sidekiq: https://github.com/mperham/sidekiq. This library uses Redis (there is a Heroku add-on for this) to queue up jobs, and it uses a pool of threads to work the queue. It is designed to make efficient use of workers.
In your model you could manage a queue of sites and work through it: as each site is popped off the queue, retrieve and parse the page, and queue another request for each link discovered. The process could be broken up further, but it's not efficient to carry the content of a page within a queued request. The cool thing is that this library can run about 14 threads per worker if configured with unicorn etc., which is way more cost efficient than one worker per thread...
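
To make the idea concrete, here is a rough sketch of the pattern in plain Ruby. It is not Sidekiq's API: a standard-library Queue and a few threads stand in for the Redis-backed queue and Sidekiq's thread pool. The names crawl, fetch_links and thread_count are made up for illustration, and fetch_links is assumed to be something you supply that retrieves and parses a page and returns the links it found. Note that only URLs go on the queue, never page content.

```ruby
require "set"

# Hypothetical helper: crawls outward from start_urls with a small thread pool.
# fetch_links is an assumed callable: given a URL, it retrieves and parses the
# page and returns an array of the link URLs found on it.
def crawl(start_urls, fetch_links, thread_count: 4)
  queue   = Queue.new            # holds URLs only, never page content
  seen    = Set.new(start_urls)  # URLs that have already been queued
  lock    = Mutex.new            # guards seen and pending
  pending = start_urls.size      # jobs that are queued or in progress

  start_urls.each { |url| queue << url }
  queue.close if pending.zero?   # nothing to do, so let the workers exit

  workers = Array.new(thread_count) do
    Thread.new do
      # pop returns nil once the queue is closed and empty
      while (url = queue.pop)
        links = begin
          fetch_links.call(url)
        rescue StandardError
          []                     # a failed fetch simply yields no links
        end

        lock.synchronize do
          links.each do |link|
            next unless seen.add?(link)  # skip URLs queued earlier
            pending += 1
            queue << link
          end
          pending -= 1
          queue.close if pending.zero?   # no jobs left anywhere
        end
      end
    end
  end

  workers.each(&:join)
  seen
end
```

With Sidekiq the shape would be similar, as far as I understand it: each job carries just a URL, the worker does the fetch and parse, and it enqueues one new job per link it discovers.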
