I'm setting up a Boss/Worker threaded program that runs for a very long time. Occasionally a worker thread will run into an error and exit. The obvious solution would be to prevent that from happening, but I would like to create a fault-tolerant framework that doesn't tip over at the slightest hint of a problem. I would like the Boss to detect when a worker exits and start a new worker thread in its place.

So far, I have found no function other than join() that will determine whether a thread has exited. The obvious issue with join() is that it blocks, so I need a dedicated join thread for each worker thread that can send a message back to the Boss to restart the worker. This seems to wildly increase memory usage. The program has large shared data structures and already seems to be at the memory limits of the machine without these join threads.

Is there a way to remove the shared variables from the join threads so that they take up as little memory as possible?

Better yet, is there a way to detect thread deaths without a dedicated join thread, perhaps a $thread->ready_for_join() type function, a $thread->join_non_blocking(), or even a $thread->join_any_of_these(@threads)?

Thanks

-Eric
