Elizabeth Mattijsen wrote:
At 11:18 AM -0500 1/24/05, Eric Garland wrote:

I'm setting up a Boss/Worker threaded program that runs for a very long time. There are times when a worker thread will run into errors and exit. The obvious solution would be to prevent that from happening, but I would like to create a fault-tolerant framework that doesn't tip over at the slightest hint of a problem. I would like to have the Boss detect when a worker exits and restart a worker thread in its place.

So far, I have found no function other than join() that will determine whether a thread has exited. The obvious issue with join() is that it blocks, so I need a join thread for each worker thread in order to send a message back to the Boss to restart the thread. This seems to wildly increase memory usage. There are large shared data structures in this program, and it already seems to be at the memory limits of the machine without these join threads.


You might want to have a look at Thread::Running on CPAN, by yours truly.

Interesting approach. I found an issue: calling running immediately after starting a thread returns false, as in:


my $thread = threads->new( \&worker_thread );
while ($thread->running) {
    select(undef,undef,undef,0.1);
    print "Still running\n";
}

This falls right through the while loop. Adding a yield or a pause above the while loop seems to fix the problem.

Looking through the code, the trick you use seems to be running the entire thread in an eval so that the exit can be trapped by a wrapper subroutine. That's great! I think I'll just use that directly.
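
A minimal sketch of what using the eval wrapper directly might look like. This is not Thread::Running's actual code. The names wrapped_worker, start_worker, $exited and the "worker 2 fails on its first attempt" behaviour are made up for illustration, and it assumes a Thread::Queue is acceptable for reporting back to the Boss. The eval traps a die in the worker; each wrapped worker then reports its own exit, so no separate join thread is needed.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Queue on which each wrapped worker reports its own exit to the Boss.
my $exited = Thread::Queue->new;

# Hypothetical worker body: worker 2 dies on its first attempt only.
sub worker_thread {
    my ($id, $attempt) = @_;
    die "worker $id failed\n" if $id == 2 && $attempt == 1;
    return;
}

# Wrapper: run the worker inside an eval so a die is trapped,
# then tell the Boss that this worker is done and why.
sub wrapped_worker {
    my ($id, $attempt) = @_;
    my $ok    = eval { worker_thread($id, $attempt); 1 };
    my $error = $ok ? '' : ($@ || "unknown error\n");
    $exited->enqueue("$id:$attempt:$error");
    return;
}

my (%workers, %attempts, @restarted);

# Start (or restart) the worker with the given id.
sub start_worker {
    my ($id) = @_;
    my $attempt = ++$attempts{$id};
    $workers{$id} = threads->create(\&wrapped_worker, $id, $attempt);
    return;
}

start_worker($_) for 1 .. 3;
my $live = 3;

# Boss loop: wait for exit reports and restart workers that died.
while ($live > 0) {
    my ($id, $attempt, $error) = split /:/, $exited->dequeue, 3;
    $error = '' unless defined $error;
    $workers{$id}->join;    # the worker has already reported, so this returns quickly
    if ($error ne '') {
        chomp $error;
        print "worker $id died on attempt $attempt ($error), restarting\n";
        push @restarted, $id;
        start_worker($id);
    }
    else {
        $live--;            # clean exit: nothing to restart
    }
}
```

The Boss blocks in dequeue here only to keep the sketch short. A Boss that has other work to do could poll with dequeue_nb instead, which returns undef when nothing has been reported yet.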

Is there a way to remove the shared variables from the join threads so that they take up as little memory as possible?


Not sure what you mean by that.

I don't really understand much about shared variable memory usage in ithreads, but it seems that a shared data structure takes up memory in each thread. I'm dealing with a large (100,000 entry) multidimensional array in which each entry has approximately 20 elements, several of which are arrays themselves. The process tends to use about 400 MB of RAM when running (not all of it Perl), and I'm testing on a machine with 512 MB of RAM. Adding the watcher threads slowed the machine to a crawl. That may have just been the overhead of the extra interpreters, since I'm so close to using all my RAM. I assumed that if I could remove the large shared data structure from the watcher threads, it would make them more memory efficient. Is that true?


If so, is it possible?

Thanks Liz.
