Mladen Turk
Mon, 21 Jul 2008 23:43:20 -0700
Bojan Smojver wrote:
On Mon, 2008-07-21 at 22:05 +1000, Bojan Smojver wrote:
For example...

After thinking about it a bit more, it appears that we really should not need to fiddle with locking here. Essentially, when we call this, we can assume that root pool and below will not be modified by another thread (because we can always control what root pool is), so we can just go ahead and search in peace. Comments?
In the original pool_destroy, the pool->sibling access is guarded by a mutex. In a multithreaded environment a child pool might be in the middle of its destroy process, detaching itself, so I think you'll have to guard the access to pool->sibling in destroy_safe as well.

Also, there is a problem if the root pool gets destroyed, in which case you'll be accessing zombie memory, so I don't think this will help.

As an example I'll give you the Tomcat Native APR connector. It gets loaded into the JVM as a module (.jar + native libraries), so the module actually uses APR, not the application. The presumption that one controls the application and apr_initialize/apr_terminate doesn't stand any more, because the module can be loaded/unloaded many times during the application lifetime. Since apr_terminate (just an edge case example) can happen at the user's choice in a different thread, while multiple threads are in the middle of a blocking APR call (accepting socket connections, for example), after the native call breaks one cannot be sure that both the local pool and the global (root) pool are still valid.

The problem, as I see it, requires some event mechanism, because the callback is actually a 'message post' to another thread, so the callback's result effectively takes place at some future time, and due to busyness and thread context switching this can lead to the nasty sporadic cores that we observe nowadays. The reason is that in one thread apr first destroys the child pools in one quick loop and then goes immediately to another loop that calls the callbacks. If the system is very busy this can cause nasty sync issues, because a function like accept can break (caused by the pool destroy) after you call the registered callback, and since those are executed in the context of another thread you really have no idea what's going on :)

Perhaps we'll need some sort of event mechanism for callbacks that would wait before going on to the next callback in the loop, or something like that.
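To make the pool->sibling point from the top of this mail concrete, here is a minimal sketch of a tree search that takes a mutex around each child list, so a child detaching itself cannot change the list under the searcher. Everything here is an assumption for illustration: toy_pool and the toy_pool_* functions are hypothetical, use plain pthreads, and are not the real apr_pool_t internals or the proposed destroy_safe.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pool; field names only mirror the discussion. */
typedef struct toy_pool {
    struct toy_pool *parent;
    struct toy_pool *child;    /* first child */
    struct toy_pool *sibling;  /* next child of the same parent */
    pthread_mutex_t  mutex;    /* guards this pool's child list */
} toy_pool;

/* Initialise p and, if parent is given, link it in under the parent's mutex. */
static void toy_pool_init(toy_pool *p, toy_pool *parent)
{
    assert(p != NULL);
    p->parent = parent;
    p->child = NULL;
    p->sibling = NULL;
    pthread_mutex_init(&p->mutex, NULL);
    if (parent != NULL) {
        pthread_mutex_lock(&parent->mutex);
        p->sibling = parent->child;
        parent->child = p;
        pthread_mutex_unlock(&parent->mutex);
    }
}

/* Unlink p from its parent's child list while holding the parent's mutex. */
static void toy_pool_detach(toy_pool *p)
{
    toy_pool *parent = p->parent;
    if (parent == NULL)
        return;
    pthread_mutex_lock(&parent->mutex);
    for (toy_pool **link = &parent->child; *link != NULL;
         link = &(*link)->sibling) {
        if (*link == p) {
            *link = p->sibling;
            break;
        }
    }
    pthread_mutex_unlock(&parent->mutex);
    p->parent = NULL;
    p->sibling = NULL;
}

/* Return true if target is root or lives below it. Each child list is
 * walked with its owner's mutex held (locks are taken parent first, then
 * child), so a concurrent toy_pool_detach() has to wait for the walk. */
static bool toy_pool_contains(toy_pool *root, const toy_pool *target)
{
    bool found = false;
    if (root == target)
        return true;
    pthread_mutex_lock(&root->mutex);
    for (toy_pool *c = root->child; c != NULL && !found; c = c->sibling)
        found = toy_pool_contains(c, target);
    pthread_mutex_unlock(&root->mutex);
    return found;
}
```

Note that this only covers the sibling list. It does nothing for the second problem: if the root itself is destroyed, the search starts from freed memory no matter how the lists are locked.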
Guarding that externally would make things basically single threaded, and that would be a performance killer. I've got close to solving the issues by having an atomic counter for each long native function call, causing apr_pool_destroy to wait for all native calls to exit, but that's a nightmare to maintain and to write user code for.

Regards
--
^(TM)
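For reference, the atomic-counter workaround mentioned above could look roughly like the sketch below. It is only an illustration under stated assumptions: call_guard and the guard_* functions are hypothetical names, C11 <stdatomic.h> is used so the example stands alone (inside APR one would presumably reach for the apr_atomic_*32 functions instead), and it is not the code actually used in Tomcat Native.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical guard kept next to a pool; not part of APR. */
typedef struct {
    atomic_int  in_flight;  /* native calls currently inside the guard */
    atomic_bool closing;    /* set once teardown has been requested */
} call_guard;

static void guard_init(call_guard *g)
{
    atomic_init(&g->in_flight, 0);
    atomic_init(&g->closing, false);
}

/* Call before a long native call. Returns false if teardown has started,
 * in which case the caller must not touch the pool at all. The counter is
 * bumped before closing is checked, so the destroying thread either sees
 * this call in the counter or this call sees the closing flag. */
static bool guard_enter(call_guard *g)
{
    atomic_fetch_add(&g->in_flight, 1);
    if (atomic_load(&g->closing)) {
        atomic_fetch_sub(&g->in_flight, 1);
        return false;
    }
    return true;
}

/* Call after the native call returns, on every exit path. */
static void guard_leave(call_guard *g)
{
    int prev = atomic_fetch_sub(&g->in_flight, 1);
    assert(prev > 0);  /* a leave without a matching enter is a bug */
    (void)prev;
}

/* Destroying thread: refuse new entries from now on. */
static void guard_begin_close(call_guard *g)
{
    atomic_store(&g->closing, true);
}

/* Destroying thread: true once no native call is inside the guard.
 * The pool should only be destroyed after this has returned true. */
static bool guard_is_idle(call_guard *g)
{
    return atomic_load(&g->in_flight) == 0;
}
```

The destroying thread would call guard_begin_close(), then poll or sleep until guard_is_idle() returns true, and only then destroy the pool. The maintenance burden is visible even at this size: every native entry point needs a matching enter/leave pair on every return path, and a single missed guard_leave() makes the destroy wait forever.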