On 05/22/2013 11:27 AM, Tom Hughes wrote:
On 22/05/13 18:20, Kai Krueger wrote:

On 05/22/2013 11:04 AM, Tom Hughes wrote:

So if a new child is started, and multiple requests arrive more or
less simultaneously to different threads in that process, then they
will both try and allocate the stores array which means they will both
be trying to manipulate the memory pool at the same time.

The Apache routines to manipulate memory pools should be thread safe, so
that part should be fine.

That's not what the interwebs are telling me - do you have some documentation for that claim? Only I'm finding quotes like "Pools are explicitly thread unsafe".

Ouch. Looks like you are right. It says functions like apr_pool_create are thread-safe, but those are only the ones that create new pools, not the general functions.


So that needs fixing, and the rest of mod_tile should probably be checked to see whether those functions are used incorrectly anywhere else.


I guess the upside is that this possibly means we have found the cause and can fix it relatively easily, without having to go on a long debugging hunt. Thanks.

That's why you have things like per-request pools - so that you can do allocations in request context without locking overheads as well as so you can clean up easily.
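
To make the per-request pool idea concrete, here is a minimal sketch of a bump allocator that is owned by a single request (and so by a single thread). It is not APR's implementation: struct arena, arena_init, arena_alloc and arena_destroy are hypothetical names used only for illustration. Because no other thread ever touches the arena, allocation needs no lock, and everything is released in one call when the request ends.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-request pool: one buffer, one owner thread. */
struct arena {
    unsigned char *base;  /* start of the buffer */
    size_t used;          /* bytes handed out so far */
    size_t cap;           /* total buffer size */
};

/* Returns 1 on success, 0 if the buffer could not be allocated. */
static int arena_init(struct arena *a, size_t cap)
{
    a->base = malloc(cap);
    a->used = 0;
    a->cap = a->base ? cap : 0;
    return a->base != NULL;
}

/* Lock-free by construction: only the owning request's thread calls this. */
static void *arena_alloc(struct arena *a, size_t n)
{
    size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) / align * align;

    if (start > a->cap || n > a->cap - start)
        return NULL;  /* out of space in this arena */
    a->used = start + n;
    return a->base + start;
}

/* Frees every allocation made from the arena in one step. */
static void arena_destroy(struct arena *a)
{
    free(a->base);
    a->base = NULL;
    a->used = 0;
    a->cap = 0;
}
```

A request handler would call arena_init on entry, arena_alloc as often as needed, and arena_destroy on exit. Sharing one such arena between threads without a lock is exactly the situation that breaks.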

It does look like it is possible that multiple threads can allocate a
new storage array simultaneously, but that should "only" lead to a memory
leak, rather than crashes. In that race, one of the threads simply wins
and gets to set the stores array, and the other allocated arrays go
unused. As all allocations are equivalent, it shouldn't matter which wins.

That race should be fixable, by simply adding an explicit lock after the
stores==null check. As this only happens at process / thread
initialisation and all operations are fast, the performance impact of
that should be negligible.
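
The lock-and-recheck pattern described above could look roughly like the sketch below. It uses pthreads and C11 atomics as stand-ins rather than the APR mutex and pool calls mod_tile would actually use, and it assumes a single process-wide array. The names struct storage_backend, NUM_STORES, get_stores, alloc_count and run_race are hypothetical and exist only for this illustration.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define NUM_STORES 4                   /* hypothetical array size */

struct storage_backend { int id; };    /* hypothetical stand-in type */

static _Atomic(struct storage_backend *) stores = NULL;
static pthread_mutex_t stores_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int alloc_count = 0;     /* counts allocations, for the demo only */

/* Returns the shared array, allocating it at most once. */
static struct storage_backend *get_stores(void)
{
    struct storage_backend *s =
        atomic_load_explicit(&stores, memory_order_acquire);

    if (s == NULL) {                   /* first check, without the lock */
        pthread_mutex_lock(&stores_lock);
        s = atomic_load_explicit(&stores, memory_order_relaxed);
        if (s == NULL) {               /* recheck now that we hold the lock */
            s = calloc(NUM_STORES, sizeof *s);
            if (s != NULL) {
                atomic_fetch_add(&alloc_count, 1);
                atomic_store_explicit(&stores, s, memory_order_release);
            }
        }
        pthread_mutex_unlock(&stores_lock);
    }
    return s;
}

static void *worker(void *arg)
{
    *(struct storage_backend **)arg = get_stores();
    return NULL;
}

/* Starts up to 64 threads that all request the array at once and
 * returns how many allocations actually happened. */
static int run_race(int n)
{
    pthread_t tids[64];
    struct storage_backend *seen[64];
    int started = 0;

    if (n > 64)
        n = 64;
    for (; started < n; started++) {
        if (pthread_create(&tids[started], NULL, worker, &seen[started]) != 0)
            break;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return atomic_load(&alloc_count);
}
```

Only threads that see a NULL pointer ever take the mutex, so once the array is set the per-request cost is a single atomic load. The recheck inside the lock is what stops a second thread from allocating again after it lost the race.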

Can the stores array not just be allocated in the child_init hook?

I can't remember the details, but I believe the child_init hook did not do what I wanted at all. I think it might have again only been per process and not per thread.

That way it is only the apr_pool_userdata_get call that you are relying on to be thread safe - no idea if it is, but as it is reading things it is more likely to be.

With the explicit mutex after the stores == null check (and the appropriate recheck), apr_pool_userdata_get would also be the only function that needs to be thread safe. However, if it is not, then you would have to put a mutex around that as well. As that is called on every request, having to do that would be less nice. On the other hand, given the load we see on typical mod_tile installations, that shouldn't be an issue either. From the benchmarks I did on the locking in the stats collection, even at 10k tiles/s the per-request locking didn't seem to have a significant effect.

I'll try and fix this tonight and hopefully that will then indeed solve the instability issues Sven and Andy have seen.

Kai


Tom


_______________________________________________
dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/dev
