> On 30 Aug 2016, at 4:17 PM, kusimari share <[email protected]> wrote:
> 
> Some additional data
> 
> Apache debug logs show:
> [mpm_worker:debug] [pid 6870:tid 140091454019328] worker.c(1829): AH00294: 
> Accept mutex: fcntl (default: sysvsem)
> This I assume implies that mpm_worker uses fcntl while mod_wsgi uses sysvsem. 
> With this default, /path-to-apache/var contained only the .sock file.
> 
If using Apache 2.4, then maybe.

The logic in mod_wsgi is:

#if !defined(AP_ACCEPT_MUTEX_TYPE)
    /* Older Apache: reuse whatever accept lock mechanism Apache itself chose. */
    sconfig->lock_mechanism = ap_accept_lock_mech;
#else
    /* Apache 2.4+: ap_accept_lock_mech no longer exists, so fall back to the APR default. */
    sconfig->lock_mechanism = APR_LOCK_DEFAULT;
#endif

From memory, they got rid of ap_accept_lock_mech and it is now impossible to work 
out what Apache actually used.

So mod_wsgi will use the APR default. If the Apache configuration overrides its 
default explicitly, then the two quite likely end up different.

As I said, it shouldn’t matter unless the locking mechanism is somehow broken 
or unreliable for that platform.
> I added two lines to apache conf
> 
> Mutex file:/path-to-apache/var/state mpm-accept
> 
> WSGIAcceptMutex flock
> 
So that will guarantee both use the same mechanism, presuming that Apache was 
overriding it, and that flock is more reliable than sysvsem on your system.
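
Putting that together, a minimal sketch (using your redacted path; adjust for your 
layout) that pins both Apache’s own mpm-accept mutex and mod_wsgi’s daemon accept 
mutex to flock, with LogLevel left at debug so the AH00294 line confirms the choice 
at startup:

    # Log the accept mutex selection at startup (the AH00294 line).
    LogLevel debug

    # Apache's own cross-process accept mutex for the MPM.
    Mutex flock:/path-to-apache/var/state mpm-accept

    # mod_wsgi's accept mutex for its daemon processes.
    WSGIAcceptMutex flock

With that in place you should see both the .sock and .lock files under 
/path-to-apache/var, as you observed.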

I personally have never seen it, but I have seen one person complain that sysvsem 
was unreliable in Apache because, if Apache crashed or had kill -9 done on it, the 
semaphores wouldn’t be cleaned up properly and you would eventually run out of 
them. That said, who does kill -9 on Apache and then restarts it? And if Apache as 
a whole was crashing you would know about it.
> Now apache debug logs show:
> 
> [mpm_worker:debug] [pid 15647:tid 140091454117350] worker.c(1829): AH00294: 
> Accept mutex: flock (default: sysvsem)
> 
> And /path-to-apache/var contains both .sock and .lock file.
> 
> I still don't know if this will eliminate the deadlocks happening in one box 
> while the other box chugs away happily without issues. On the box which has 
> issues, the reason for the deadlock seems to vary; I saw these in the logs at 
> different times and don't know why the reason changes. (Note this is with the 
> original setup, i.e. 'Accept mutex: fcntl (default: sysvsem)'.)
> 
> Logs for the most common of the deadlock situations. Note that the pids for 
> the timeout and the deadlock timer are different.
> [wsgi:error] [pid 11561:tid 139804096313088] [client 10.4.71.118:28639] 
> Timeout when reading response headers from daemon process 'app': 
> /path-to-apache/bin/app.wsgi
> 
> [wsgi:info] [pid 11559:tid 139804272707328] mod_wsgi (pid=11559): Daemon 
> process deadlock timer expired, stopping process 'app'.
> 
> 

This means your Python code is deadlocking, not Apache or mod_wsgi code. This 
is usually caused by Python C extension modules that aren’t written to work 
properly in sub interpreters.

Make sure that when using daemon mode you are delegating the WSGI application to 
run in the main interpreter, that is, application group %{GLOBAL}.
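
For example, a minimal daemon mode sketch (the group name, script path and 
process/thread counts are taken from your logs, so treat the exact values as 
placeholders, and the mount point of / is just for illustration):

    WSGIDaemonProcess app processes=2 threads=15
    WSGIScriptAlias / /path-to-apache/bin/app.wsgi

    # Run the application in the 'app' daemon process group rather than
    # in embedded mode.
    WSGIProcessGroup app

    # Use the main (first) interpreter, which is what C extension modules
    # such as numpy/scipy generally expect.
    WSGIApplicationGroup %{GLOBAL}

If the application still deadlocks with %{GLOBAL} in place, then sub interpreter 
problems can be ruled out.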

What is the mod_wsgi setup you are using?

From memory, some of the numpy/scipy stuff has this exact problem and this is 
required.

See:

    http://modwsgi.readthedocs.io/en/develop/user-guides/application-issues.html#python-simplified-gil-state-api
> [wsgi:info] [pid 11559:tid 139804333795072] mod_wsgi (pid=11559): Shutdown 
> requested 'app'.    
> 
> Logs for one of the deadlock situations:
> [wsgi:crit] [pid 4317:tid 140495687612160] (35)Resource deadlock avoided: 
> mod_wsgi (pid=4317): Couldn't acquire accept mutex 
> '/path-to-apache/var/state.14843.0.1.sock'. Shutting down daemon process
> One of the rarer situations, which I saw in passing but did not grab the log 
> for quickly enough, showed the logging API failing to lock the log file.
> 
> 
> I am not sure what's at fault. If it were due to fcntl vs sysvsem, then it 
> should have happened on both boxes and not just one. The only two 
> differences between the two boxes are:
> 
>  - One is in the AWS US region and the other is in the AWS EU region, though 
> uname and httpd -V show both being the same.
> 
>  - The US region loads different set of scikit-learn pickles than the EU 
> region one. But both pickles are exports from the same scikit-learn classes 
> (SGDClassifier)
> 
> 
> 
> Mysterious at best. Any clues would help. I definitely don't want to try 
> gunicorn/nginx just to experiment with whether something in Apache is messing 
> things up.
> 
> 
> 
> Thanks
> 
> Santhosh
> 
> 
> 
> 
> On Monday, 29 August 2016 19:23:02 UTC-7, kusimari share wrote:
> Ok, I will try that and post back.
> 
> The difficulty has been due to the fact that it happens only in one instance 
> and not in the other. That is, on the box deployed on Linux in AWS US, I 
> never see any deadlocks even though the number of requests is higher. 
> 
> However, on the box deployed on Linux in AWS EU, no matter the number of 
> processes and threads, the deadlock always happens. Further, if the number of 
> processes and threads is increased, the deadlock happens more frequently. As 
> of now I have restricted it to 2 processes and 15 threads for WSGI. So the 
> system chugs along fine for 3-8 hours, then goes into a tailspin of 
> deadlock/restart 3 or 4 times, and then it continues again for 3-8 
> hours.
> 
> In the AWS US box, it never deadlocks and goes on for more than a couple of weeks.
> 
> Is there a way to query the .sock file? Something similar to cat /proc/locks?
> 
> Santhosh
> 
> On Monday, 29 August 2016 18:21:55 UTC-7, Graham Dumpleton wrote:
> 
>> On 30 Aug 2016, at 10:17 AM, kusimari share <[email protected]> wrote:
>> 
>> Hi,
>> 
>> I have two instances of an Apache, mod_wsgi and Flask based HTTP REST 
>> service. The only differences between the two services are the files being 
>> loaded (Python pickle files on Flask start) and the AWS regions: one is in 
>> the US while the other is in the EU. The Flask service itself is a wrapper 
>> over scikit-learn predictions, which are the reason for the Python pickle 
>> files mentioned earlier.
>> 
>> Confusingly, the instance in the US works fine. No issues. However, the EU 
>> instance keeps getting into a deadlock situation, because of which mod_wsgi 
>> restarts. Apache is still up. I have had no luck with the different 
>> mod_wsgi debug mechanisms; apparently there is no deadlock on the Flask / 
>> scikit-learn side of the equation.
>> 
>> Today I found that on deadlock mod_wsgi emitted:
>> 
>> [Mon Aug 29 21:31:26.979825 2016] [wsgi:crit] [pid 4317:tid 
>> 140495687612160] (35)Resource deadlock avoided: 
>> mod_wsgi (pid=4317): Couldn't acquire accept mutex 
>> '/path-to-apache/var/state.14843.0.1.sock'. Shutting down daemon process 
>> 
>> 
>> Apache conf contains
>> Mutex file:/path-to-apache/var/state mpm-accept
>> 
>> My suspicion is that one of the instances is using fcntl while the other uses 
>> flock, and probably the one using fcntl has the deadlocks, based on the Apache 
>> documentation. But I have no way to verify it.
>> 
>> Is there a way to find which of fcntl or flock is being used by Apache?
>> 
>> httpd -V shows
>>  -D APR_HAS_SENDFILE
>>  -D APR_HAS_MMAP
>>  -D APR_USE_SYSVSEM_SERIALIZE
>>  -D APR_USE_PTHREAD_SERIALIZE
>>  -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
>>  -D APR_HAS_OTHER_CHILD
>>  -D AP_HAVE_RELIABLE_PIPED_LOGS
>>  -D DYNAMIC_MODULE_LIMIT=256
> 
> If you start up the Apache web server with LogLevel set to ‘debug’, you 
> should see a line like:
> 
>     [mpm_prefork:debug] [pid 82122] prefork.c(1027): AH00165: Accept mutex: 
> none (default: flock)
> 
> I think this is what mod_wsgi will also use. There was an issue at some point 
> where Apache stopped allowing me to see the choice it made and so I had to 
> start calculating it myself.
> 
> That in itself shouldn’t cause a problem though as Apache’s own cross process 
> mutex locks are separate to those mod_wsgi creates, so there cannot be a 
> clash if types end up being different.
> 
> I would suggest the issue may be an ownership/permissions issue on the 
> directory rather than a mismatch of lock types. The locks on this are all 
> managed in mod_wsgi, so I can't see how the type of locking used could differ 
> at different times.
> 
> So what are the ownership/permissions on the directory:
> 
>     /path-to-apache/var/
> 
> Can you try setting the directive:
> 
>     WSGISocketPrefix /tmp
> 
> as a test to see if that helps. The sockets and lock files should then be 
> placed in /tmp instead, which if this is a single user machine should be fine.
> 
> You can also override the mutex type used by mod_wsgi for its own locks by 
> using:
> 
>     WSGIAcceptMutex flock
> 
> Graham
