Sure. I have a million-by-thousand boolean matrix (numpy) that takes a few 
MB of RAM, which is fine with me. But when I try to put it into, or 
retrieve it from, a Redis cache, it has to be serialized to a byte string, 
which takes more than 10 seconds in either direction (ndarray.tobytes).
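
For reference, this is roughly what that round trip looks like (just a 
sketch with redis-py and a much smaller dummy shape; the key name is 
arbitrary and the real matrix is the million-by-thousand one):

    import numpy as np
    import redis

    r = redis.StrictRedis(host='localhost', port=6379)
    matrix = np.zeros((1000, 1000), dtype=np.bool_)  # stand-in for the real one

    # Store: serialize the array to a raw byte string (the slow step).
    r.set('matrix', matrix.tobytes())

    # Retrieve: rebuild an ndarray of the same dtype and shape (also slow).
    raw = r.get('matrix')
    matrix2 = np.frombuffer(raw, dtype=np.bool_).reshape(matrix.shape)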

With the in-memory version, the user gets the answer in less than a second, 
as expected from a responsive web service. The problem with that (and with 
all the available Django cache backends...) is that the cached object 
belongs to its process: if I have N Apache processes, I need to build N 
copies of that cache, each build can time out, and the cache gets killed 
together with its process.

All I want is an in-memory cache that is independent of the Apache 
processes, and I can't believe I would have to build it myself. This is no 
longer a mod_wsgi problem, though.


On Tuesday, 19 April 2016 at 14:43:14 UTC+2, Jason Garber wrote:
>
> Hi Julien, 
>
> This conversation points to some improvements that could be made in the 
> data structures. It is hard to picture what you are doing that your efforts 
> would not be better rewarded by fitting your problem cleanly into Redis.  
> Can you shed any light on specifics of your data structures?
>
> Thanks!
> Jason
> On Apr 19, 2016 8:37 AM, "Julien Delafontaine" <[email protected]> wrote:
>
>> One problem is that the app needs to be fully loaded first, i.e. models 
>> etc. I know that in the latest versions of Django there is a hook 
>> <https://docs.djangoproject.com/en/dev/ref/applications/#django.apps.AppConfig.ready>
>> that lets you run code once everything is loaded, but it did not work as 
>> well as expected in practice. I'll try again with a delay of a few seconds.
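>>
>> For reference, the hook is used roughly like this (a minimal sketch; the 
>> app label and the warm-up function are placeholders):
>>
>>     # myapp/apps.py
>>     import threading
>>     from django.apps import AppConfig
>>
>>     def _warm_cache():
>>         pass  # build the in-memory matrix here
>>
>>     class MyAppConfig(AppConfig):
>>         name = 'myapp'
>>
>>         def ready(self):
>>             # Called once the app registry (models etc.) is loaded.
>>             threading.Thread(target=_warm_cache).start()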
>>
>> I see what you mean with the mini cache server. If I don't find a simpler 
>> way, I'll try something like this because it does exactly what I need: a 
>> kind of Memcached without serialization.
>>
>>
>> On Tuesday, 19 April 2016 at 12:48:18 UTC+2, Graham Dumpleton wrote:
>>>
>>> One can always fire off the creation of the cache as a side effect of 
>>> the WSGI script file being loaded. You can even do it in a background 
>>> thread while still handling requests. Initial requests may be slow while 
>>> the cache populates, but once it is loaded things should be fine.
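>>>
>>> A minimal sketch of what I mean, assuming a standard Django wsgi.py (the 
>>> settings module, cache dict and population logic are just placeholders):
>>>
>>>     # wsgi.py
>>>     import os
>>>     import threading
>>>
>>>     from django.core.wsgi import get_wsgi_application
>>>
>>>     os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'project.settings')
>>>     application = get_wsgi_application()
>>>
>>>     CACHE = {}
>>>
>>>     def _populate_cache():
>>>         # Do the expensive work here; requests served in the meantime
>>>         # simply see an empty cache.
>>>         CACHE['matrix'] = None
>>>
>>>     threading.Thread(target=_populate_cache).start()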
>>>
>>> See a problem with doing it that way?
>>>
>>> On 19 Apr 2016, at 8:45 PM, Julien Delafontaine <[email protected]> 
>>> wrote:
>>>
>>> When a process is started, I pull blobs out of a database, put their 
>>> data in a matrix, and keep the matrix in memory (because [de-]serialization 
>>> for usual caches is slow) so that computations using that matrix are very 
>>> quick. The construction of the matrix takes time, though, and can time out 
>>> if the database is big. 
>>>
>>> What I do is trigger the first call to that controller myself (with 
>>> curl), so that users never hit the slow first request and only see the 
>>> quick responses. But if the cache gets erased, it becomes slow for them 
>>> as well.
>>> I have --request-timeout set to 90s, but apparently it is still not 
>>> enough.
>>>
>>> An improvement would be to store the computed matrix in a persistent 
>>> cache and load it only when the app starts (in my experience that still 
>>> takes double the amount of memory and a dozen seconds to deserialize).
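>>>
>>> Something like this, I suppose (a sketch using a plain .npy file; the 
>>> path and the builder function are placeholders):
>>>
>>>     import os
>>>     import numpy as np
>>>
>>>     CACHE_FILE = '/tmp/matrix_cache.npy'
>>>
>>>     def build_matrix():
>>>         # stand-in for the slow blob-to-matrix construction
>>>         return np.zeros((1000, 1000), dtype=np.bool_)
>>>
>>>     def load_or_build():
>>>         if os.path.exists(CACHE_FILE):
>>>             return np.load(CACHE_FILE)  # load at app start
>>>         matrix = build_matrix()
>>>         np.save(CACHE_FILE, matrix)
>>>         return matrix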
>>>
>>>
>>> On Tuesday, 19 April 2016 at 11:33:36 UTC+2, Graham Dumpleton wrote:
>>>>
>>>> Can you explain more about what the long running requests are doing?
>>>>
>>>> The timeout can be extended by using an option like:
>>>>
>>>>     --request-timeout=300
>>>>
>>>> It would help to understand the need for the long running requests, so 
>>>> I can perhaps suggest a better way.
>>>>
>>>> Graham
>>>>
>>>> On 19 Apr 2016, at 7:30 PM, Julien Delafontaine <[email protected]> 
>>>> wrote:
>>>>
>>>> I have long requests on purpose. In the same process I am building a 
>>>> cache for several elements; it can take time (at startup only), and for 
>>>> one item it occasionally times out. So it would fit the scenario where, 
>>>> when that one times out, the process is restarted and all the previously 
>>>> computed data is lost... This is extremely annoying :( Time to set up a 
>>>> persistent cache, maybe.
>>>>
>>>> Thanks a lot !
>>>>
>>>> On Tuesday, 19 April 2016 at 11:15:42 UTC+2, Graham Dumpleton wrote:
>>>>>
>>>>> When using mod_wsgi-express, it does run in daemon mode. So with that 
>>>>> configuration you should have two persistent processes. The processes 
>>>>> should not be recycled under normal circumstances.
>>>>>
>>>>> With the default configuration, the only way processes could be 
>>>>> recycled is if you have stuck requests and eventually trip the request 
>>>>> timeout. For a multi-threaded process, the restart kicks in only when 
>>>>> the duration of all active requests, averaged across the total number 
>>>>> of request slots, reaches 60 seconds.
>>>>>
>>>>> So if you had only one stuck request, the process would finally be 
>>>>> forcibly restarted once that request had been stuck for 5 minutes. If 
>>>>> it had two stuck requests that started at the same time, it would 
>>>>> restart after 2.5 minutes. With five stuck requests in the same 
>>>>> process, it restarts after 60 seconds. It is a weird calculation, but 
>>>>> it is the only one that makes half sense in a multi-threaded 
>>>>> application.
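>>>>>
>>>>> In other words (a rough sketch of the arithmetic, assuming the default 
>>>>> 60 second timeout and the 5 threads per process from your setup as the 
>>>>> request slots):
>>>>>
>>>>>     request_timeout = 60   # seconds
>>>>>     request_slots = 5      # threads per daemon process
>>>>>
>>>>>     # Restart triggers once the ages of the stuck requests, averaged
>>>>>     # over all slots, reach the timeout:
>>>>>     #   sum(ages) / request_slots >= request_timeout
>>>>>     for stuck in (1, 2, 5):
>>>>>         wait = request_timeout * request_slots // stuck
>>>>>         print("%d stuck -> restart after %ds" % (stuck, wait))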
>>>>>
>>>>> To work out whether forced process restarts are occurring because of 
>>>>> the timeout, add the:
>>>>>
>>>>>     --log-level info
>>>>>
>>>>> option. With this mod_wsgi will log more details about process 
>>>>> restarts and why they were triggered. You can then look in the logs to 
>>>>> confirm if this is what is happening.
>>>>>
>>>>> Do you know if you are seeing requests that never seem to finish? Or 
>>>>> does your application run with very long requests on purpose?
>>>>>
>>>>> Graham
>>>>>
>>>>> On 19 Apr 2016, at 6:34 PM, Julien Delafontaine <[email protected]> 
>>>>> wrote:
>>>>>
>>>>> I am using mod_wsgi-express:
>>>>>
>>>>>     mod_wsgi-express setup-server ${baseDir}/project/wsgi.py 
>>>>> --port=8887 --user myapp --server-root=${remoteDir}/mod_wsgi-server 
>>>>> --processes 2 --threads 5;
>>>>>
>>>>> Then
>>>>>
>>>>>     ${remoteDir}/mod_wsgi-server/apachectl restart
>>>>>
>>>>> This sets up the configuration by itself, it seems. I thought 
>>>>> mod_wsgi-express ran in daemon mode by default?
>>>>>
>>>>>
>>>>>> On Tuesday, 19 April 2016 at 10:19:01 UTC+2, Graham Dumpleton wrote:
>>>>>>
>>>>>> Sounds like you are using embedded mode rather than daemon mode. In 
>>>>>> embedded mode Apache will recycle processes.
>>>>>>
>>>>>> How do you have it configured? Are you using 
>>>>>> WSGIDaemonProcess/WSGIProcessGroup directives at all?
>>>>>>
>>>>>> Graham
>>>>>>
>>>>>> On 19 Apr 2016, at 6:12 PM, Julien Delafontaine <[email protected]> 
>>>>>> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I have a multi-process mod_wsgi application that stores some cache 
>>>>>> data in memory. Each process naturally gets its own instance of that 
>>>>>> cache. Now it seems that after some time the processes get 
>>>>>> killed/restarted/whatever, so the cache has to be reinitialized every 
>>>>>> time this happens. How can I control that?
>>>>>>
>>>>>> Ideally I'd like to start 2 Apache/mod_wsgi processes, initialize the 
>>>>>> cache on each, and let the app run forever without needing to recompute 
>>>>>> the 
>>>>>> cache. Is that possible?
>>>>>>
>

