These are great points. Thanks, Graham! I did run some experiments, and I do have a database lock in place (a SELECT ... FOR UPDATE in MySQL seems to act as a lock with the PyMySQL connector), so, as you note, requests could pile up. However, as far as I can tell, my main WSGI Python program (the one with the api.add_resource statements) does not invoke the subprocess until it has successfully obtained a unique key from MySQL. So at least I don't think I will pile up a mass of subprocesses.
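Graham's two suggestions in this thread, limiting how many worker processes can run at once and always collecting each worker's exit status, can both be applied inside the web process before moving to a full task queue. The sketch below uses only the standard library (Python 3.7 or later assumed) and is an illustration, not a drop-in fix. `run_worker`, `WorkerFailed` and the limit of two workers are hypothetical. Because the semaphore lives in one Python process, it only caps the workers started by that process; each mod_wsgi daemon process would get its own limit.

```python
import subprocess
import sys
import threading
from typing import Sequence

# Hypothetical limit: at most this many workers run at once per web process.
MAX_CONCURRENT_WORKERS = 2
_worker_slots = threading.BoundedSemaphore(MAX_CONCURRENT_WORKERS)


class WorkerFailed(RuntimeError):
    """Raised when a worker process exits with a non-zero status."""

    def __init__(self, returncode: int, stderr: str) -> None:
        super().__init__(f"worker exited with status {returncode}: {stderr.strip()}")
        self.returncode = returncode
        self.stderr = stderr


def run_worker(args: Sequence[str], input_text: str = "", timeout: float = 60.0) -> str:
    """Run one worker process to completion and return its stdout.

    While MAX_CONCURRENT_WORKERS workers are already running, callers block
    here, so a burst of requests waits instead of each starting a process.
    """
    with _worker_slots:
        # subprocess.run() always waits for the child, so it is reaped rather
        # than left as a zombie. On timeout it kills the child, waits for it,
        # and raises subprocess.TimeoutExpired.
        result = subprocess.run(
            list(args),
            input=input_text,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    # Check the exit status instead of trusting whatever was printed.
    if result.returncode != 0:
        raise WorkerFailed(result.returncode, result.stderr)
    return result.stdout


if __name__ == "__main__":
    # Usage: run a trivial worker with the current interpreter.
    print(run_worker([sys.executable, "-c", "print('ok')"]), end="")
```

In a request handler this might look like `run_worker(["python3", "MainPgm.py"], input_text=request_body)`, where `request_body` is a hypothetical string. Catching `WorkerFailed` and `subprocess.TimeoutExpired` lets the handler return an error instead of a half-finished result.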
I just noticed that my code does not currently check the return status properly after the subprocess completes, so yes, I need to do that. Good point about checking the exit status. Could you please recommend some reading on how to properly configure a queuing system?

Mike

On Monday, August 8, 2022 at 10:15:44 PM UTC-7 Graham Dumpleton wrote:

> Just be mindful of what will happen if a database operation takes a long
> time and holds some sort of lock. More requests may come into the web
> application, and if every one of these is creating a subprocess, but they
> then get stuck waiting for the first, then you could spike out memory usage
> for the system as a whole.
>
> This is the benefit of using a task queuing system, as it can queue up
> requests and give you a point of control for how many can run concurrently.
>
> Also ensure that you are waiting on the subprocesses if necessary and
> getting back any exit status. If you don't do this they can become zombie
> processes, which, although dead, can still consume memory in the kernel
> process table. So not being mindful of that and letting the number of zombie
> processes grow indefinitely is not a good idea.
>
> Anyway, just look out for issues like that.
>
> Graham
>
> On 9 Aug 2022, at 3:09 pm, [email protected] <[email protected]> wrote:
>
> In Python... It's just reading from a database a little, minor updates,
> then some read-only models for AI, no network I/O. When I ran experiments
> it fired up and used the pipes fine, no problems I could see, and I ran two
> calls concurrently.
>
> Thanks Graham!
>
> On Monday, August 8, 2022 at 8:11:17 PM UTC-7 Graham Dumpleton wrote:
>
>> Using the subprocess module alone may work okay; it really depends on what
>> it is doing. For simple stuff it is probably okay, but the danger is where
>> the subprocess being run has strange requirements around signals because of
>> what it inherits from the Apache parent process by way of the signal mask.
>> This, for example, causes certain Java applications to not work properly
>> when executed via the subprocess module out of a mod_wsgi process, as
>> something about Java garbage collection (from memory) requires setting its
>> own signal handlers, but they are blocked and so never execute, and so Java
>> gets stuck.
>>
>> So you would really just need to try and see. For more complicated stuff,
>> you would be better off delegating it to a backend task management
>> system such as Celery.
>>
>> Graham
>>
>> On 9 Aug 2022, at 1:04 pm, [email protected] <[email protected]> wrote:
>>
>> Hi,
>>
>> I'm trying to speed up my Python program using multiprocessing, since some
>> of it can run concurrently.
>>
>> I am using Rocky Linux, Apache, and mod_wsgi. I've been using this setup for
>> years with no problems, but without multiprocessing...
>>
>> What I have been doing all along is to invoke my program from the main
>> WSGI/Flask script like this:
>>
>> import subprocess
>>
>> result = subprocess.run(["python3", "MainPgm.py"],
>>                         stdin=subprocess.PIPE,
>>                         stdout=subprocess.PIPE)
>> stdout_data = result.stdout
>>
>> So I'm using subprocess.
>>
>> My question is: is it safe to add multiprocessing inside my "MainPgm"?
>> My tests today worked fine, but I notice that this is frowned upon.
>> However, I also noticed:
>>
>> "If you really want to pursue this, then suggest you move this code
>> outside of the WSGI script file and put it in a standard module on the
>> Python module search path you have set up for application."
>>
>> ^^ which seems to indicate it might work.
>>
>> Thanks.
>>
>> On Monday, May 2, 2011 at 4:55:38 PM UTC-7 Graham Dumpleton wrote:
>>
>>> Using the multiprocessing module within mod_wsgi is a really bad idea.
>>> This is because it is an embedded system where Apache and mod_wsgi
>>> manage processes. Once you start using the multiprocessing module, which
>>> tries to do its own process management, it could potentially
>>> interfere with the operation of Apache/mod_wsgi in unexpected ways.
>>>
>>> For example, taking your example and changing it not to be dependent
>>> on web.py, I get:
>>>
>>> import multiprocessing
>>> import os
>>>
>>> def x(y):
>>>     print os.getpid(), 'x', y
>>>     return y
>>>
>>> def application(environ, start_response):
>>>     status = '200 OK'
>>>     output = 'Hello World!'
>>>
>>>     response_headers = [('Content-type', 'text/plain'),
>>>                         ('Content-Length', str(len(output)))]
>>>     start_response(status, response_headers)
>>>
>>>     print 'create pool'
>>>     pool = multiprocessing.Pool(processes=1)
>>>     print 'map call'
>>>     result = pool.map(x, [1])
>>>     print os.getpid(), 'doit', result
>>>
>>>     return [output]
>>>
>>> If I fire off a request to this, it appears to work correctly,
>>> returning the hello world string and logging the appropriate messages.
>>>
>>> [Tue May 03 09:40:36 2011] [info] [client 127.0.0.1] mod_wsgi (pid=32752, process='hello-1', application='hello-1.example.com|/mptest.wsgi'): Loading WSGI script '/Library/WebServer/Sites/hello-1/htdocs/mptest.wsgi'.
>>> [Tue May 03 09:40:36 2011] [error] create pool
>>> [Tue May 03 09:40:36 2011] [error] map call
>>> [Tue May 03 09:40:36 2011] [error] 32753 x 1
>>> [Tue May 03 09:40:36 2011] [error] 32752 doit [1]
>>>
>>> However, the process then appears to receive a signal from somewhere,
>>> causing it to shut down:
>>>
>>> [Tue May 03 09:40:36 2011] [info] mod_wsgi (pid=32752): Shutdown requested 'hello-1'.
>>> [Tue May 03 09:40:41 2011] [info] mod_wsgi (pid=32752): Aborting process 'hello-1'.
>>>
>>> The multiprocessing module does issue signals, so it may be the source
>>> of this.
>>>
>>> One thought was that this may be occurring when the pool is destroyed
>>> at the end of the function call, so I moved the creation of the pool to
>>> module scope.
>>>
>>> import multiprocessing
>>> import os
>>>
>>> print 'create pool'
>>> pool = multiprocessing.Pool(processes=1)
>>>
>>> def x(y):
>>>     print os.getpid(), 'x', y
>>>     return y
>>>
>>> def application(environ, start_response):
>>>     status = '200 OK'
>>>     output = 'Hello World!'
>>>
>>>     response_headers = [('Content-type', 'text/plain'),
>>>                         ('Content-Length', str(len(output)))]
>>>     start_response(status, response_headers)
>>>
>>>     print 'map call'
>>>     result = pool.map(x, [1])
>>>     print os.getpid(), 'doit', result
>>>
>>>     return [output]
>>>
>>> This, though, will not even run:
>>>
>>> [Tue May 03 09:47:31 2011] [info] [client 127.0.0.1] mod_wsgi (pid=32893, process='hello-1', application='hello-1.example.com|/mptest.wsgi'): Loading WSGI script '/Library/WebServer/Sites/hello-1/htdocs/mptest.wsgi'.
>>> [Tue May 03 09:47:31 2011] [error] create pool
>>> [Tue May 03 09:47:31 2011] [error] map call
>>> [Tue May 03 09:47:31 2011] [error] Process PoolWorker-1:
>>> [Tue May 03 09:47:31 2011] [error] Traceback (most recent call last):
>>> [Tue May 03 09:47:31 2011] [error] File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/process.py", line 231, in _bootstrap
>>> [Tue May 03 09:47:31 2011] [error] self.run()
>>> [Tue May 03 09:47:31 2011] [error] File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/process.py", line 88, in run
>>> [Tue May 03 09:47:31 2011] [error] self._target(*self._args, **self._kwargs)
>>> [Tue May 03 09:47:31 2011] [error] File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/pool.py", line 57, in worker
>>> [Tue May 03 09:47:31 2011] [error] task = get()
>>> [Tue May 03 09:47:31 2011] [error] File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/queues.py", line 339, in get
>>> [Tue May 03 09:47:31 2011] [error] return recv()
>>> [Tue May 03 09:47:31 2011] [error] AttributeError: 'module' object has no attribute 'x'
>>>
>>> The browser also then hangs at that point.
>>>
>>> Part of the issue here may be that WSGI script files are not really
>>> standard Python modules, in that the basename of the WSGI script file
>>> doesn't match a module in sys.modules. If the multiprocessing module
>>> tries to do magic stuff with imports to find the original code to execute
>>> in the subprocess, it isn't going to work.
>>>
>>> Specifically, it may be related to:
>>>
>>> http://code.google.com/p/modwsgi/wiki/IssuesWithPickleModule
>>>
>>> If I attempt to move x() into being a nested function as:
>>>
>>> import multiprocessing
>>> import os
>>>
>>> print 'create pool'
>>> pool = multiprocessing.Pool(processes=1)
>>>
>>> def application(environ, start_response):
>>>     status = '200 OK'
>>>     output = 'Hello World!'
>>>
>>>     response_headers = [('Content-type', 'text/plain'),
>>>                         ('Content-Length', str(len(output)))]
>>>     start_response(status, response_headers)
>>>
>>>     def x(y):
>>>         print os.getpid(), 'x', y
>>>         return y
>>>
>>>     print 'map call'
>>>     result = pool.map(x, [1])
>>>     print os.getpid(), 'doit', result
>>>
>>>     return [output]
>>>
>>> Then one does get pickle errors, albeit for a different reason:
>>>
>>> [Tue May 03 09:52:59 2011] [info] [client 127.0.0.1] mod_wsgi (pid=33010, process='hello-1', application='hello-1.example.com|/mptest.wsgi'): Loading WSGI script '/Library/WebServer/Sites/hello-1/htdocs/mptest.wsgi'.
>>> [Tue May 03 09:52:59 2011] [error] create pool
>>> [Tue May 03 09:52:59 2011] [error] map call
>>> [Tue May 03 09:52:59 2011] [error] Exception in thread Thread-1:
>>> [Tue May 03 09:52:59 2011] [error] Traceback (most recent call last):
>>> [Tue May 03 09:52:59 2011] [error] File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 522, in __bootstrap_inner
>>> [Tue May 03 09:52:59 2011] [error] self.run()
>>> [Tue May 03 09:52:59 2011] [error] File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 477, in run
>>> [Tue May 03 09:52:59 2011] [error] self.__target(*self.__args, **self.__kwargs)
>>> [Tue May 03 09:52:59 2011] [error] File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/pool.py", line 225, in _handle_tasks
>>> [Tue May 03 09:52:59 2011] [error] put(task)
>>> [Tue May 03 09:52:59 2011] [error] PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
>>>
>>> So, it is doing pickling in some form, which isn't going to work for
>>> stuff in a WSGI script file.
>>>
>>> If you really want to pursue this, then suggest you move this code
>>> outside of the WSGI script file and put it in a standard module on the
>>> Python module search path you have set up for application.
>>>
>>> Overall though, I would recommend against using the multiprocessing module
>>> from inside of mod_wsgi.
>>>
>>> Graham
>>>
>>> On 2 May 2011 23:37, Ed Summers <[email protected]> wrote:
>>> > Hi all,
>>> >
>>> > I asked this over on web-sig [1] earlier today, but am asking here
>>> > since it looks to be only mod_wsgi related...
>>> >
>>> > I've been trying to use the multiprocessing module [2] with mod_wsgi and
>>> > have noticed what appears to be deadlocking behavior with both Django and
>>> > web.py. I created a minimal example with web.py to demonstrate [3].
>>> >
>>> > If you have mod_wsgi and web.py available, and put something like
>>> > this in your Apache config:
>>> >
>>> > WSGIScriptAlias /multiprocessing /home/ed/wsgi_multiprocessing.py
>>> > AddType text/html .py
>>> >
>>> > then visit:
>>> >
>>> > http://localhost/
>>> >
>>> > and compare with:
>>> >
>>> > http://localhost/?multiprocessing=1
>>> >
>>> > you should see the second URL hang.
>>> >
>>> > Going forward I'm most likely going to move this functionality to an
>>> > asynchronous queue (Celery, etc.), but I was wondering if
>>> > multiprocessing + mod_wsgi was generally known to be something to
>>> > avoid, or if it was even forbidden somehow.
>>> >
>>> > Any assistance you can provide would be welcome.
>>> >
>>> > //Ed
>>> >
>>> > [1] http://mail.python.org/pipermail/web-sig/2011-May/005065.html
>>> > [2] http://docs.python.org/library/multiprocessing.html
>>> > [3] https://gist.github.com/951570
>>> >
>>> > --
>>> > You received this message because you are subscribed to the Google Groups "modwsgi" group.
>>> > To post to this group, send email to [email protected].
>>> > To unsubscribe from this group, send email to modwsgi+u...@googlegroups.com.
>>> > For more options, visit this group at http://groups.google.com/group/modwsgi?hl=en.
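Graham's 2011 explanation in this thread, that multiprocessing pickles what it sends to its worker processes and so cannot cope with functions that live in a WSGI script file or inside another function, can be reproduced with the standard `pickle` module alone. In this sketch `is_picklable` is a hypothetical helper, and the module name `hypothetical_wsgi_script` is made up. It stands in for any module object that cannot be re-imported by name and is not how mod_wsgi actually names the modules it creates for script files.

```python
import pickle
import types


def top_level(y):
    """Defined at module level, so pickle can store it as (module, name)."""
    return y


def make_nested():
    """Return a function defined inside another function."""
    def nested(y):
        return y
    return nested


# Stand-in for code loaded from a file that is not importable by name, which
# is the situation Graham describes for WSGI script files. The module name
# "hypothetical_wsgi_script" is made up and never added to sys.modules.
script_module = types.ModuleType("hypothetical_wsgi_script")
exec("def x(y):\n    return y\n", script_module.__dict__)


def is_picklable(obj) -> bool:
    """Return True if obj survives pickle.dumps(), as multiprocessing requires."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError, ImportError):
        return False
    return True


if __name__ == "__main__":
    print(is_picklable(top_level))        # True
    print(is_picklable(make_nested()))    # False: a "<locals>" name cannot be looked up
    print(is_picklable(script_module.x))  # False: its module cannot be imported
```

Pickle records a function only as a module name plus a qualified name, so a worker can rebuild it only by importing that module and looking the name up. That is consistent with Graham's advice to keep such functions at the top level of a standard module on the Python module search path rather than in the WSGI script file.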
