Thanks Graham, I'll do as you suggest and see what we get.
I suspect I have some sort of issue with large directories. As a workaround, I've been breaking the directories down into 4000 images at a time, and the performance is acceptable. So, while doing the image processing in the web application may not be a great idea, it is working well for me provided I don't have huge directories. I had one as large as 10,000 images that also ran fine, but 30,000+ was a total bust, with performance rapidly dropping from 2 images per second to 7 seconds per image. With 4000 images in a directory I get consistent performance of 1-2 images per second.

Not sure if that helps narrow down the investigation, but I thought I'd mention it.

Gary

On Tuesday, December 8, 2020 at 4:02:37 PM UTC-8 Graham Dumpleton wrote:

> Performing a lot of image processing in the web application processes is generally a bad idea. Usually you would do this using a backend task queuing system like Celery.
>
> The main reason it is a bad idea is that Python does not perform very well when you have multiple threads and they are heavily CPU bound. This is because the Python global interpreter lock will result in Python application code effectively being serialised even though you have multiple threads. So for CPU bound work, multithreading is a convenience, but not a performant solution.
>
> The first thing I would do is confirm how many threads you actually have running in the process, and what they are doing, when the process slows down. For this you can employ the code at:
>
> https://modwsgi.readthedocs.io/en/master/user-guides/debugging-techniques.html#extracting-python-stack-traces
>
> You will need to update the example code to Python 3, as it is still Python 2.
>
> With that code in place, you can trigger a dump of what all the threads are up to.
>
> Look for threads stuck in code which may not perform well when handling large directory listings or general image processing.
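Since the linked mod_wsgi example is still Python 2, here is a minimal Python 3 sketch of the same idea. It is not the documented code. It assumes a hypothetical trigger file path (/tmp/dump-stack-traces.txt), a one-second poll interval and a thread named "stack-dumper": touching the file makes a daemon thread write the current stack of every thread in the process to stderr, which under mod_wsgi normally ends up in the Apache error log. Because each thread is listed by id and name, the dump also shows whether more worker threads are running than expected. Note that sys._current_frames() is a CPython-specific, underscore-prefixed function.

```python
import os
import sys
import threading
import traceback

# Assumed values for this sketch; pick whatever suits your deployment.
TRIGGER_FILE = "/tmp/dump-stack-traces.txt"  # hypothetical path: touch it to request a dump
POLL_INTERVAL = 1.0  # seconds between checks of the trigger file's mtime


def format_stack_traces():
    """Return a text dump of the current stack of every thread in this process."""
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    for thread_id, frame in sys._current_frames().items():
        lines.append("# ProcessId: %d" % os.getpid())
        lines.append("# ThreadId: %d (%s)" % (thread_id, names.get(thread_id, "unknown")))
        for entry in traceback.extract_stack(frame):
            lines.append('File: "%s", line %s, in %s' % (entry.filename, entry.lineno, entry.name))
            if entry.line:
                lines.append("  " + entry.line.strip())
        lines.append("")
    return "\n".join(lines)


def _mtime(path):
    try:
        return os.path.getmtime(path)
    except OSError:
        return None  # trigger file does not exist (yet)


def _monitor(stop_event):
    last = _mtime(TRIGGER_FILE)
    # Event.wait() returns False on timeout and True once stop_event is set.
    while not stop_event.wait(POLL_INTERVAL):
        current = _mtime(TRIGGER_FILE)
        if current != last:
            last = current
            # Under mod_wsgi, sys.stderr is normally routed to the Apache error log.
            print(format_stack_traces(), file=sys.stderr, flush=True)


def start_stack_dumper():
    """Start the monitor as a daemon thread; set the returned Event to stop it."""
    stop_event = threading.Event()
    threading.Thread(target=_monitor, args=(stop_event,),
                     name="stack-dumper", daemon=True).start()
    return stop_event
```

For example, start_stack_dumper() would be called once from the WSGI script file, and the trigger file touched from a shell while an upload is running slowly.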
>
> Also look for more threads than expected, perhaps because the code that starts your worker threads is being executed more than once for some reason.
>
> Graham
>
> On 9 Dec 2020, at 3:17 am, Gary Conley <ga...@goldeneraproductions.org> wrote:
>
> Hi Graham,
>
> Thanks for the reply.
>
> All processing is done within the web application processes, if I understand your question correctly.
>
> This is my first web app. I've done all my prior development as PySide desktop applications, and I actually migrated this app from a desktop app.
>
> From the upload.html template, the user sees a list of directories that can be uploaded. These are all in a central "in box" directory, which is updated every 10 seconds. The user selects directories and clicks "Upload Selected", which sends a request containing the selected paths to a '_launch_upload' route in the Flask app.py.
>
> In my Flask app.py, the '_launch_upload' route adds the selected paths to a Python queue. I then start a new Python Thread targeting a _start_queue method. _start_queue in turn takes each path, instantiates a "loader" class which subclasses Thread, and then calls the run() method on the loader object, which performs the actual upload of that directory.
>
> The loader object puts the path for each image into a queue (self.q), from which they are processed in parallel, up to 24 at a time (user configurable). This is done using a queue/worker configuration as in:
>
>     from threading import Thread
>
>     # Start threadcount daemon workers that pull image paths off self.q
>     for _ in range(threadcount):
>         t = Thread(target=self.worker)
>         t.daemon = True
>         t.start()
>     # Block until every queued path has been marked done via self.q.task_done()
>     self.q.join()
>
> The worker method is where all the image processing and upload to the DAM occurs.
>
> The workers take an image off the self.q queue and process them until the queue is empty.
>
> All image processing is done using subprocess.run calling the appropriate app.
>
> When self.q.join() returns, the first directory has been fully processed and the run method of the loader object returns.
> This returns control to the start_upload method in app.py, which calls run on the next loader object, and so on until all directories are processed.
>
> I may have implied that the app works on a watch-folder basis, but this is not the case. It is entirely based on the user's selection, for various reasons.
>
> I also realized I failed to mention that we are running in mod_wsgi daemon mode.
>
> As a final note, there is an abort_upload route which can access the loader objects and call a stop_upload method on the object, which sets a keep_loading flag to False and so stops all processing on that loader object. It also empties the queue of any unprocessed directories. For some reason this method has also stopped working. If I call it within the first few seconds of calling launch_upload it works fine, but if I let the upload run for a bit it no longer has any effect. This had been working well, but has since stopped, and yet I didn't change anything in this part of the code. It seems to be related to the problem with the slow uploads, but I can't be 100% certain.
>
> I hope that is all clear.
>
> Let me know if any logs or code would be helpful.
>
> I really appreciate your help. I'm somewhat new to web development, as I said, but learning fast!
>
> Best,
>
> Gary
>
> On Monday, December 7, 2020 at 10:48:03 PM UTC-8 Graham Dumpleton wrote:
>
>> Is the image processing being done within the web application processes, or in a separate set of processes which operate only based on seeing what is stuck in the upload directory used to queue up images?
>>
>> Just trying to understand better how the work is broken up.
>>
>> On 8 Dec 2020, at 9:29 am, Gary Conley <ga...@goldeneraproductions.org> wrote:
>>
>> I have a Flask app running with mod_wsgi version 4.7.1 on CentOS 7 (httpd), Python 3.6.
>>
>> The server is brand new: 64GB RAM, dual 3GHz Xeons, all SSD drives, 8gbs fiber to a StorNext SAN. All very fast hardware.
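The loader Gary describes in his reply (a queue of image paths drained by a user-configurable number of worker threads, each shelling out with subprocess.run, plus a keep_loading flag cleared by stop_upload) might look roughly like the sketch below. It also includes the original post's detail that every file under the directory, sub-directories included, is queued sorted by size. This is not Gary's code: the class layout, the use of threading.Event as the keep_loading flag, ascending size order, the placeholder command (it runs the current Python interpreter instead of ImageMagick, ufraw or exiftool) and the 600-second timeout are all assumptions. Two details are worth checking in any version of this pattern: every q.get() must be matched by exactly one q.task_done(), even when processing fails, or q.join() never returns; and a subprocess.run timeout stops a single hung external tool from holding a worker indefinitely.

```python
import os
import queue
import subprocess
import sys
import threading


def collect_paths(directory):
    """Return every file under directory (recursively), smallest first."""
    paths = [os.path.join(root, name)
             for root, _dirs, files in os.walk(directory)
             for name in files]
    return sorted(paths, key=os.path.getsize)  # ascending order is an assumption


def placeholder_command(path):
    """Hypothetical stand-in for the real ImageMagick/ufraw/exiftool calls."""
    return [sys.executable, "-c", "import sys; print(sys.argv[1])", path]


class Loader:
    """Sketch of a per-directory loader; names and layout are hypothetical."""

    def __init__(self, directory, threadcount=4, command=placeholder_command):
        self.q = queue.Queue()
        for path in collect_paths(directory):
            self.q.put(path)
        self.threadcount = threadcount
        self.command = command
        self.keep_loading = threading.Event()  # plays the role of the keep_loading flag
        self.keep_loading.set()
        self.results = []  # (path, outcome): a return code, "skipped" or the exception
        self._lock = threading.Lock()

    def process(self, path):
        if not self.keep_loading.is_set():
            return "skipped"  # after stop_upload(), drain the queue without doing work
        # A timeout keeps one hung external tool from pinning a worker forever.
        proc = subprocess.run(self.command(path), stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE, timeout=600)
        return proc.returncode

    def worker(self):
        while True:
            try:
                # Every path is queued before the workers start, so Empty means done.
                path = self.q.get_nowait()
            except queue.Empty:
                return
            try:
                result = self.process(path)
            except Exception as exc:  # a real app would log this
                result = exc
            with self._lock:
                self.results.append((path, result))
            # Exactly one task_done() per get(), even on failure, or join() hangs.
            self.q.task_done()

    def run(self):
        for _ in range(self.threadcount):
            threading.Thread(target=self.worker, daemon=True).start()
        self.q.join()  # returns once every queued path has been marked done

    def stop_upload(self):
        self.keep_loading.clear()
```

Using get_nowait() rather than a blocking get() only works because the whole queue is filled before any worker starts. In this sketch, calling stop_upload() from another thread makes the workers skip whatever is still queued, but a file already inside subprocess.run finishes first (or hits the timeout).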
>>
>> I have a problem where the app tends to slow down dramatically under certain circumstances, and I have had no success to date in finding the cause.
>>
>> The app itself is fairly simple. It processes images from a directory and uploads them to a DAM (PHP-based, under NGINX) running on another server. The processing consists of generating preview and thumbnail images from the original images (mostly raw file formats: NEF, CR2, etc.) using ImageMagick, ufraw and exiftool, extracting XMP data using exiftool, and uploading these elements to the DAM through the DAM's API.
>>
>> The processing is multithreaded using Python queues and workers, nothing fancy.
>>
>> The app gathers data about the images from a MySQL database and persists transactional data to MySQL for workflow management purposes.
>>
>> There is a simple HTML template with controls for selecting which directories to upload and for giving the user feedback on progress, errors and so on, using Ajax calls. The user can also abort the upload process.
>>
>> Under normal circumstances the app will process 2 images per second. There have been instances, however, where the app slows way down, taking 7-10 seconds to upload 1 image, roughly 14 to 20 times slower.
>>
>> I have run metrics on the various steps of the upload procedure, and it appears that every aspect of the app slows down. Generating a preview image, which normally takes less than a second, takes 40 seconds; extracting metadata with exiftool, typically less than half a second, takes 7-10 seconds. Database response seems to remain constant. Upload to the DAM also takes much longer.
>>
>> When the slowdown occurs, requests from the browser to the app time out. Aborting the upload procedure becomes impossible, and the only way to stop it is to stop Apache (httpd) directly on the server.
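Per-step metrics like the ones described above can be collected with a small timing helper. This is a minimal sketch that assumes nothing about the real code; the step names are made up. Alongside each duration it records how many threads were alive at that moment (threading.active_count()), so a slowdown can be lined up against the thread count that Graham suggests checking.

```python
import threading
import time
from collections import defaultdict
from contextlib import contextmanager

_lock = threading.Lock()
# step name -> list of (seconds taken, threads alive when the step finished)
timings = defaultdict(list)


@contextmanager
def timed(step):
    """Record how long the with-block takes under the given step name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        with _lock:
            timings[step].append((elapsed, threading.active_count()))


def summary():
    """Per step: (samples, mean seconds, max seconds, max live threads seen)."""
    with _lock:
        result = {}
        for step, samples in timings.items():
            durations = [d for d, _ in samples]
            result[step] = (len(samples),
                            sum(durations) / len(durations),
                            max(durations),
                            max(n for _, n in samples))
        return result
```

Inside a worker, each stage would be wrapped, e.g. with timed("preview"): around the ImageMagick call and with timed("exiftool"): around metadata extraction (both names hypothetical), with summary() written to the log every few hundred images.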
>>
>> We have checked CPU usage (less than 10%) and memory usage (less than 8GB on a 64GB machine), and also checked IO, confirming that we were able to read and write at up to 9.5gbs while the app was running at a snail's pace.
>>
>> The only thing we have been able to isolate as having any effect on the upload speed is the size of the upload queue. When a user selects a directory to be uploaded, the files in that directory (and all sub-directories) are sorted by size and put into a Python queue, from which they are then uploaded by between 10 and 20 threads (user configurable). We have tested with queues of up to 10,000 files with no issue at all. We had a slowdown with a queue of over 35,000 images.
>>
>> The content of the images makes little to no difference in speed. Processing huge image files (such as 5GB PSB files) does slow down the upload process, but only to 1.5 seconds per file. The last instance of slowdown occurred on 5MB JPGs. The speed on 60MB NEFs and 200KB JPGs is virtually the same under normal circumstances.
>>
>> We suspect a memory issue, but don't see any increase in memory usage using htop, top or glances.
>>
>> We restarted httpd in the hope that it would clear up any memory leak, with no improvement. We even rebooted the machine with similar hopes, again with no improvement.
>>
>> We tried changing the number of threads in our mod_wsgi config (from 5 to 15) and also changed the number of processes to 2. We even tried setting threads to 1, which had disastrous effects. None of this made any improvement. We put our settings back to 1 process and 5 threads.
>>
>> Any clues on what could be causing this slowdown, or ideas on how to isolate it, would be much appreciated.
>> We've spent days trying to track it down, with the only solution being to break up our jobs into smaller chunks, which is far from ideal for our workflow, as it has to be done manually due to the nature of the content.
>>
>> We can send files if needed, but are not sure what to send.
>>
>> Thank you.
>>
>> Gary
>>
>> --
>> You received this message because you are subscribed to the Google Groups "modwsgi" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to modwsgi+u...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/modwsgi/389a39bd-2ad1-4c65-a784-0762029d0825n%40googlegroups.com