#23517: Collect static files in parallel
-------------------------------------+-------------------------------------
     Reporter:  thenewguy            |                    Owner:  nobody
         Type:  Uncategorized        |                   Status:  closed
    Component:  contrib.staticfiles  |                  Version:  1.7
     Severity:  Normal               |               Resolution:  needsinfo
     Keywords:                       |             Triage Stage:  Unreviewed
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------

Comment (by thenewguy):

 Just wanted to post back on this. I wrote a quick, roughly 20-line proof
 of concept using the threading module. The speedup was significant, so I
 am reopening this ticket. I could be wrong, but I imagine something like
 this would benefit the general Django userbase. Granted, I don't know
 whether others get as restless as I do while waiting for static files to
 upload.

 I quickly tested collectstatic with 957 static files. Every file is
 post-processed in some fashion (at minimum hashed by ManifestFilesMixin),
 and a gzipped copy is also created whenever the saved file benefits from
 gzip compression. The storage backend stores the files on AWS S3. When I
 deleted the files after each test, the AWS S3 console reported 3254
 deleted files, so each collectstatic run created 3254 files in total.

 The following times were captured at the command line and should not be
 read as quality benchmarks, but they are good enough to show the size of
 the difference.

 {{{
 set startTime=%time%
 python manage.py collectstatic --noinput
 echo Start Time: %startTime%
 echo Finish Time: %time%
 }}}

 Times (keep in mind that collectstatic does not include the gzipped files
 in its count, so there are roughly 957*2 more files than it reports). The
 threaded run was roughly four times faster:

 {{{
 957 static files copied, 957 post-processed.

 async using 100 threads (ParallelUploadStaticS3Storage)
 Start Time: 16:43:57.01
 Finish Time: 16:49:30.31
 Duration: 5.55500 minutes

 sync using regular s3 storage (StaticS3Storage)
 Start Time: 16:19:24.21
 Finish Time: 16:41:46.78
 Duration: 22.3761667 minutes
 }}}

 This storage is derived from ManifestFilesMixin and a subclass of
 S3BotoStorage (django-storages) that creates gzipped copies and checks for
 file changes before saving, so that modification dates stay reliable:

 {{{
 import threading

 from django.core.files.base import ContentFile

 # StaticS3Storage (not shown) is the ManifestFilesMixin / S3BotoStorage
 # subclass described above.


 class ParallelUploadStaticS3Storage(StaticS3Storage):
     """
     THIS STORAGE ASSUMES THAT UPLOADS ONLY OCCUR
     FROM CALLS TO THE COLLECTSTATIC MANAGEMENT
     COMMAND. SAVING TO THIS STORAGE DIRECTLY IS
     NOT RECOMMENDED BECAUSE THE UPLOAD THREADS
     ARE NOT JOINED UNTIL POST_PROCESS IS CALLED.
     """

     # Class-level list, so it is shared by every instance of this storage.
     active_uploads = []
     thread_count = 100

     def remove_completed_uploads(self):
         # Walk the list backwards so deleting an entry does not shift
         # the indexes that are still to be visited.
         for i, thread in reversed(list(enumerate(self.active_uploads))):
             if not thread.is_alive():
                 del self.active_uploads[i]

     def _save_content(self, key, content, **kwargs):
         # Busy-wait until fewer than thread_count uploads are in flight.
         while len(self.active_uploads) >= self.thread_count:
             self.remove_completed_uploads()

         # copy the file to memory for the moment to get around file closed
         # errors -- BAD HACK FIXME FIX
         content = ContentFile(content.read(), name=content.name)

         f = super(ParallelUploadStaticS3Storage, self)._save_content
         thread = threading.Thread(target=f, args=(key, content), kwargs=kwargs)
         self.active_uploads.append(thread)
         thread.start()

     def post_process(self, *args, **kwargs):
         # perform post processing
         for post_processed in super(ParallelUploadStaticS3Storage, self).post_process(*args, **kwargs):
             yield post_processed

         # wait for the remaining uploads to finish
         print("Post processing completed. Now waiting for the remaining uploads to finish.")
         for thread in self.active_uploads:
             thread.join()
 }}}

--
Ticket URL: <https://code.djangoproject.com/ticket/23517#comment:2>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
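
The bounded-thread idea in ParallelUploadStaticS3Storage can also be sketched outside Django with the standard library's `concurrent.futures.ThreadPoolExecutor` (Python 3). This is only an illustration under stated assumptions: `fake_upload` and `upload_all` are hypothetical names, a dictionary stands in for the S3 bucket, and none of it is part of django-storages or of the storage class posted in this comment.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def fake_upload(key, data, store, lock):
    """Hypothetical stand-in for one slow network upload (e.g. an S3 PUT)."""
    time.sleep(0.01)  # simulate network latency
    with lock:
        store[key] = data


def upload_all(files, max_workers=8):
    """Upload every (key, data) pair using at most max_workers threads."""
    store = {}  # stands in for the remote bucket
    lock = threading.Lock()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(fake_upload, key, data, store, lock)
            for key, data in files.items()
        ]
        # result() re-raises any exception raised in the worker thread,
        # so a failed upload is not silently lost.
        for future in futures:
            future.result()
    # Leaving the "with" block waits for all worker threads to finish.
    return store


files = {"css/site.css": b"body{}", "js/app.js": b"void 0;"}
uploaded = upload_all(files)
```

The executor caps the number of concurrent uploads and joins its workers on exit, which takes over the role of the manual `active_uploads` bookkeeping and the final `join()` loop. Whether it would fit the `_save_content` / `post_process` split used above is untested.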