Startled by the observation that Hyperkitty's unit tests fail because
an excessive number of file descriptors is opened, I began to dig a
little. The majority of these file descriptors are opened by "Django
Q" -- a Django library for asynchronous tasks.
I wondered what these asynchronous tasks might be in an e-mail archiver.
After all, it has to perform 2 very specific jobs:
1. Receive e-mails via http requests, process them, and store them in a
database.
2. Display the collected e-mails.
On top of that, it generates some statistics, includes a voting system,
and allows replying to e-mails in a forum-like fashion.
Job 1 really requires very little processing and is mostly database IO
as the message has to be sorted and its position in the thread has to be
computed.
None of this can be efficiently performed in the
background/asynchronously because there is nothing to parallelize.
Making the *whole* receive-and-store operation non-blocking is not an
option either -- ultimately mailman needs to know whether it succeeded.
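To make that concrete, here is a minimal sketch of what such a
synchronous receive-and-store endpoint could look like (the view name
and helper are hypothetical, not Hyperkitty's actual code):

    from email import message_from_bytes

    from django.http import HttpResponse, HttpResponseServerError
    from django.views.decorators.csrf import csrf_exempt

    def store_and_thread(list_name, msg):
        """Placeholder for the real work: store the message in the
        database and compute its position in the thread (mostly db IO)."""
        raise NotImplementedError

    @csrf_exempt
    def archive_message(request, list_name):
        msg = message_from_bytes(request.body)
        try:
            store_and_thread(list_name, msg)
        except Exception:
            # A non-2xx status is the only way mailman learns that
            # archiving failed, so the work cannot be deferred.
            return HttpResponseServerError("archiving failed")
        return HttpResponse("archived", status=201)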
Job 2 is more or less what every web application does (reading from a
database and rendering the result) and certainly does not require any
asynchronous processing.
So where is Django Q used in Hyperkitty? It all boils down to the file
hyperkitty/tasks.py, where a small number of asynchronous tasks are
defined.
They can be grouped into 3 classes:
- query mailman core
- repair the data structure (empty threads, orphans, ...)
- rebuild query cache
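For illustration (not copied from hyperkitty/tasks.py; the names are
made up), handing such a task to Django Q looks roughly like this:

    from django_q.tasks import async_task

    def rebuild_thread_cache(thread_id):
        """Read-only db work: recompute the cached values for one thread."""
        ...

    def on_email_received(thread_id):
        # Instead of calling rebuild_thread_cache() directly, the call
        # is queued and later executed by the separate qcluster process.
        async_task(rebuild_thread_cache, thread_id)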
I would like to argue that none of these 3 groups of tasks need to be
performed asynchronously.
- Mailman-core only needs to be queried when mailing lists are changed.
This is triggered by signals from postorius and in addition periodically
by a cron job.
- The data structure should not need to be repaired in the first place;
the appropriate on_delete/on_save triggers should take care of this
-- and I believe they do in recent versions (see the sketch after this
list). If for some reason the database becomes corrupted, one can
always start a repair operation. Nothing is gained, however, from
running it asynchronously.
- Lastly the cache rebuild: Currently Hyperkitty rebuilds its cache
(which caches the db queries, not the frontend) whenever an e-mail is
received. Since it only involves *reading* from the db, it is actually
something that *can* be done asynchronously to reduce the time it takes
to process an incoming e-mail.
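Regarding the second group, this is the kind of trigger I have in
mind -- a rough sketch that assumes an Email model with a "thread"
foreign key and a reverse relation named "emails":

    from django.db.models.signals import post_delete
    from django.dispatch import receiver

    from hyperkitty.models import Email  # assumed model/import path

    @receiver(post_delete, sender=Email)
    def prune_empty_thread(sender, instance, **kwargs):
        # When the last e-mail of a thread is deleted, drop the thread
        # as well, so empty/orphaned threads never need repairing.
        thread = instance.thread
        if thread is not None and not thread.emails.exists():
            thread.delete()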
But is it really worth the tremendous additional complexity that is
introduced by Django Q?
- requires a "qcluster" to run in the background (see shipped unit file)
- loss of determinism / debugging becomes much harder
- an enormous number of file descriptors is opened in testing
- additional dependency
A similar result can be obtained by simply scaling the wsgi application
accordingly (if needed) and/or optimizing the db queries.
Alternatively, one could simply invalidate the affected caches instead
of rebuilding them every time an e-mail is received, or not trigger
cache rebuilds on received e-mails at all...
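To sketch the invalidate-instead-of-rebuild idea (the cache key names
below are made up), dropping the affected entries is cheap and lets the
next read repopulate them on demand:

    from django.core.cache import cache

    def invalidate_thread_caches(mlist_name, thread_id):
        # Called when an e-mail comes in: nothing is recomputed here,
        # the next request that needs these values rebuilds them.
        cache.delete("thread:%s:emails" % thread_id)
        cache.delete("list:%s:recent_threads" % mlist_name)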
But maybe I overlooked something. I argue that we do not *need* Django
Q. The question is: do we want it?
What are your thoughts on this?
Best
Leon