Hi All,
This morning, I set out to improve the performance of the "mailman import21"
command. If you have used it in the past, you will know that it is slow. Until
now, I never knew why. Here were my guesses:
- Too many database calls, with sqlite3 being its usual slow self.
  Although, I forgot that the import is slow irrespective of the database
  backend. Maybe we are doing way too many queries?
- Too many string comparisons.
  We all know string comparisons are slow, but how slow could they be?
- Something wasteful being done over and over again.
Here is a rough estimate of the time it takes to import mailman2.1's config.pck
for two lists:
151 members: 58 seconds
1429 members: 9 minutes
This is quite slow; 9 minutes is a lot. So, I set out to do the usual Python
profiling using the standard library `cProfile` module and only wrapped it
around `mailman.utilities.importer._import_roster`. That function is the slowest
one: if you have run the command, you know it spends most of its time
importing the list of members.
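For anyone who wants to repeat this kind of measurement, here is a minimal sketch of wrapping a single function in `cProfile`. The names `import_roster_stand_in` and `profile_call` are hypothetical and exist only for illustration; they are not part of Mailman, and this is not necessarily how I wired it up in the importer.

```python
import cProfile
import io
import pstats


def import_roster_stand_in(members):
    """Hypothetical stand-in for the real roster import; does trivial work."""
    return [member.lower() for member in members]


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call trees come first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


result, report = profile_call(
    import_roster_stand_in, ["A@example.com", "B@example.com"]
)
print(report)
```

Profiling only the one suspect function keeps the report short, which is why the culprit is visible without reading the whole output.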
Without even looking at the entire output, the problem was apparent, and it was
none of the ones I had guessed:
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     1    0.009    0.009   50.692   50.692  /home/maxking/Documents/mm3/core/src/mailman/utilities/importer.py:600(_import_roster)
   151    0.001    0.000   45.691    0.303  /home/maxking/Documents/mm3/core/src/mailman/utilities/passwords.py:35(encrypt)
90% of the time is spent in `encrypt`, hashing user passwords, once for each
imported member. Well, duh, password hashing is an expensive operation and when
you do that once per imported member, it is definitely going to be slow.
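The arithmetic behind that 90% figure can be checked directly from the two profile rows. The sketch below also extrapolates to the larger list, under the assumption (mine, not measured) that the per-call cost stays the same for 1429 members.

```python
# Figures copied from the two profile rows in this message.
total_cumtime = 50.692    # seconds spent in _import_roster
encrypt_cumtime = 45.691  # seconds spent in encrypt
encrypt_calls = 151       # one call per imported member

share = encrypt_cumtime / total_cumtime
per_call = encrypt_cumtime / encrypt_calls
# Assumption: per-call cost is unchanged for the 1429-member list.
projected_1429 = per_call * 1429

print(f"share of import time: {share:.1%}")
print(f"seconds per encrypt call: {per_call:.3f}")
print(f"projected hashing time, 1429 members: {projected_1429 / 60:.1f} min")
```

That works out to about 0.3 seconds per call, which would put hashing alone at roughly 7 of the 9 minutes for the larger list, if the assumption holds.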
Mailman 3 uses passlib[1] for password hashing and so I set out to figure out
if there is a hashing algorithm which can do this much faster and perhaps has a
C library wrapper that we can use to speed things up. I settled on the argon2
hash with argon2_cffi as the supporting library. Then I changed the config and
tried the imports again:
151 members: 15.884 seconds
1429 members: 2 minutes 29 seconds
That was a significant improvement over the previous numbers.
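To see why the choice of scheme and its work factor matters so much when you hash once per member, here is a rough illustration. It uses PBKDF2 from the standard library purely as a stand-in; it is not passlib, not argon2, and not what Mailman runs. The function names, the 20-member count and the iteration counts are all made up for the demo.

```python
import hashlib
import os
import time


def hash_password(password, salt, iterations):
    """Derive a 32-byte key with PBKDF2-HMAC-SHA256 (stand-in for passlib)."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


def time_import(member_count, iterations):
    """Hash one password per member and return the elapsed seconds."""
    salt = os.urandom(16)
    start = time.perf_counter()
    for i in range(member_count):
        hash_password(f"password-{i}", salt, iterations)
    return time.perf_counter() - start


# Compare a cheap and an expensive work factor for the same member count.
for iterations in (1_000, 100_000):
    elapsed = time_import(member_count=20, iterations=iterations)
    print(f"{iterations:>7} iterations: {elapsed:.3f}s for 20 members")
```

The total time grows with both the member count and the per-hash cost, so a cheaper or natively implemented scheme pays off once per member.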
Another interesting fact, though, is that user passwords are kind of useless in
Mailman 3. In Mailman 2 you had to set up a password, or one was auto-generated
for you, per list, and you needed that to log in to the web UI. However, in
Mailman 3, the passwords (in Core's database) aren't used for logging in, since
the web frontend stores the authentication tokens (social auth or passwords).
In fact, users who sign up for the first time on Mailman 3 probably never have
a password set in Mailman Core's database.
So, I commented out the code that actually imports the password
(src/mailman/utilities/importer.py#L663-664) and the import speed
improved even more, obviously:
151 members: 4 seconds
1429 members: 57 seconds
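Putting the three sets of timings side by side makes the relative gains easier to read. The labels in this sketch are my own shorthand for the three runs, and the 9-minute figure is taken as exactly 540 seconds even though it was a rough estimate.

```python
# Wall-clock import times in seconds, copied from the three runs in this message.
timings = {
    "original config": {151: 58.0, 1429: 9 * 60},
    "argon2": {151: 15.884, 1429: 2 * 60 + 29},
    "passwords skipped": {151: 4.0, 1429: 57.0},
}

baseline = timings["original config"]
speedups = {}
for label, runs in timings.items():
    for members, seconds in runs.items():
        # Speedup relative to the original run for the same list size.
        speedups[(label, members)] = baseline[members] / seconds
        print(
            f"{label:>17} {members:>5} members: "
            f"{seconds:7.1f}s ({speedups[(label, members)]:.1f}x)"
        )
```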
I am hoping that I can commit the change with the password import commented
out, unless I am reminded of a use for the passwords in Core's database. In
that case, it might be a bit more work to figure out another way to improve the
speed.
Thanks for reading up!
[1]: https://passlib.readthedocs.io/en/stable/narr/quickstart.html#making-a-decision
--
thanks,
Abhilash Raj (maxking)
___
Mailman-Developers mailing list -- mailman-developers@python.org