Over the last few weeks, I made a number of changes to the PyPI installation, namely

- replace Apache with nginx
- replace FastCGI with uwsgi
- full vacuum of postgres, and activation of autovacuum
- introduce a separate uwsgi logging daemon

Together, these changes seem to have had a positive effect on the stability of PyPI.

Peak load average is down from 500 to about 10:
http://pypi.python.org/munin/localdomain/localhost.localdomain/load.html
I believe this is mainly due to switching from Apache to nginx. Apache would spawn hundreds of worker threads in an overload situation, which made things worse, not better.

Memory consumption is down. Application memory used to fluctuate up to 3.5G, and is now at 750M. Committed memory used to increase up to 20G, and is now below 2G. Swap usage went up to 3G, and swap is now practically unused (7M).
http://pypi.python.org/munin/localdomain/localhost.localdomain/memory.html
Peak usage is again probably reduced due to the change in process model between Apache and nginx; in addition, the rejuvenation feature of uwsgi (replacing a worker process after 1000 requests) prevents Python processes from accumulating too much unused memory.

Postgres response time is improved. There had been occasional transactions taking 1700s, and occasional queries taking 870s. This is now down to 45s/30s for the last day:
http://pypi.python.org/munin/localdomain/localhost.localdomain/postgres_querylength_ALL.html
There are two factors that likely cause this reduction. On the one hand, the postgres database wasn't vacuumed:
http://pypi.python.org/munin/localdomain/localhost.localdomain/postgres_size_ALL.html
The reason for the failure to autovacuum probably was that the installation was successively upgraded from a 7.x release, which didn't do autovacuum, and that Debian at some point dropped the cron job that did the manual vacuum. Tables and indices now fit better into the address space, improving performance. Performing the full vacuum caused an outage of about 20 min two weeks ago.

In addition, I set the uwsgi harakiri timeout to 60s, which aborts any request, and with it any query, that takes longer. I believe such queries still occasionally happen; it's not clear to me which HTTP requests are triggering such long-running transactions.

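One possible way to find out which HTTP requests run into the timeout is to log every request once when it starts and once when the application returns; a request that gets killed would then show up as a "start" line without a matching "done" line. The following is only a sketch of that idea, not something deployed on PyPI. The class name RequestTrace and the logger name "request-trace" are made up for illustration.

```python
import logging
import time

log = logging.getLogger("request-trace")  # hypothetical logger name


class RequestTrace:
    """WSGI middleware that logs the start and the end of each request.

    A request whose worker is killed before the application returns
    leaves a "start" line in the log without a matching "done" line.
    """

    def __init__(self, app, logger=log, clock=time.monotonic):
        self.app = app
        self.logger = logger
        self.clock = clock

    def __call__(self, environ, start_response):
        method = environ.get("REQUEST_METHOD", "-")
        path = environ.get("PATH_INFO", "")
        query = environ.get("QUERY_STRING", "")
        target = path + ("?" + query if query else "")
        started = self.clock()
        self.logger.info("start %s %s", method, target)
        try:
            # Only the time until the application returns its iterable
            # is measured, not the time spent sending the response body.
            return self.app(environ, start_response)
        finally:
            elapsed = self.clock() - started
            self.logger.info("done %s %s %.3fs", method, target, elapsed)
```

To try it, one would wrap the existing WSGI callable, e.g. application = RequestTrace(application), and make sure the "request-trace" logger has a handler and is set to level INFO.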
While I'm mostly happy with the current setup, one issue is that uwsgi doesn't support proper log rotation; in particular, it is unwilling to close and then reopen the log files. Debian tries to use the copytruncate approach of logrotate, but that apparently didn't work too well (log space would constantly increase). I have now written a UDP server which supports proper log rotation, and configured uwsgi to send its log records to that UDP port.

Regards,
Martin

_______________________________________________
Catalog-SIG mailing list
[email protected]
http://mail.python.org/mailman/listinfo/catalog-sig
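For illustration, a UDP log receiver of the kind described above could look roughly like the sketch below. This is not the server actually running on PyPI; it assumes that each datagram carries one log record, and it relies on logging.handlers.WatchedFileHandler from the standard library, which reopens the log file when it has been moved or deleted (Unix only), so logrotate can rename the file without copytruncate. The names UDPLogServer and LogDatagramHandler are made up, and the uwsgi side of the configuration is not shown.

```python
import logging
import logging.handlers
import socketserver


class LogDatagramHandler(socketserver.BaseRequestHandler):
    """Write each received datagram to the server's log file as one line."""

    def handle(self):
        # For UDP servers, self.request is a (data, socket) pair.
        data, _sock = self.request
        text = data.decode("utf-8", errors="replace").rstrip("\r\n")
        self.server.logger.info(text)


class UDPLogServer(socketserver.UDPServer):
    """UDP server that appends incoming records to a rotation-safe file."""

    def __init__(self, address, log_path):
        super().__init__(address, LogDatagramHandler)
        # WatchedFileHandler checks before every write whether the file
        # was renamed or removed, and reopens it under the original name.
        handler = logging.handlers.WatchedFileHandler(log_path)
        handler.setFormatter(logging.Formatter("%(message)s"))
        # One logger per bound port, so that several servers in the same
        # process do not share handlers.
        self.logger = logging.getLogger("udplog.%d" % self.server_address[1])
        self.logger.setLevel(logging.INFO)
        self.logger.propagate = False
        self.logger.addHandler(handler)
```

Running it would amount to something like UDPLogServer(("127.0.0.1", 1717), "/var/log/uwsgi/app.log").serve_forever(), where both the port and the path are placeholders.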
