Since the launch of the new infrastructure for PyPI two weeks ago, I’ve been
monitoring the overall performance and reliability of PyPI for browsers,
uploads, installers, and mirrors.

Overall I am very happy, but I have noticed an ongoing issue with latency
spikes and 5xx errors. I believe these issues are not new; previously we
simply lacked the logs and monitoring that came along with the new
infrastructure.

The cause of these issues is clearly mirroring clients hitting PyPI with
floods of requests at common cron intervals. Additionally, new mirrors coming
online and performing their initial sync can easily cause extended periods of
increased latency and errors for all users, especially if the sync is
configured to use a large number of workers.

On 2014-02-07 at about 00:00 UTC, PyPI was effectively DoS’d for 45 minutes
while a major research lab performed a sync via bandersnatch. Their worker
count appears to have been set as high as 50.

The design of PEP 381 mirroring clients requires calls to the PyPI XMLRPC API
to obtain changelogs and package serial numbers. As a result, when clients
are configured for high parallelism, our backends can quickly be overwhelmed.
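To make the XMLRPC dependency concrete, here is a minimal sketch of the
incremental step a PEP 381 style mirror performs: ask for everything that
changed since the last serial it saw, then work out which packages need
refreshing. It assumes the `changelog_since_serial` method and a
`(name, version, timestamp, action, serial)` result tuple. The name
`packages_to_sync` is hypothetical, and the XMLRPC proxy is passed in so the
logic can be exercised without touching PyPI.

```python
from typing import Iterable, Protocol, Set, Tuple

# Assumed shape of one changelog_since_serial entry:
# (name, version, timestamp, action, serial)
ChangelogEntry = Tuple[str, str, int, str, int]


class ChangelogSource(Protocol):
    """Anything with a changelog_since_serial() method, such as an XMLRPC proxy."""

    def changelog_since_serial(self, since_serial: int) -> Iterable[ChangelogEntry]:
        ...


def packages_to_sync(source: ChangelogSource, last_serial: int) -> Tuple[Set[str], int]:
    """Return (names changed since last_serial, newest serial seen).

    Hypothetical helper: one XMLRPC round trip to find what changed.
    """
    changed: Set[str] = set()
    newest = last_serial
    for name, _version, _timestamp, _action, serial in source.changelog_since_serial(last_serial):
        changed.add(name)
        newest = max(newest, serial)  # remember where to resume on the next run
    return changed, newest
```

In practice `source` would be something like an `xmlrpc.client.ServerProxy`
pointed at the `/pypi` endpoint. The changelog query itself is a single call.
The heavier traffic comes afterwards, when the mirror fetches each changed
package and fans those fetches out across many workers.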

In order to maintain quality of service for all clients, we will begin rate 
limiting requests to the following routes:

  - /pypi
  - /mirrors
  - /id
  - /oauth
  - /security

The initial limit will be 5 req/s per IP, with bursts of up to 10 requests
allowed. Requests within the burst will be delayed to maintain the 5 req/s
maximum. Any requests beyond the 10-request burst will receive an HTTP 429
(Too Many Requests) response, per RFC 6585.
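The delay-then-reject behaviour above is what a leaky bucket provides. The
sketch below is one illustrative way to implement it in Python, not PyPI's
actual configuration. It interprets "bursts of 10" as up to 10 requests
waiting behind the one currently being served, and `LeakyBucket` and
`PerIPLimiter` are hypothetical names. Times are integer milliseconds so the
arithmetic stays exact.

```python
from typing import Dict, Optional


class LeakyBucket:
    """Serve at most one request per interval; let up to `burst` wait, reject the rest."""

    def __init__(self, rate_per_s: int = 5, burst: int = 10) -> None:
        self.interval_ms = 1000 // rate_per_s  # 200 ms at 5 req/s (assumes rate divides 1000)
        self.burst = burst
        self.next_free_ms = 0  # earliest time the next request may be served

    def admit(self, now_ms: int) -> Optional[int]:
        """Return how long to delay this request (ms), or None to answer HTTP 429."""
        start = max(now_ms, self.next_free_ms)
        delay = start - now_ms
        # delay // interval is how many admitted requests are still ahead of
        # this one; more than `burst` means the burst allowance is used up.
        if delay > self.burst * self.interval_ms:
            return None  # rejected requests do not consume a slot
        self.next_free_ms = start + self.interval_ms
        return delay


class PerIPLimiter:
    """Keeps one independent bucket per client IP address."""

    def __init__(self, rate_per_s: int = 5, burst: int = 10) -> None:
        self.rate_per_s = rate_per_s
        self.burst = burst
        self.buckets: Dict[str, LeakyBucket] = {}

    def check(self, ip: str, now_ms: int) -> Optional[int]:
        bucket = self.buckets.get(ip)
        if bucket is None:
            bucket = self.buckets[ip] = LeakyBucket(self.rate_per_s, self.burst)
        return bucket.admit(now_ms)
```

With the defaults, 12 simultaneous requests from one IP are scheduled at 0,
200, ..., 2000 ms and the twelfth is rejected, while a second IP is unaffected.
A real server would hold each admitted request for the returned delay before
passing it to the backend, and would also need to evict idle buckets, which
this sketch omits.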

Tuning these parameters is painless, so if issues arise with mirroring
clients we can respond quickly with any necessary adjustments.
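On the client side, a mirroring tool that receives a 429 should wait before
retrying instead of immediately reissuing the request. This announcement does
not say whether PyPI's 429 responses will include a `Retry-After` header
(RFC 6585 permits but does not require one). The sketch therefore honours a
numeric `Retry-After` when present and otherwise falls back to capped
exponential backoff. `backoff_delay` and `fetch_with_backoff` are
hypothetical helpers, and `opener` and `sleep` are injectable so the retry
logic can be tested offline.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def backoff_delay(attempt: int, retry_after: Optional[str],
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after)  # server said how long to wait: honour it
    return min(base * 2 ** attempt, cap)  # otherwise 1 s, 2 s, 4 s, ... up to cap


def fetch_with_backoff(url: str,
                       opener: Callable = urllib.request.urlopen,
                       sleep: Callable[[float], None] = time.sleep,
                       max_attempts: int = 5) -> bytes:
    """GET `url`, retrying on HTTP 429; any other HTTP error propagates at once."""
    for attempt in range(max_attempts):
        try:
            with opener(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt, err.headers.get("Retry-After")))
    raise ValueError("max_attempts must be at least 1")
```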

Note that the routes used by installation clients (`/packages` and `/simple`)
will remain unaffected, as they are generally served from the CDN and do not
impose as much overhead on our backend processes.
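The split between rate-limited backend routes and CDN-served installer routes
can be expressed as a simple path check. This sketch assumes each listed
route is matched as a path prefix on segment boundaries, with any query
string ignored. The exact matching rules PyPI uses are not stated here, and
`is_rate_limited` is a hypothetical name.

```python
RATE_LIMITED_ROUTES = ("/pypi", "/mirrors", "/id", "/oauth", "/security")


def is_rate_limited(path: str) -> bool:
    """Return True if `path` falls under one of the rate-limited routes."""
    path = path.split("?", 1)[0]  # ignore any query string, e.g. "/pypi?:action=..."
    # Match whole path segments so that e.g. "/identity" is not caught by "/id".
    return any(path == route or path.startswith(route + "/")
               for route in RATE_LIMITED_ROUTES)
```

Installer traffic such as `/simple/<project>/` or `/packages/...` falls
through to `False` and is never throttled.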

This rate limiting should be considered an interim solution, as I plan to
begin a discussion on updates to the mirroring infrastructure guidelines.

_______________________________________________
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig