So, I recently got a new desktop which I'm using to kick the tires on parallel testing for Launchpad.
I have a little bit of a story about it, but nothing specific to performance in the web app. OTOH it may illustrate how tricky performance is ;)

I've had a bit of a fun time with its disks - it's got plain old spinning platters rather than SSDs :). It came with 2 disks, I salvaged 2 more from my old desktop, and I put the four into a raid 1+0 (a stripe set built out of two mirror sets: it can tolerate a single disk failure, any one write goes to 2 disks, and any read can be serviced from either of two disks). It turns out the dm-raid1 driver doesn't load-balance reads, so that last point is more theoretical than practical in natty. I did some itch scratching on this, whipping up a patch to the dm-raid1 driver to load balance.

My first attempt at read load balancing was a complete success in one sense: each successive read request went to a different disk, and iostat clearly showed me reading from both. It was also massively slower at sequential IO: whereas cat largefile > /dev/null managed 100MB/s on natty's released kernel, I was lucky to get 25MB/s from each of the two drives - a loss of roughly 50%.

Now, the dm layers work by mapping requests: a request comes in and may get split into two (e.g. if it crosses a stripe segment). Raid 0 and raid 1 have no parity requirements, so they can simply map any read request into one or more read requests on the backing devices. You get an IO chain - the actual request on the dm device becomes one or more requests on the backing device, which are submitted back into the top of the kernel IO stack - and this means that request *merging* can happen on the requests to the backing device. That matters because something as simple as 'cat bigfile > /dev/null' triggers the kernel's readahead behaviour, which generates up to 1000 separate IO requests per second, running ahead of the actual reads cat is performing. If each little IO request were serviced separately, total performance would be very slow: we'd be limited by the disk's command depth (e.g. 31 tagged commands), and if the requests are small enough they won't cover a cylinder, so we run into rotational latency and so on.

So, to pick up the story: my patch turned 1000 requests/sec that were being merged down to 50 requests/sec into 500 requests/sec per drive that were not merged at all, and only 30 could be issued to each drive at once. So while the rate at which IO requests were satisfied stayed roughly the same, the amount of work done per second plummeted. I've since changed the patch to track where the last submitted request landed on each backing device and preferentially choose the closest one - this retains the prior behaviour for sequential IO, but load-balances random reads quite tolerably (rough sketch below).

Our Lazr.restful API, which exposes lots of little methods, is really very similar to this situation, but without the merging concept: we get lots of tiny requests which would be far more efficient handled in batch mode.

Where this all gets interesting is when you consider queue servicing: if we do 100MB/s of IO for half the time rather than 50MB/s all the time, we can service more things in a given time span - as long as we don't try to do more concurrent work. If we do try to do more concurrent work, the efficiency drops and everything just gets slower (which is what was happening to our Python appserver processes until our recent reconfiguration).
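To make that concrete, here's a rough sketch of the two read-balancing policies. It's Python purely for illustration - the real patch is C in the kernel's dm-raid1 code - so treat it as the shape of the idea, not the implementation:

    import itertools

    class Mirror:
        """One leg of the mirror set (illustrative stand-in, not kernel code)."""
        def __init__(self, name):
            self.name = name
            self.last_sector = 0   # where the last request we submitted ended

    def round_robin_chooser(mirrors):
        # First attempt: strictly alternate legs. Spreads the load, but
        # adjacent readahead requests land on different devices, so the
        # block layer can no longer merge them into big sequential IOs.
        legs = itertools.cycle(mirrors)
        return lambda _sector: next(legs)

    def choose_closest(mirrors, sector):
        # Second attempt: prefer the leg whose last submitted request ended
        # nearest to this one. Sequential streams stick to one leg (and keep
        # merging); random reads naturally spread across both.
        return min(mirrors, key=lambda m: abs(m.last_sector - sector))

    def submit_read(mirrors, sector, nr_sectors):
        mirror = choose_closest(mirrors, sector)
        mirror.last_sector = sector + nr_sectors
        return mirror.name

    mirrors = [Mirror('sda'), Mirror('sdb')]
    # A sequential readahead stream stays on one leg, so merging still works:
    print([submit_read(mirrors, s, 8) for s in range(0, 64, 8)])
    # Random reads spread across both legs:
    print([submit_read(mirrors, s, 8) for s in (5000, 17, 9000, 33)])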
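On the web service side, the batching point looks something like the sketch below. The endpoints and the client object are made up for illustration - this is not real Lazr.restful code - but the shape of the problem is the same: many tiny calls each pay the full per-request overhead, where one batched call pays it once.

    def fetch_bug_titles_one_by_one(client, bug_ids):
        # N round trips: each call pays connection, auth and dispatch
        # overhead - the web equivalent of N tiny unmerged IO requests.
        return [client.get('/bugs/%d' % bug_id)['title'] for bug_id in bug_ids]

    def fetch_bug_titles_batched(client, bug_ids):
        # One round trip carrying the whole working set: the per-request
        # overhead is paid once, like one big merged read.
        ids = ','.join(str(bug_id) for bug_id in bug_ids)
        response = client.get('/bugs', params={'ids': ids})
        return [bug['title'] for bug in response['entries']]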
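And the queue-servicing point is easier to see with toy numbers (made up, but the shape is what matters): two 100MB jobs, done one after the other at full speed versus side by side at half speed each.

    job_mb = 100.0

    # One at a time at 100MB/s: the first job completes at 1s, the second at 2s.
    serial_completions = [job_mb / 100.0, 2 * job_mb / 100.0]

    # Both at once at 50MB/s each: same aggregate bandwidth, but nothing
    # completes until the 2 second mark, so the queue drains later.
    concurrent_completions = [job_mb / 50.0, job_mb / 50.0]

    # If concurrency also costs efficiency (say extra seeking drops each
    # stream to 40MB/s), everything is strictly worse:
    degraded_completions = [job_mb / 40.0, job_mb / 40.0]

    for label, times in [('serial', serial_completions),
                         ('concurrent', concurrent_completions),
                         ('concurrent + seek overhead', degraded_completions)]:
        print('%s: done at %s, average %.2fs' % (label, times, sum(times) / len(times)))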
Speaking of queuing, I've recently put a lock around our test suite's creation of temporary databases: creating a new database in the test suite takes 0.2 seconds *optimally*, just in PostgreSQL. Once we have 5 worker threads making new databases, they can starve each other out if the tests are fairly fast. What's worse is that if two sessions try to create a database from the same template at the same time, PostgreSQL will fail - the template db can't have other sessions connected to it when CREATE DATABASE is called. And the failure is *slow* - on the order of seconds. Without a lock around calling CREATE DATABASE that massively increases the contention, and so things snowball.

Anyhow, sorry I didn't have anything directly LP specific this week, but I hope this little ramble was interesting. It was interesting to see the same basics turning up in kernel performance work, anyhow ;)

-Rob
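PS: the lock itself is nothing clever - conceptually it's just "one CREATE DATABASE at a time, across all the workers". A hand-wavy sketch of the idea (not the actual test-harness code; the lock path and template name are placeholders):

    import fcntl

    import psycopg2
    from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT

    LOCK_PATH = '/tmp/test-createdb.lock'   # placeholder path

    def create_test_db(name, template='test_template'):
        # CREATE DATABASE can't run inside a transaction block.
        conn = psycopg2.connect('dbname=postgres')
        conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
        with open(LOCK_PATH, 'w') as lock_file:
            # Serialise across workers: a racing CREATE DATABASE doesn't just
            # fail, it fails *slowly*, so letting workers collide snowballs.
            fcntl.flock(lock_file, fcntl.LOCK_EX)
            try:
                cur = conn.cursor()
                cur.execute('CREATE DATABASE %s TEMPLATE %s' % (name, template))
            finally:
                fcntl.flock(lock_file, fcntl.LOCK_UN)
                conn.close()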