Greg Smith wrote:
I think a helpful next step here would be to put Robert's fsync compaction patch in here and see if that helps. There are enough backend syncs showing up in the difficult workloads (scale >= 1000, clients >= 32) that its impact should be obvious.

Initial tests show everything expected from this change, and more. It took me a while to isolate because the filesystem involved degraded over time, which heavily biased results toward a faster first test run, before anything was fragmented. I ended up doing a whole new mkfs on the database/xlog disks when switching between test sets in order to eliminate that.

At a scale of 500, I see the following average behavior:

Clients  TPS  backend-fsync
16       557  155
32       587  572
64       628  843
128      621  1442
256      632  2504

On one run through with the fsync compaction patch applied, this turned into:

Clients  TPS  backend-fsync
16       637  0
32       621  0
64       721  0
128      716  0
256      841  0

So not only are all the backend fsyncs gone, there is a very clear TPS improvement too. The change in results at >= 64 clients is well above the usual noise threshold in these tests. The problem where individual fsync calls during checkpoints can take a long time is not appreciably better. But I think this will greatly reduce the odds of running into the truly dysfunctional breakdown, where checkpoint and backend fsync calls compete with one another, which caused the worst-case situation that kicked off this whole line of research.
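For anyone who wants to rerun this sort of comparison, here is a rough sketch of the kind of test loop involved, not my actual harness. It assumes a pgbench database already initialized at the target scale, psql and pgbench on the PATH, and that the per-run backend fsync counts come from diffing the buffers_backend_fsync counter in pg_stat_bgwriter around each run; the database name, run length, and pgbench flags shown are just placeholders.

# Rough sketch only; database name, run length, and flags are placeholders.
import subprocess

DB = "pgbench"          # placeholder database name
RUN_SECONDS = 300       # placeholder run length

def backend_fsyncs():
    # Cumulative count of fsync calls absorbed by backends; diffing it
    # around a run gives the per-run figure reported in the tables above.
    out = subprocess.check_output(
        ["psql", "-At", "-d", DB, "-c",
         "SELECT buffers_backend_fsync FROM pg_stat_bgwriter"],
        universal_newlines=True)
    return int(out.strip())

def run(clients):
    before = backend_fsyncs()
    out = subprocess.check_output(
        ["pgbench", "-c", str(clients), "-j", str(clients),
         "-T", str(RUN_SECONDS), DB],
        universal_newlines=True)
    after = backend_fsyncs()
    tps = None
    for line in out.splitlines():
        # pgbench prints e.g. "tps = 841.2 (including connections establishing)"
        if line.startswith("tps") and "including" in line:
            tps = float(line.split("=")[1].split()[0])
    return tps, after - before

for clients in (16, 32, 64, 128, 256):
    tps, fsyncs = run(clients)
    print("%d clients: %.0f tps, %d backend fsyncs" % (clients, tps, fsyncs))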

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


