Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD

Bruce Momjian Tue, 23 Dec 2014 19:52:24 -0800

On Thu, Apr 17, 2014 at 11:23:24AM +0200, Andres Freund wrote:
> On 2014-04-16 19:18:02 -0400, Bruce Momjian wrote:
> > On Thu, Feb  6, 2014 at 09:40:32AM +0100, Andres Freund wrote:
> > > On 2014-02-05 12:36:42 -0500, Robert Haas wrote:
> > > > >> It may well be that your proposal is spot on.  But I'd like to see
> > > > >> some data-structure-by-data-structure measurements, rather than
> > > > >> assuming that alignment must be a good thing.
> > > > >
> > > > > I am fine with just aligning BufferDescriptors properly. That has
> > > > > clearly shown massive improvements.
> > > > 
> > > > I thought your previous idea of increasing BUFFERALIGN to 64 bytes had
> > > > a lot to recommend it.
> > > 
> > > Good.
> > > 
> > > I wonder if we shouldn't move that bit of logic:
> > >   if (size >= BUFSIZ)
> > >           newStart = BUFFERALIGN(newStart);
> > > out of ShmemAlloc() and instead have a ShmemAllocAligned() and
> > > ShmemInitStructAligned() that do it. Then we can sensibly control it
> > > per struct.
> > > 
> > > > But that doesn't mean it doesn't need testing.
> > > 
> > > I feel the need here, to say that I never said it doesn't need testing
> > > and never thought it didn't...
> > 
> > Where are we on this?
> 
> It needs somebody with time to evaluate possible performance regressions
> - I personally won't have time to look into this in detail before pgcon.


I am doing performance testing to try to complete this item.  I used the
first attached patch to report which structures are 64-byte aligned:

        64-byte shared memory alignment of Control File:  0
        64-byte shared memory alignment of XLOG Ctl:  1
        64-byte shared memory alignment of CLOG Ctl:  0
        64-byte shared memory alignment of CommitTs Ctl:  0
        64-byte shared memory alignment of CommitTs shared:  0
        64-byte shared memory alignment of SUBTRANS Ctl:  1
        64-byte shared memory alignment of MultiXactOffset Ctl:  1
        64-byte shared memory alignment of MultiXactMember Ctl:  1
        64-byte shared memory alignment of Shared MultiXact State:  1
        64-byte shared memory alignment of Buffer Descriptors:  1
        64-byte shared memory alignment of Buffer Blocks:  1
        64-byte shared memory alignment of Shared Buffer Lookup Table:  1
        64-byte shared memory alignment of Buffer Strategy Status:  1
        64-byte shared memory alignment of LOCK hash:  0
        64-byte shared memory alignment of PROCLOCK hash:  0
        64-byte shared memory alignment of Fast Path Strong Relation Lock Data:  0
        64-byte shared memory alignment of PREDICATELOCKTARGET hash:  0
        64-byte shared memory alignment of PREDICATELOCK hash:  0
        64-byte shared memory alignment of PredXactList:  0
        64-byte shared memory alignment of SERIALIZABLEXID hash:  1
        64-byte shared memory alignment of RWConflictPool:  1
        64-byte shared memory alignment of FinishedSerializableTransactions:  0
        64-byte shared memory alignment of OldSerXid SLRU Ctl:  1
        64-byte shared memory alignment of OldSerXidControlData:  1
        64-byte shared memory alignment of Proc Header:  0
        64-byte shared memory alignment of Proc Array:  0
        64-byte shared memory alignment of Backend Status Array:  0
        64-byte shared memory alignment of Backend Application Name Buffer:  0
        64-byte shared memory alignment of Backend Client Host Name Buffer:  0
        64-byte shared memory alignment of Backend Activity Buffer:  0
        64-byte shared memory alignment of Prepared Transaction Table:  0
        64-byte shared memory alignment of Background Worker Data:  0
        64-byte shared memory alignment of shmInvalBuffer:  1
        64-byte shared memory alignment of PMSignalState:  0
        64-byte shared memory alignment of ProcSignalSlots:  0
        64-byte shared memory alignment of Checkpointer Data:  0
        64-byte shared memory alignment of AutoVacuum Data:  0
        64-byte shared memory alignment of Wal Sender Ctl:  0
        64-byte shared memory alignment of Wal Receiver Ctl:  0
        64-byte shared memory alignment of BTree Vacuum State:  0
        64-byte shared memory alignment of Sync Scan Locations List:  0
        64-byte shared memory alignment of Async Queue Control:  0
        64-byte shared memory alignment of Async Ctl:  0
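
For anyone who wants to reproduce the check outside the server, the test
the patch prints boils down to a modulo on the address.  Here is a minimal
standalone sketch, not part of either patch; the names is_aligned(),
align_up() and CACHE_LINE_SIZE are made up for illustration, and the
64-byte line size is an assumption:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64		/* assumed cache line size, in bytes */

/* Return true if ptr sits on an "alignment"-byte boundary (power of two). */
static bool
is_aligned(const void *ptr, size_t alignment)
{
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return ((uintptr_t) ptr & (alignment - 1)) == 0;
}

/* Round addr up to the next "alignment"-byte boundary (power of two). */
static uintptr_t
align_up(uintptr_t addr, size_t alignment)
{
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return (addr + (alignment - 1)) & ~((uintptr_t) alignment - 1);
}
```

Going through uintptr_t rather than a signed 64-bit integer keeps the
cast the same width as a pointer on both 32-bit and 64-bit builds.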

Many of these are 64-byte aligned, including Buffer Descriptors.  I
tested pgbench with these commands:

        $ pgbench -i -s 95 pgbench
        $ pgbench -S -c 95 -j 95 -t 100000 pgbench

on a 16-core Xeon server and got 84k tps.  I then applied the second
attached patch, which forces every structure off a 64-byte boundary, and
got the same tps number.
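
As a rough illustration of why an offset start address could matter at
all, the sketch below counts how many cache lines one object touches.  It
is not taken from the patches; cache_lines_spanned() is a hypothetical
helper, and both the 64-byte line size and the 64-byte object size used in
the note after it are assumptions:

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64		/* assumed cache line size, in bytes */

/* Number of cache lines touched by an object of "size" bytes at "addr". */
static size_t
cache_lines_spanned(uintptr_t addr, size_t size)
{
	uintptr_t	first;
	uintptr_t	last;

	if (size == 0)
		return 0;
	first = addr / CACHE_LINE_SIZE;					/* line of first byte */
	last = (addr + size - 1) / CACHE_LINE_SIZE;		/* line of last byte */
	return (size_t) (last - first + 1);
}
```

A 64-byte object that starts on a 64-byte boundary touches one line; the
same object shifted by 4 bytes touches two, so every access to it can
involve two cache lines instead of one.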

Can someone test these patches on an AMD CPU and see if you see a
difference?  Thanks.

-- 
  Bruce Momjian  <[email protected]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c
new file mode 100644
index 2ea2216..25b9eba
*** a/src/backend/storage/ipc/shmem.c
--- b/src/backend/storage/ipc/shmem.c
*************** ShmemInitStruct(const char *name, Size s
*** 413,418 ****
--- 413,419 ----
  							" \"%s\" (%zu bytes requested)",
  							name, size)));
  		}
+ 		fprintf(stderr, "64-byte shared memory alignment of %s:  %d\n", name, ((int64)structPtr % 64) == 0);
  		result->size = size;
  		result->location = structPtr;
  	}

diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c
new file mode 100644
index 2ea2216..cc1ac1f
*** a/src/backend/storage/ipc/shmem.c
--- b/src/backend/storage/ipc/shmem.c
*************** ShmemInitStruct(const char *name, Size s
*** 327,332 ****
--- 327,335 ----
  	ShmemIndexEnt *result;
  	void	   *structPtr;
  
+ //	if (strcmp(name, "Buffer Descriptors") == 0)
+ 		size += 32;
+ 
  	LWLockAcquire(ShmemIndexLock, LW_EXCLUSIVE);
  
  	if (!ShmemIndex)
*************** ShmemInitStruct(const char *name, Size s
*** 413,418 ****
--- 416,424 ----
  							" \"%s\" (%zu bytes requested)",
  							name, size)));
  		}
+ //		if (strcmp(name, "Buffer Descriptors") == 0)
+ 			structPtr = (void *)((int64)structPtr + 4);
+ 		fprintf(stderr, "64-byte shared memory alignment of %s:  %d\n", name, ((int64)structPtr % 64) == 0);
  		result->size = size;
  		result->location = structPtr;
  	}
