On Wednesday, 5 February 2014 at 17:24:46 UTC, Ola Fosheim Grøstad wrote:
The access would be easy and something like (probably not 100% correct):

counter_addr = (ptr&~0xffff) + ( (ptr>>12)&0xfffc )

It was of course wrong: that would make the smallest allocation unit 16KiB. Anyway, if tuned to the indexed loads of the CPU, it would not be all that slow. On x86 you should be able to do something like (pseudo):

uint64 reg1 = ptr & 0xfff....f0000
uint32 reg2 = (ptr >> 8) & 0xff
uint64 reg3 = load_effective_address( reg1 + 4*reg2 )
increment( *reg3 )

So only 4-5 cheap instructions for single-threaded counting.
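
To make the address arithmetic concrete, here is a rough C sketch of the same lookup. It assumes each 64KiB block is aligned to its own size and starts with a table of 32-bit counters, one per 256-byte unit, and that the index comes from the pointer's offset within the block (bits 8 to 15). The names counter_for, retain and demo_retain_twice are made up for illustration and are not from any existing allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOCK_SIZE 0x10000u  /* assumed: 64KiB blocks, aligned to their size */
#define UNIT_SHIFT 8         /* assumed: one counter per 256-byte unit */

/* Hypothetical helper: map an object pointer to its 32-bit counter. */
uint32_t *counter_for(const void *ptr)
{
    uintptr_t p     = (uintptr_t)ptr;
    uintptr_t base  = p & ~(uintptr_t)(BLOCK_SIZE - 1);      /* reg1 */
    uintptr_t index = (p & (BLOCK_SIZE - 1)) >> UNIT_SHIFT;  /* reg2 */
    return (uint32_t *)(base + 4 * index);                   /* reg3 */
}

/* Single-threaded increment, no locking. */
void retain(const void *ptr)
{
    ++*counter_for(ptr);
}

/* Retains an object in unit 5 twice and returns that unit's counter. */
uint32_t demo_retain_twice(void)
{
    /* Over-allocate so a 64KiB-aligned block can be carved out. */
    unsigned char *raw = calloc(2, BLOCK_SIZE);
    assert(raw != NULL);
    uintptr_t aligned = ((uintptr_t)raw + BLOCK_SIZE - 1)
                        & ~(uintptr_t)(BLOCK_SIZE - 1);
    unsigned char *block = (unsigned char *)aligned;

    void *obj = block + 5 * 256 + 16;  /* some address inside unit 5 */
    retain(obj);
    retain(obj);

    uint32_t count = ((uint32_t *)block)[5];  /* counter slot for unit 5 */
    free(raw);
    return count;
}
```

With this layout the counter table occupies the first 1KiB of every block, so those first four units would not be handed out as object storage.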

You could also use the most significant bit for bookkeeping of single-threaded vs multi-threaded ref counting:

   test (*reg3)
   if (positive) goto nolock
   lockprefix
nolock:
   increment( *reg3 )
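
The same idea can be sketched in C11, with the caveat that this is only one way to model it. The lock-prefixed path becomes an atomic fetch-add, and the unlocked path becomes a relaxed load followed by a relaxed store, which is not an atomic read-modify-write and is only safe while a single thread owns the object. SHARED_FLAG, retain_counter and use_count are hypothetical names chosen for this example.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Assumed convention: MSB set means the counter is shared between threads. */
#define SHARED_FLAG 0x80000000u

void retain_counter(_Atomic uint32_t *counter)
{
    uint32_t value = atomic_load_explicit(counter, memory_order_relaxed);
    if (value & SHARED_FLAG) {
        /* "lock" path: atomic read-modify-write. */
        atomic_fetch_add_explicit(counter, 1, memory_order_relaxed);
    } else {
        /* "nolock" path: separate load and store, single owner only. */
        atomic_store_explicit(counter, value + 1, memory_order_relaxed);
    }
}

/* Reference count with the bookkeeping bit masked off. */
uint32_t use_count(_Atomic uint32_t *counter)
{
    return atomic_load_explicit(counter, memory_order_relaxed) & ~SHARED_FLAG;
}
```

The flag would have to be set before the object first becomes visible to a second thread, otherwise an unlocked increment could race with a locked one.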
