Hi Simon,

> Thanks - I already did this for alloca/malloc, I'll add the others from
> your patch.

Thank you.

> We go to quite a lot of trouble to avoid locking in the common cases and
> fast paths - most of our data structures are CPU-local. Where in
> particular have you encountered locking that could be reduced?

> The pinned_object_block is CPU-local, usually no locking is required.
> Only when the block is full do we have to get a new block from the block
> allocator, and that requires a lock, but it's a rare case.

OK, the code I have checked out from the repository contains this in
"rts/sm/Storage.h":

    extern bdescr * pinned_object_block;

And in "rts/sm/Storage.c":

    bdescr *pinned_object_block;

My C might be rusty, but I see no way for pinned_object_block to be
CPU-local: it is declared as an ordinary global. If it is truly
CPU-local, what makes it so?

As for locking, here is one example:

    StgPtr
    allocatePinned( lnat n )
    {
        StgPtr p;
        bdescr *bd = pinned_object_block;

        // If the request is for a large object, then allocate()
        // will give us a pinned object anyway.
        if (n >= LARGE_OBJECT_THRESHOLD/sizeof(W_)) {
            p = allocate(n);
            Bdescr(p)->flags |= BF_PINNED;
            return p;
        }

        ACQUIRE_SM_LOCK; // [RTVD: here we acquire the lock]

        TICK_ALLOC_HEAP_NOCTR(n);
        CCS_ALLOC(CCCS,n);

        // If we don't have a block of pinned objects yet, or the current
        // one isn't large enough to hold the new object, allocate a new one.
        if (bd == NULL || (bd->free + n) > (bd->start + BLOCK_SIZE_W)) {
            pinned_object_block = bd = allocBlock();
            dbl_link_onto(bd, &g0s0->large_objects);
            g0s0->n_large_blocks++;
            bd->gen_no = 0;
            bd->step   = g0s0;
            bd->flags  = BF_PINNED | BF_LARGE;
            bd->free   = bd->start;
            alloc_blocks++;
        }

        p = bd->free;
        bd->free += n;
        RELEASE_SM_LOCK; // [RTVD: here we release the lock]
        return p;
    }

Of course, TICK_ALLOC_HEAP_NOCTR and CCS_ALLOC may require
synchronization if they use shared state (which is, again, probably
unnecessary).

However, if no profiling is going on and "pinned_object_block" is
TSO-local, isn't it possible to remove locking from this code
completely? The only case where locking would be necessary is when a
fresh block has to be allocated, and that can be handled within the
"allocBlock" call (or, more precisely, by using "allocBlock_lock").

The ACQUIRE_SM_LOCK/RELEASE_SM_LOCK pair is present in other places too,
but I have not yet analysed whether it is really necessary there. For
example, things like newCAF and newDynCAF are wrapped in it.

With kind regards,
Denys Rtveliashvili
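
P.S. To make the suggestion above concrete, here is a minimal,
self-contained sketch of the shape I have in mind. It is not the RTS
code: the names (pinned_block, alloc_block_locked, alloc_pinned,
MAX_CAPS, BLOCK_WORDS) are hypothetical, malloc stands in for the block
allocator, and it assumes each slot of pinned_block is only ever touched
by the one thread that owns that capability. Under those assumptions the
bump allocation needs no lock, and the lock is taken only when a fresh
block is fetched.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_WORDS 512   /* hypothetical block size, in words */
#define MAX_CAPS    4     /* hypothetical number of capabilities (CPUs) */

typedef size_t word;

typedef struct block {
    word *start;          /* first word of the block */
    word *free;           /* next unallocated word */
} block;

/* One current pinned block per capability. Assumption: slot i is read
 * and written only by the thread that owns capability i. */
block *pinned_block[MAX_CAPS];

/* The shared block allocator is the only state guarded by the lock. */
pthread_mutex_t sm_lock = PTHREAD_MUTEX_INITIALIZER;
unsigned long lock_acquisitions = 0;   /* counted for illustration only */

/* Slow path: fetch a fresh block from the shared allocator under the
 * lock. malloc is a stand-in for a real block allocator. */
block *alloc_block_locked(void)
{
    pthread_mutex_lock(&sm_lock);
    lock_acquisitions++;
    block *bd = malloc(sizeof *bd);
    if (bd != NULL) {
        bd->start = malloc(BLOCK_WORDS * sizeof(word));
        if (bd->start == NULL) {
            free(bd);
            bd = NULL;
        } else {
            bd->free = bd->start;
        }
    }
    pthread_mutex_unlock(&sm_lock);
    return bd;
}

/* Fast path: bump-allocate n words from the capability's own block
 * without locking. Returns NULL if a fresh block cannot be obtained.
 * A full block is simply dropped here; a real runtime would keep it
 * linked on a list so the garbage collector can find it. */
word *alloc_pinned(int cap, size_t n)
{
    assert(cap >= 0 && cap < MAX_CAPS && n <= BLOCK_WORDS);

    block *bd = pinned_block[cap];
    if (bd == NULL || bd->free + n > bd->start + BLOCK_WORDS) {
        bd = alloc_block_locked();      /* the only place the lock is taken */
        if (bd == NULL) {
            return NULL;
        }
        pinned_block[cap] = bd;
    }

    word *p = bd->free;
    bd->free += n;
    return p;
}
```

With this layout, two consecutive small requests on the same capability
share one block and take the lock once; the lock is taken again only
when the block runs out or another capability makes its first request.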