Re: kernel panic - sparc64 on Netra X1 - psycho0: uncorrectable DMA error AFAR

Jan Vlach Tue, 24 Nov 2015 09:24:00 -0800

Hi Steven,

I don't want to jump the gun, but after ~3 hours of heavy network I/O (ping -f
to and from, cvs checkout, ftp ... the stuff that crashed the box previously),
it is stable.


Thank you very much!

I will try to torture the box some more to see if it would behave ...

Jan


On Tue, Nov 24, 2015 at 03:31:53AM +0000, Steven Chamberlain wrote:
> Hi!
> 
> Would anyone like to try this change?  It's early to say if this
> definitely fixed the issue for me, but it looks promising:
> 
> --- sys/kern/subr_pool.c
> +++ sys/kern/subr_pool.c
> @@ -259,5 +259,5 @@ pool_init(struct pool *pp, size_t size, 
>       if (pgsize - (size * items) > sizeof(struct pool_item_header)) {
>               off = pgsize - sizeof(struct pool_item_header);
> -     } else if (sizeof(struct pool_item_header) * 2 >= size) {
> +     } else if (sizeof(struct pool_item_header) * 8 >= size) {
>               off = pgsize - sizeof(struct pool_item_header);
>               items = off / size;
> 
> Prior to v1.149, there was a threshold of, I think, PAGE_SIZE/16=512
> on sparc64; pools with an item size smaller than that would use an
> in-page header:
> 
>        * Decide whether to put the page header off page to avoid       
>        * wasting too large a part of the page. Off-page page headers   
>        * go into an RB tree, so we can match a returned item with      
>        * its header based on the page address.         
>        * We use 1/16 of the page size as the threshold (XXX: tune)     
>        */      
>       if (pp->pr_size < palloc->pa_pagesz/16 && pp->pr_size < PAGE_SIZE) {    
>  
>               /* Use the end of the page for the page header */
> 
> In v1.149 the threshold became sizeof(struct pool_item_header)*2=224 on
> sparc64, so dma256 and dma512 pools would no longer use an in-page
> header, but would be able to accommodate more items per page as a result.
> 
> The adjustment above simply reverts that behavioural change.  The
> change in v1.149 probably never should have broken anything, apart
> from a slight performance difference, but it seems to have triggered
> some possibly pre-existing bug elsewhere.
> 
> I've already ruled out the unsigned int arithmetic I've mentioned thus
> far, with KASSERT()s that didn't trigger even when the crash happened.
> 
> And I've already tried to rule out cache colouring by forcing
> pp->pr_maxcolors=0 to no avail.  (Since it was only used in pools
> with an in-page header, it could have been related).
> 
> p.s. I would maybe even test if this helps with tmpfs issues seen on
> armv7 and such, as I think that was first mentioned around the time of
> this change, and since it uses pool(9) for its file metadata.
> 
> Regards,
> -- 
> Steven Chamberlain
> [email protected]
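
For anyone following the arithmetic in the quoted mail, here is a rough
sketch of the two header-placement rules as plain predicates. It is not
the kernel code: PGSIZE and HDR_SIZE are assumed sparc64 values inferred
from the figures above (PAGE_SIZE/16=512 and sizeof(struct
pool_item_header)*2=224), the function names are made up, and the real
pool_init() also computes off and items, which is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed sparc64 values, inferred from the numbers quoted in the mail. */
#define PGSIZE   8192u	/* because PAGE_SIZE/16 = 512 */
#define HDR_SIZE 112u	/* because sizeof(struct pool_item_header)*2 = 224 */

/* Pre-v1.149 rule: in-page header when the item is under 1/16 of a page. */
static bool
inpage_old(size_t size)
{
	assert(size > 0);
	return size < PGSIZE / 16 && size < PGSIZE;
}

/*
 * v1.149-style rule (hypothetical simplification of the diff above).
 * mult = 2 models the committed code, mult = 8 the proposed patch.
 */
static bool
inpage_new(size_t size, unsigned int mult)
{
	size_t items;

	assert(size > 0 && size <= PGSIZE);
	items = PGSIZE / size;

	/* Leftover space at the end of the page already fits the header. */
	if (PGSIZE - size * items > HDR_SIZE)
		return true;

	/* Otherwise only small items give up space for an in-page header. */
	return (size_t)HDR_SIZE * mult >= size;
}
```

Under these assumptions a 256-byte item gets an in-page header with the
old rule and with mult = 8, but not with mult = 2, which matches the
dma256 behaviour described above.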

-- 
Be the change you want to see in the world.
