On 11/24/15 23:46, Fred wrote:
On 11/24/15 03:31, Steven Chamberlain wrote:
Hi!

Would anyone like to try this change?  It's early to say if this
definitely fixed the issue for me, but it looks promising:

--- sys/kern/subr_pool.c
+++ sys/kern/subr_pool.c
@@ -259,5 +259,5 @@ pool_init(struct pool *pp, size_t size,
       if (pgsize - (size * items) > sizeof(struct pool_item_header)) {
           off = pgsize - sizeof(struct pool_item_header);
-    } else if (sizeof(struct pool_item_header) * 2 >= size) {
+    } else if (sizeof(struct pool_item_header) * 8 >= size) {
           off = pgsize - sizeof(struct pool_item_header);
           items = off / size;

Prior to v1.149, there was a threshold of, I think, PAGE_SIZE/16=512
on sparc64; pools with an item size smaller than that would use an
in-page header:

     * Decide whether to put the page header off page to avoid
     * wasting too large a part of the page. Off-page page headers
     * go into an RB tree, so we can match a returned item with
     * its header based on the page address.
     * We use 1/16 of the page size as the threshold (XXX: tune)
     */
    if (pp->pr_size < palloc->pa_pagesz/16 && pp->pr_size < PAGE_SIZE) {
            /* Use the end of the page for the page header */

In v1.149 the threshold became sizeof(struct pool_item_header)*2=224 on
sparc64, so dma256 and dma512 pools would no longer use an in-page
header, but be able to accommodate more items per page as a result.

The adjustment above simply reverts that behavioural change.  It
probably never should have broken anything, other than a slight
performance change, but it seems like it triggered some maybe
pre-existing bug elsewhere.
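
To make the three thresholds discussed above easier to compare, here is
a minimal userland sketch, not the kernel code itself.  PGSIZE and
PHDR_SZ are assumed values taken from the sparc64 figures quoted in this
mail (PAGE_SIZE/16=512, sizeof(struct pool_item_header)*2=224), the
helper names inpage_old() and inpage_new() are hypothetical, and
computing items as pgsize / size is an assumption, since the diff does
not show where items is first set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values, derived from the figures quoted in this mail. */
#define PGSIZE	8192u	/* PAGE_SIZE on sparc64; PGSIZE / 16 == 512 */
#define PHDR_SZ	112u	/* sizeof(struct pool_item_header); * 2 == 224 */

/* Pre-v1.149 rule: in-page header if the item is under 1/16 of a page. */
static bool
inpage_old(size_t size)
{
	return size < PGSIZE / 16 && size < PGSIZE;
}

/*
 * v1.149-style rule with the multiplier as a parameter:
 * 2 as committed, 8 with the diff above.
 */
static bool
inpage_new(size_t size, unsigned int mult)
{
	size_t items;

	/* Guard against division by zero and unsigned underflow below. */
	assert(size > 0 && size <= PGSIZE);
	items = PGSIZE / size;	/* assumption: initial items per page */

	/* The header fits in the slack left over at the end of the page. */
	if (PGSIZE - size * items > PHDR_SZ)
		return true;

	/* Otherwise only small items give up space for an in-page header. */
	return (size_t)PHDR_SZ * mult >= size;
}
```

Under these assumptions a 256-byte item gets an in-page header with the
old rule and with the *8 multiplier, but not with *2.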

I've already ruled out the unsigned int arithmetic I've mentioned thus
far, with KASSERT()s that didn't trigger even when the crash happens.

And I've already tried to rule out cache colouring by forcing
pp->pr_maxcolors=0 to no avail.  (Since it was only used in pools
with an in-page header, it could have been related).

p.s. I would maybe even test if this helps with tmpfs issues seen on
armv7 and such, as I think that was first mentioned around the time of
this change, and since it uses pool(9) for its file metadata.

Regards,


Well, with that diff I hit another panic, which seems to be triggered by the NIC:

ddb> trace
data_access_error(e00173f8, 0, 1fe02010048, 84000000, 90d5fe0000, 0) at data_access_error+0x19c
trapbase_sun4v(40000556000, 0, 8, 1fe0000f068, 1fe0000f078, 8000003e00000000) at trapbase_sun4v+0x87a8
dc_mii_send(40000556000, 8000, 1, 188aec8, 1, 180e6a0) at dc_mii_send+0x20
dc_mii_writereg(40000556000, e0017848, 0, 4000801bc30, 0, 3b9ac800) at dc_mii_writereg+0x94
dc_miibus_writereg(40000556000, 1, 1833670, 8000, 0, 8) at dc_miibus_writereg+0x94
mii_phy_reset(40000436500, 1, 1, 64, 0, 3b9ac800) at mii_phy_reset+0x34
mii_phy_tick(40000436500, 10012a00000000, 305e7b209fe, faaf56200000000, bb6a, fb7423403e94f590) at mii_phy_tick+0xb0
amphy_service(40000436500, 400005563c0, 1, 1800, 1893cf0, 188b000) at amphy_service+0x114
mii_tick(400005563c0, 40007fadde0, 11a4060, 188aec8, 40004928000, 1833670) at mii_tick+0x20
dc_tick(40000556000, 8000003e0fabce6a, 8, 1fe0000f068, 1fe0000f078, 8000003e00000000) at dc_tick+0x1c8
softclock(1800, e, 18cecb8, 188aec8, 1, 180e6a0) at softclock+0x2d4
intr_handler(e0017ec8, 40000e36000, 94997c, 4000801bc30, 0, 3b9ac800) at intr_handler+0xc
sparc_interrupt(188f0c0, e0018000, 168f5c8, 188aec8, 0, 3b9ac800) at sparc_interrupt+0x298
sched_idle(e0018000, 4000491c480, 1690170, 64, 0, 3b9ac800) at sched_idle+0x114
proc_trampoline(0, 0, 0, 0, 0, 0) at proc_trampoline+0x4
ddb>
ddb> ps
   TID   PPID   PGRP    UID  S       FLAGS  WAIT          COMMAND
 21469  17760  17760   1000  3        0x83  select        ssh
 17760  27053  17760   1000  3    0x100003  biowait       cvs
 27053  22678  27053   1000  3    0x10008b  pause         ksh
 22678  32723  32723   1000  3        0x90  select        sshd
 32723  19761  32723      0  3        0x92  poll          sshd
 23880      1  23880      0  3    0x100083  ttyin         getty
 30610      1  30610      0  3    0x100098  poll          cron
 16745      1  16745     99  3        0x90  poll          sndiod
  5824  30082  30082     95  3    0x100090  kqread        smtpd
  8157  30082  30082     95  3    0x100090  kqread        smtpd
 13105  30082  30082     95  3    0x100090  kqread        smtpd
  9772  30082  30082     95  3    0x100090  kqread        smtpd
  9286  30082  30082     95  3    0x100090  kqread        smtpd
 28022  30082  30082    103  3    0x100090  kqread        smtpd
 30082      1  30082      0  3    0x100080  kqread        smtpd
 19761      1  19761      0  3        0x80  select        sshd
 32029  16805  16805     74  3    0x100090  bpf           pflogd
 16805      1  16805      0  3        0x80  netio         pflogd
 20237   3513   3513     73  2    0x100090                syslogd
  3513      1   3513      0  3    0x100080  netio         syslogd
 14265      1  14265     77  3        0x90  poll          dhclient
  6624      1   6624      0  3        0x80  poll          dhclient
 12749      0      0      0  3     0x14200  pgzero        zerothread
  7247      0      0      0  3     0x14200  aiodoned      aiodoned
 32008      0      0      0  3     0x14200  syncer        update
 19673      0      0      0  3     0x14200  cleaner       cleaner
  4919      0      0      0  3     0x14200  reaper        reaper
 20066      0      0      0  3     0x14200  pgdaemon      pagedaemon
  9395      0      0      0  3     0x14200  bored         crypto
 25366      0      0      0  3     0x14200  pftm          pfpurge
 19427      0      0      0  3     0x14200  usbtsk        usbtask
 16297      0      0      0  3     0x14200  usbatsk       usbatsk
  8314      0      0      0  3     0x14200  bored         sensors
 12953      0      0      0  3     0x14200  bored         softnet
 18789      0      0      0  3     0x14200  bored         systqmp
 30000      0      0      0  3     0x14200  bored         systq
*28113      0      0      0  7  0x40014200                idle0
  9289      0      0      0  3     0x14200  kmalloc       kmthread
     1      0      1      0  3        0x82  wait          init
     0     -1      0      0  3     0x10200  scheduler     swapper

I'll see if a 5.8 release build with just your diff makes any difference for this machine...
