On 11/24/15 23:46, Fred wrote:
On 11/24/15 03:31, Steven Chamberlain wrote:
Hi!
Would anyone like to try this change? It's too early to say whether
this has definitely fixed the issue for me, but it looks promising:
--- sys/kern/subr_pool.c
+++ sys/kern/subr_pool.c
@@ -259,5 +259,5 @@ pool_init(struct pool *pp, size_t size,
 	if (pgsize - (size * items) > sizeof(struct pool_item_header)) {
 		off = pgsize - sizeof(struct pool_item_header);
-	} else if (sizeof(struct pool_item_header) * 2 >= size) {
+	} else if (sizeof(struct pool_item_header) * 8 >= size) {
 		off = pgsize - sizeof(struct pool_item_header);
 		items = off / size;
Prior to v1.149, there was a threshold of (I think) PAGE_SIZE/16 = 512
on sparc64; pools with an item size smaller than that would use an
in-page header:
	/*
	 * Decide whether to put the page header off page to avoid
	 * wasting too large a part of the page. Off-page page headers
	 * go into an RB tree, so we can match a returned item with
	 * its header based on the page address.
	 * We use 1/16 of the page size as the threshold (XXX: tune)
	 */
	if (pp->pr_size < palloc->pa_pagesz/16 && pp->pr_size < PAGE_SIZE) {
		/* Use the end of the page for the page header */
In v1.149 the threshold became sizeof(struct pool_item_header)*2=224 on
sparc64, so dma256 and dma512 pools would no longer use an in-page
header, but be able to accommodate more items per page as a result.
The adjustment above simply reverts that behavioural change. The
change probably never should have broken anything, other than causing
a slight performance difference, but it seems to have triggered a
possibly pre-existing bug elsewhere.
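
To make the three thresholds easier to compare, here is a minimal sketch in plain userland C. It is not the kernel code: the constants are assumptions derived from the figures quoted above (PAGE_SIZE/16 = 512 implies an 8192-byte page, and sizeof(struct pool_item_header)*2 = 224 implies a 112-byte header), the function names are made up for illustration, and only the size comparison is modelled, not the first "leftover space" branch of the diff.

```c
#include <stddef.h>

/*
 * Assumed sparc64 values, inferred from the numbers in this thread;
 * they are not read from any kernel header.
 */
#define PGSIZE   8192u	/* PAGE_SIZE/16 = 512 */
#define HDR_SIZE 112u	/* sizeof(struct pool_item_header)*2 = 224 */

/*
 * Pre-v1.149 rule: in-page header when the item is smaller than 1/16
 * of the page. Assumes pa_pagesz == PAGE_SIZE, which makes the second
 * condition in the quoted code redundant.
 */
int
inpage_old(size_t size)
{
	return size < PGSIZE / 16;
}

/* v1.149 rule (size comparison only): header * 2 >= item size. */
int
inpage_v1149(size_t size)
{
	return HDR_SIZE * 2 >= size;
}

/* Proposed diff (size comparison only): header * 8 >= item size. */
int
inpage_patched(size_t size)
{
	return HDR_SIZE * 8 >= size;
}
```

Under these assumptions a 256-byte item gets an in-page header with the old rule and with the diff, but not with v1.149. A 512-byte item sits exactly on the old threshold, so the strict comparison in the old code would not have selected an in-page header for it, while the diff does.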
I've already ruled out the unsigned int arithmetic I've mentioned thus
far, with KASSERT()s that didn't trigger even when the crash happens.
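
For illustration only, the kind of check meant here can be sketched in userland C with assert() standing in for KASSERT(). This assumes the arithmetic in question is the unsigned subtraction pgsize - (size * items) from the diff; the helper name checked_slack is hypothetical and does not exist in subr_pool.c.

```c
#include <assert.h>

/*
 * Hypothetical helper: returns the bytes left over in a page after
 * `items` items of `size` bytes, asserting first that the unsigned
 * subtraction cannot wrap around.
 */
unsigned int
checked_slack(unsigned int pgsize, unsigned int size, unsigned int items)
{
	/* Guard against size * items overflowing unsigned int. */
	assert(size == 0 || items <= (unsigned int)-1 / size);
	/* Guard against pgsize - (size * items) wrapping below zero. */
	assert(size * items <= pgsize);
	return pgsize - (size * items);
}
```

With an 8192-byte page, checked_slack(8192, 256, 32) returns 0 and checked_slack(8192, 100, 81) returns 92; a call such as checked_slack(8192, 256, 33) would trip the second assertion instead of silently wrapping.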
And I've already tried to rule out cache colouring by forcing
pp->pr_maxcolors=0 to no avail. (Since it was only used in pools
with an in-page header, it could have been related).
P.S. It might even be worth testing whether this helps with the tmpfs
issues seen on armv7 and such, as I think those were first mentioned
around the time of this change, and tmpfs uses pool(9) for its file
metadata.
Regards,
Well, with that diff I hit another panic, which seems to be triggered
by the NIC:
ddb> trace
data_access_error(e00173f8, 0, 1fe02010048, 84000000, 90d5fe0000, 0) at data_access_error+0x19c
trapbase_sun4v(40000556000, 0, 8, 1fe0000f068, 1fe0000f078, 8000003e00000000) at trapbase_sun4v+0x87a8
dc_mii_send(40000556000, 8000, 1, 188aec8, 1, 180e6a0) at dc_mii_send+0x20
dc_mii_writereg(40000556000, e0017848, 0, 4000801bc30, 0, 3b9ac800) at dc_mii_writereg+0x94
dc_miibus_writereg(40000556000, 1, 1833670, 8000, 0, 8) at dc_miibus_writereg+0x94
mii_phy_reset(40000436500, 1, 1, 64, 0, 3b9ac800) at mii_phy_reset+0x34
mii_phy_tick(40000436500, 10012a00000000, 305e7b209fe, faaf56200000000, bb6a, fb7423403e94f590) at mii_phy_tick+0xb0
amphy_service(40000436500, 400005563c0, 1, 1800, 1893cf0, 188b000) at amphy_service+0x114
mii_tick(400005563c0, 40007fadde0, 11a4060, 188aec8, 40004928000, 1833670) at mii_tick+0x20
dc_tick(40000556000, 8000003e0fabce6a, 8, 1fe0000f068, 1fe0000f078, 8000003e00000000) at dc_tick+0x1c8
softclock(1800, e, 18cecb8, 188aec8, 1, 180e6a0) at softclock+0x2d4
intr_handler(e0017ec8, 40000e36000, 94997c, 4000801bc30, 0, 3b9ac800) at intr_handler+0xc
sparc_interrupt(188f0c0, e0018000, 168f5c8, 188aec8, 0, 3b9ac800) at sparc_interrupt+0x298
sched_idle(e0018000, 4000491c480, 1690170, 64, 0, 3b9ac800) at sched_idle+0x114
proc_trampoline(0, 0, 0, 0, 0, 0) at proc_trampoline+0x4
ddb>
ddb> ps
   TID    PPID    PGRP    UID  S       FLAGS  WAIT      COMMAND
 21469   17760   17760   1000  3        0x83  select    ssh
 17760   27053   17760   1000  3    0x100003  biowait   cvs
 27053   22678   27053   1000  3    0x10008b  pause     ksh
 22678   32723   32723   1000  3        0x90  select    sshd
 32723   19761   32723      0  3        0x92  poll      sshd
 23880       1   23880      0  3    0x100083  ttyin     getty
 30610       1   30610      0  3    0x100098  poll      cron
 16745       1   16745     99  3        0x90  poll      sndiod
  5824   30082   30082     95  3    0x100090  kqread    smtpd
  8157   30082   30082     95  3    0x100090  kqread    smtpd
 13105   30082   30082     95  3    0x100090  kqread    smtpd
  9772   30082   30082     95  3    0x100090  kqread    smtpd
  9286   30082   30082     95  3    0x100090  kqread    smtpd
 28022   30082   30082    103  3    0x100090  kqread    smtpd
 30082       1   30082      0  3    0x100080  kqread    smtpd
 19761       1   19761      0  3        0x80  select    sshd
 32029   16805   16805     74  3    0x100090  bpf       pflogd
 16805       1   16805      0  3        0x80  netio     pflogd
 20237    3513    3513     73  2    0x100090            syslogd
  3513       1    3513      0  3    0x100080  netio     syslogd
 14265       1   14265     77  3        0x90  poll      dhclient
  6624       1    6624      0  3        0x80  poll      dhclient
 12749       0       0      0  3     0x14200  pgzero    zerothread
  7247       0       0      0  3     0x14200  aiodoned  aiodoned
 32008       0       0      0  3     0x14200  syncer    update
 19673       0       0      0  3     0x14200  cleaner   cleaner
  4919       0       0      0  3     0x14200  reaper    reaper
 20066       0       0      0  3     0x14200  pgdaemon  pagedaemon
  9395       0       0      0  3     0x14200  bored     crypto
 25366       0       0      0  3     0x14200  pftm      pfpurge
 19427       0       0      0  3     0x14200  usbtsk    usbtask
 16297       0       0      0  3     0x14200  usbatsk   usbatsk
  8314       0       0      0  3     0x14200  bored     sensors
 12953       0       0      0  3     0x14200  bored     softnet
 18789       0       0      0  3     0x14200  bored     systqmp
 30000       0       0      0  3     0x14200  bored     systq
*28113       0       0      0  7  0x40014200            idle0
  9289       0       0      0  3     0x14200  kmalloc   kmthread
     1       0       1      0  3        0x82  wait      init
     0      -1       0      0  3     0x10200  scheduler swapper
I'll see whether a 5.8 release build with just your diff makes any
difference on this machine.