Hello.

Last week I upgraded a 9.3/amd64 box to 10.3; since then, it has crashed and rebooted at least once every night.


The only exception was on Friday, when it locked up without rebooting: it still answered ping requests, and logins through HTTP would half work. My impression is that the disk subsystem was hung: ICMP kept working since it does no I/O, and HTTP worked as long as no disk access was required.

Today I was able to get a couple of (almost identical) dumps:

cpuid = 1
KDB: stack backtrace:
#0 0xffffffff804ee170 at kdb_backtrace+0x60
#1 0xffffffff804b4576 at vpanic+0x126
#2 0xffffffff804b4443 at panic+0x43
#3 0xffffffff8068fd2a at softdep_deallocate_dependencies+0x6a
#4 0xffffffff805394b5 at brelse+0x145
#5 0xffffffff8053793c at bufwrite+0x3c
#6 0xffffffff806ae20f at ffs_write+0x3df
#7 0xffffffff8076d519 at VOP_WRITE_APV+0x149
#8 0xffffffff806ec7c9 at vnode_pager_generic_putpages+0x2a9
#9 0xffffffff8076f3b7 at VOP_PUTPAGES_APV+0xa7
#10 0xffffffff806ea6f5 at vnode_pager_putpages+0xc5
#11 0xffffffff806e17f8 at vm_pageout_flush+0xc8
#12 0xffffffff806db432 at vm_object_page_collect_flush+0x182
#13 0xffffffff806db1cd at vm_object_page_clean+0x13d
#14 0xffffffff806dadbe at vm_object_terminate+0x8e
#15 0xffffffff806eac60 at vnode_destroy_vobject+0x90
#16 0xffffffff806b4232 at ufs_reclaim+0x22
#17 0xffffffff8076e5c7 at VOP_RECLAIM_APV+0xa7



Does anyone have any insight into what might be going on?
The disks are all connected to a SAS RAID adapter driven by mfi. I don't think it is a hardware issue, since it worked perfectly for years until I did the upgrade; also, mfiutil says everything is OK and there is nothing mfi-related in the logs.



A few ideas come to mind on which I could use a second opinion:

_ soft updates are broken: that would really surprise me, since I've been using them for years on this and several other boxes (including 10.3 ones);

_ snapshot creation/deletion is causing this: again, I'm using that almost everywhere, so I don't think this alone can be the cause; besides, I've been able to do some dumps without trouble, and I don't think anything was messing with snapshots at the time of the last two panics;

_ the mfi driver is broken on 10.3: this seems more plausible to me, since this is the only machine I have it on and the only one where I get these panics. I found https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=183618, but I get no "g_vfs_done()..." messages.

Any other hints?



I'd really like to find out what's going on; I'd appreciate any help and I'm willing to provide any useful info.

On the other hand, this is a production server, so I have to solve this really soon. Some ideas come to mind: disabling soft updates (knowing which file system was having trouble would help here; is there any way to know?), trying to enable journaling, upgrading to 10-STABLE, or building a kernel with INVARIANTS/WITNESS/etc., but I'd appreciate a second opinion before I start shooting in the dark.



 bye & Thanks
        av.
_______________________________________________
freebsd-stable@freebsd.org mailing list