Re: [OmniOS-discuss] ZFS crash/reboot loop

2015-07-13 Thread Dan McDonald
On Jul 13, 2015, at 11:29 AM, Dan McDonald dan...@omniti.com wrote: On Jul 13, 2015, at 11:25 AM, Derek Yarnell de...@umiacs.umd.edu wrote: https://obj.umiacs.umd.edu/derek_support/vmdump.0 Yeah, that's what I'm seeking. Downloading it now to an r151014 box (you are running r151014

Re: [OmniOS-discuss] ZFS crash/reboot loop

2015-07-13 Thread Dan McDonald
On Jul 13, 2015, at 11:56 AM, Derek Yarnell de...@umiacs.umd.edu wrote: I don't need to hot patch (a cold patch would be fine), so any update that I can apply and reboot would be fine. We have a second OmniOS r14 copy running that we are happy to patch in any way possible to get it mounted

Re: [OmniOS-discuss] ZFS crash/reboot loop

2015-07-13 Thread Derek Yarnell
ff0d4071ca98::print arc_buf_t b_hdr | ::print arc_buf_hdr_t b_size
b_size = 0
Ouch. There's your zero. I'm going to forward this very note to the illumos ZFS list. I see ONE possible bugfix post-r151014 that might help: commit 31c46cf23cd1cf4d66390a983dc5072d7d299ba2 Author:
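The mdb pipeline above can also be run non-interactively against a saved dump. This is a minimal sketch, assuming an illumos/OmniOS host where the dump has already been expanded into unix.N/vmcore.N; the function name `arc_buf_size` and the `DRY_RUN` switch are hypothetical, and the address must be the arc_buf_t pointer taken from your own panic stack.

```shell
#!/bin/sh
# Sketch: print b_size of the arc_buf_hdr_t behind an arc_buf_t, using the
# same dcmd pipeline as in the thread. Assumes illumos mdb.
# Hypothetical helper; DRY_RUN=1 prints the dcmd line instead of running mdb.

arc_buf_size() {
    addr=$1    # arc_buf_t address taken from the panic stack
    unix=$2    # e.g. unix.0
    core=$3    # e.g. vmcore.0
    dcmd="$addr::print arc_buf_t b_hdr | ::print arc_buf_hdr_t b_size"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$dcmd"
        return 0
    fi
    # mdb reads dcmds from stdin when stdin is not a terminal.
    echo "$dcmd" | mdb "$unix" "$core"
}
```

Usage: `arc_buf_size ADDR unix.0 vmcore.0`, where ADDR is a placeholder. A reply of `b_size = 0` is the zero-sized header discussed above.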

Re: [OmniOS-discuss] ZFS crash/reboot loop

2015-07-13 Thread Derek Yarnell
Hi Dan, Sorry, I have not dealt with dumpadm/savecore much, but it looks like this is what you want. https://obj.umiacs.umd.edu/derek_support/vmdump.0 Thanks, derek On 7/13/15 12:55 AM, Dan McDonald wrote: On Jul 12, 2015, at 9:18 PM, Richard Elling richard.ell...@richardelling.com
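For anyone else pulling a dump off a machine in a crash loop: on illumos, savecore normally leaves a compressed vmdump.N, which has to be expanded into unix.N and vmcore.N before mdb can read it. This is a minimal sketch, assuming savecore's `-f dumpfile` option and trailing directory operand behave as documented in savecore(1M) on your release; the function names and the `DRY_RUN` switch are hypothetical.

```shell
#!/bin/sh
# Sketch: locate, expand and summarize an illumos crash dump.
# Hypothetical helpers; DRY_RUN=1 prints each command instead of running it.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show the configured dump device and savecore directory.
show_dump_config() {
    run dumpadm
}

# Expand vmdump.N into unix.N and vmcore.N in the same directory.
expand_dump() {
    n=$1
    dir=${2:-.}
    run savecore -vf "$dir/vmdump.$n" "$dir"
}

# Print the panic string, the console message buffer and the panic stack.
summarize_dump() {
    n=$1
    dir=${2:-.}
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "mdb $dir/unix.$n $dir/vmcore.$n"
        return 0
    fi
    printf '%s\n' '::status' '::msgbuf' '::stack' |
        mdb "$dir/unix.$n" "$dir/vmcore.$n"
}
```

Usage: `expand_dump 0 /var/crash/$(hostname)` followed by `summarize_dump 0 /var/crash/$(hostname)`. The compressed vmdump.0 is still the file to upload when someone asks for the dump.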

Re: [OmniOS-discuss] ZFS crash/reboot loop

2015-07-13 Thread Derek Yarnell
On 7/13/15 12:02 PM, Dan McDonald wrote: On Jul 13, 2015, at 11:56 AM, Derek Yarnell de...@umiacs.umd.edu wrote: I don't need to hot patch (a cold patch would be fine), so any update that I can apply and reboot would be fine. We have a second OmniOS r14 copy running that we are happy to

Re: [OmniOS-discuss] ZFS crash/reboot loop

2015-07-12 Thread Paul B. Henson
On Sun, Jul 12, 2015 at 06:18:17PM -0700, Richard Elling wrote: Some additional block pointer verification code was added in changeset f63ab3d5a84a12b474655fc7e700db3efba6c4c9 and likely is the cause of this assertion. In general, assertion failures are almost always software problems -- the

Re: [OmniOS-discuss] ZFS crash/reboot loop

2015-07-12 Thread Günther Alka
First action: if you can mount the pool read-only, update your backup. Then I would expect a single bad disk to be the cause of the problem on a write command. I would first check the system log, the fault log, or the SMART values for hints about a bad disk. If there is a suspicious disk, remove that
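One quick way to look for a suspicious disk is to scan the per-device error counters from `iostat -En`. This is a minimal sketch, assuming the illumos summary-line format shown in the comment; the function name `suspect_disks` is hypothetical, and its output is only a hint to cross-check against the FMA fault list (`fmadm faulty`) and error log (`fmdump -e`).

```shell
#!/bin/sh
# Sketch: list devices whose `iostat -En` summary line reports non-zero
# hard or transport errors. Assumes summary lines of the form:
#   c1t0d0  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
# Hypothetical helper; reads iostat -En output on stdin.

suspect_disks() {
    awk '
    /Soft Errors:/ {
        dev = $1; hard = 0; trans = 0
        for (i = 1; i <= NF - 2; i++) {
            if ($i == "Hard" && $(i + 1) == "Errors:") hard = $(i + 2) + 0
            if ($i == "Transport" && $(i + 1) == "Errors:") trans = $(i + 2) + 0
        }
        # Soft errors alone are ignored; report only hard/transport errors.
        if (hard > 0 || trans > 0)
            printf "%s hard=%d transport=%d\n", dev, hard, trans
    }'
}
```

Usage: `iostat -En | suspect_disks`. A device that shows up here and also appears in the fault log is the first candidate for removal.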

Re: [OmniOS-discuss] ZFS crash/reboot loop

2015-07-12 Thread Richard Elling
On Jul 12, 2015, at 5:26 PM, Derek Yarnell de...@umiacs.umd.edu wrote: On 7/12/15 3:21 PM, Günther Alka wrote: First action: If you can mount the pool read-only, update your backup. We are securing all the non-scratch data before messing with the pool any more. We had backups

Re: [OmniOS-discuss] ZFS crash/reboot loop

2015-07-12 Thread Derek Yarnell
On 7/12/15 3:21 PM, Günther Alka wrote: First action: If you can mount the pool read-only, update your backup. We are securing all the non-scratch data before messing with the pool any more. We had backups as recent as the night before, but it is still going to be faster to pull the

Re: [OmniOS-discuss] ZFS crash/reboot loop

2015-07-12 Thread Dan McDonald
On Jul 12, 2015, at 9:18 PM, Richard Elling richard.ell...@richardelling.com wrote: Dan, if you're listening, Matt would be the best person to weigh in on this. Yes, he would be, Richard. The panic in the arc_get_data_buf() paths is similar to older problems we'd seen in r151006. Derek,

Re: [OmniOS-discuss] ZFS crash/reboot loop

2015-07-12 Thread Bob Friesenhahn
On Sat, 11 Jul 2015, Derek Yarnell wrote: Hi, We have just had a catastrophic event on one of our OmniOS r14 file servers. The panic seems to have been triggered by the weekly scrub of its one large ZFS pool (~100T). This left it basically rebooting continually, and we have installed a

Re: [OmniOS-discuss] ZFS crash/reboot loop

2015-07-12 Thread Derek Yarnell
The ongoing scrub automatically restarts, apparently even in read-only mode. You should run 'zpool scrub -s poolname' ASAP after boot (if you can) to stop it. We have tried to stop the scrub, but it seems you cannot cancel a scrub while the pool is imported read-only. -- Derek T.
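The recovery sequence discussed here (import read-only, try to stop the resumed scrub, check status) can be sketched as follows. This is a minimal sketch, assuming the illumos `zpool import -o readonly=on` and `zpool scrub -s` forms; the function name and the `DRY_RUN` switch are hypothetical, and as Derek reports, the scrub stop may be refused on a read-only pool, so the sketch only warns and carries on.

```shell
#!/bin/sh
# Sketch: import a pool read-only, attempt to stop the resumed scrub,
# then show pool status. Hypothetical helper; DRY_RUN=1 prints commands only.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

import_readonly_and_stop_scrub() {
    pool=$1
    run zpool import -o readonly=on "$pool" || return 1
    # Per the thread, this may fail while the pool is read-only.
    if ! run zpool scrub -s "$pool"; then
        echo "could not stop scrub on $pool" >&2
    fi
    run zpool status "$pool"
}
```

Usage: `import_readonly_and_stop_scrub POOLNAME`, with POOLNAME standing in for the affected pool. If the stop is refused, copying data off the read-only pool still takes priority, as suggested earlier in the thread.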

[OmniOS-discuss] ZFS crash/reboot loop

2015-07-11 Thread Derek Yarnell
Hi, We have just had a catastrophic event on one of our OmniOS r14 file servers. The panic seems to have been triggered by the weekly scrub of its one large ZFS pool (~100T). This left it basically rebooting continually, and we have installed a second copy of OmniOS r14 in the meantime.