On 02/13, Johan Huldtgren wrote:
> On 2/13/16 19:31, Jonathan Gray wrote:
> >On Sat, Feb 13, 2016 at 11:37:17AM +0100, Stefan Sperling wrote:
> >>On Sat, Feb 13, 2016 at 03:42:16PM +1100, Jonathan Gray wrote:
> >>>On Fri, Feb 12, 2016 at 11:07:35AM -0500, Johan Huldtgren wrote:
> >>>>uvm_fault(0xffffffff8193f240, 0x38, 0, 1) -> e
> >>>>kernel: page fault trap, code=0
> >>>>Stopped at sr_validate_io+0x36: movl 0x38(%r9),%r10d
> >>>>ddb{1}> trace
> >>>>sr_validate_io() at sr_validate_io+0x36
> >>>>sr_raid5_rw() at sr_raid5_rw+0x40
> >>>>sr_raid_recreate_wu() at sr_raid_recreate_wu+0x2c
> >>>>sr_wu_done_callback() at sr_wu_done_callback+0x17a
> >>>>taskq_thread() at taskq_thread+0x6c
> >>>
> >>>Thanks for all the detail you've provided here. The fault appears to be
> >>>caused by a NULL xs. A diff to error in that case is provided below.
> >>>
> >>>Perhaps someone familiar with the softraid/scsi code can comment as to
> >>>why this is occuring.
> >>
> >>The trace of the first crash reported by Johan looks like it could perhaps
> >>be caused by a garbage xs, doesn't it?
> >>
> >>Quoting http://marc.info/?l=openbsd-misc&m=145477770306396&w=2
> >>panic: Non dma-reachable buffer at curaddr 0xffffffff81115888(raw)
> >>Stopped at Debugger+0x9: leave
> >>TID PID UID PRFLAGS PFLAGS CPU COMMAND
> >>*25637 25637 0 0x14000 0x200 1 srdis
> >>Debugger() at Debugger+0x9
> >>panic() at panic+0xfe
> >>_bus_dmamap_load_buffer() at _bus_dmamap_load_buffer+0x1b6
> >>_bus_dmamap_load() at _bus_dmamap_load+0x7f
> >>ahci_load_prdt() at ahci_load_prdt+0x97
> >>ahci_ata_cmd() at ahci_ata_cmd+0x69
> >>atascsi_disk_cmd() at atascsis_disk_cmd+0x1b1
> >>scsi_xs_exec() scsi_xs_exec+0x35
> >>sdstart() at sdstart+0x16f
> >>scsi_iopool_run() at scsi_iopool_run+0x5d
> >>scsi_xsh_runqueue() at scsi_xsh_runqueue+0x13d
> >>scsi_xsh_add() at scsi_xsh_add+0x98
> >>sdstrategy() at sdstrategy+0x10f
> >>spec_strategy() at spec_strategy+0x53
> >>end trace frame: 0xffff800032ca1e40, count: 0
> >>
> >>Maybe there's an uninitialized variable which happened to contain
> >>zeros in this second instance of the crash?
> >
> >The potentially uninitialised variable use turns out to be these:
> >
> >Index: softraid.c
> >===================================================================
> >RCS file: /cvs/src/sys/dev/softraid.c,v
> >retrieving revision 1.365
> >diff -u -p -r1.365 softraid.c
> >--- softraid.c 29 Dec 2015 04:46:28 -0000 1.365
> >+++ softraid.c 14 Feb 2016 00:28:14 -0000
> >@@ -3113,7 +3113,7 @@ sr_rebuild_init(struct sr_discipline *sd
> > struct disklabel label;
> > struct vnode *vn;
> > u_int64_t size;
> >- int64_t csize;
> >+ int64_t csize = 0;
> > char devname[32];
> > int rv = EINVAL, open = 0;
> > int cid, i, part, status;
> >@@ -3206,6 +3206,7 @@ sr_rebuild_init(struct sr_discipline *sd
> > devname);
> > goto done;
> > }
> >+ /* here */
> > if (size < csize) {
> > sr_error(sc, "%s partition too small, at least %lld bytes "
> > "required", devname, (long long)(csize << DEV_BSHIFT));
> >@@ -3657,7 +3658,7 @@ sr_ioctl_installboot(struct sr_softc *sc
> > struct bioc_installboot *bb)
> > {
> > void *bootblk = NULL, *bootldr = NULL;
> >- struct sr_chunk *chunk;
> >+ struct sr_chunk *chunk = NULL;
> > struct sr_meta_opt_item *omi;
> > struct sr_meta_boot *sbm;
> > struct disk *dk;
> >@@ -3786,7 +3787,7 @@ sr_ioctl_installboot(struct sr_softc *sc
> > sd->sd_meta->ssdi.ssd_vol_flags |= BIOC_SCBOOTABLE;
> > if (sr_meta_save(sd, SR_META_DIRTY)) {
> > sr_error(sc, "could not save metadata to %s",
> >- chunk->src_devname);
> >+ chunk ? chunk->src_devname : "disk");
> > goto done;
> > }
> >
> >
>
> Do I put this on top of the previous diff you sent, or do I revert that one
> and just
> use this one?
>
> thanks,
>
> .jh
>
I think this might be better. The chunk->src_devname seems to me to be
the wrong variable to use. The failures to individual chunks (which may
not be the last chunk the boot was written to) are reported in
sr_meta_save() as far as I can see.
And if we can't find a coerced size we should just bail with a more
useful error message.
.... Ken
Index: softraid.c
===================================================================
RCS file: /cvs/src/sys/dev/softraid.c,v
retrieving revision 1.365
diff -u -p -r1.365 softraid.c
--- softraid.c 29 Dec 2015 04:46:28 -0000 1.365
+++ softraid.c 14 Feb 2016 01:19:37 -0000
@@ -3151,14 +3151,18 @@ sr_rebuild_init(struct sr_discipline *sd
}
/* Get coerced size from another online chunk. */
+ csize = 0;
for (i = 0; i < sd->sd_meta->ssdi.ssd_chunk_no; i++) {
if (sd->sd_vol.sv_chunks[i]->src_meta.scm_status ==
BIOC_SDONLINE) {
- meta = &sd->sd_vol.sv_chunks[i]->src_meta;
csize = meta->scmi.scm_coerced_size;
break;
}
}
+ if (csize == 0) {
+ sr_error(sc, "no online chunks available for rebuild");
+ goto done;
+ }
sr_meta_getdevname(sc, dev, devname, sizeof(devname));
if (bdevvp(dev, &vn)) {
@@ -3777,7 +3781,6 @@ sr_ioctl_installboot(struct sr_softc *sc
sr_error(sc, "failed to write boot loader");
goto done;
}
-
}
/* XXX - Install boot block on disk - MD code. */
@@ -3785,8 +3788,7 @@ sr_ioctl_installboot(struct sr_softc *sc
/* Mark volume as bootable and save metadata. */
sd->sd_meta->ssdi.ssd_vol_flags |= BIOC_SCBOOTABLE;
if (sr_meta_save(sd, SR_META_DIRTY)) {
- sr_error(sc, "could not save metadata to %s",
- chunk->src_devname);
+ sr_error(sc, "could not save metadata to %s", DEVNAME(sc));
goto done;
}