On 02/13, Johan Huldtgren wrote:
> On 2/13/16 19:31, Jonathan Gray wrote:
> >On Sat, Feb 13, 2016 at 11:37:17AM +0100, Stefan Sperling wrote:
> >>On Sat, Feb 13, 2016 at 03:42:16PM +1100, Jonathan Gray wrote:
> >>>On Fri, Feb 12, 2016 at 11:07:35AM -0500, Johan Huldtgren wrote:
> >>>>uvm_fault(0xffffffff8193f240, 0x38, 0, 1) -> e
> >>>>kernel: page fault trap, code=0
> >>>>Stopped at sr_validate_io+0x36: movl 0x38(%r9),%r10d
> >>>>ddb{1}> trace
> >>>>sr_validate_io() at sr_validate_io+0x36
> >>>>sr_raid5_rw() at sr_raid5_rw+0x40
> >>>>sr_raid_recreate_wu() at sr_raid_recreate_wu+0x2c
> >>>>sr_wu_done_callback() at sr_wu_done_callback+0x17a
> >>>>taskq_thread() at taskq_thread+0x6c
> >>>
> >>>Thanks for all the detail you've provided here. The fault appears to be
> >>>caused by a NULL xs. A diff to error in that case is provided below.
> >>>
> >>>Perhaps someone familiar with the softraid/scsi code can comment as to
> >>>why this is occuring.
> >>
> >>The trace of the first crash reported by Johan looks like it could perhaps
> >>be caused by a garbage xs, doesn't it?
> >>
> >>Quoting http://marc.info/?l=openbsd-misc&m=145477770306396&w=2
> >>panic: Non dma-reachable buffer at curaddr 0xffffffff81115888(raw)
> >>Stopped at Debugger+0x9: leave
> >>TID PID UID PRFLAGS PFLAGS CPU COMMAND
> >>*25637 25637 0 0x14000 0x200 1 srdis
> >>Debugger() at Debugger+0x9
> >>panic() at panic+0xfe
> >>_bus_dmamap_load_buffer() at _bus_dmamap_load_buffer+0x1b6
> >>_bus_dmamap_load() at _bus_dmamap_load+0x7f
> >>ahci_load_prdt() at ahci_load_prdt+0x97
> >>ahci_ata_cmd() at ahci_ata_cmd+0x69
> >>atascsi_disk_cmd() at atascsis_disk_cmd+0x1b1
> >>scsi_xs_exec() scsi_xs_exec+0x35
> >>sdstart() at sdstart+0x16f
> >>scsi_iopool_run() at scsi_iopool_run+0x5d
> >>scsi_xsh_runqueue() at scsi_xsh_runqueue+0x13d
> >>scsi_xsh_add() at scsi_xsh_add+0x98
> >>sdstrategy() at sdstrategy+0x10f
> >>spec_strategy() at spec_strategy+0x53
> >>end trace frame: 0xffff800032ca1e40, count: 0
> >>
> >>Maybe there's an uninitialized variable which happened to contain
> >>zeros in this second instance of the crash?
> >
> >The potentially uninitialised variable use turns out to be these:
> >
> >Index: softraid.c
> >===================================================================
> >RCS file: /cvs/src/sys/dev/softraid.c,v
> >retrieving revision 1.365
> >diff -u -p -r1.365 softraid.c
> >--- softraid.c 29 Dec 2015 04:46:28 -0000 1.365
> >+++ softraid.c 14 Feb 2016 00:28:14 -0000
> >@@ -3113,7 +3113,7 @@ sr_rebuild_init(struct sr_discipline *sd
> > struct disklabel label;
> > struct vnode *vn;
> > u_int64_t size;
> >- int64_t csize;
> >+ int64_t csize = 0;
> > char devname[32];
> > int rv = EINVAL, open = 0;
> > int cid, i, part, status;
> >@@ -3206,6 +3206,7 @@ sr_rebuild_init(struct sr_discipline *sd
> > devname);
> > goto done;
> > }
> >+ /* here */
> > if (size < csize) {
> > sr_error(sc, "%s partition too small, at least %lld bytes "
> > "required", devname, (long long)(csize << DEV_BSHIFT));
> >@@ -3657,7 +3658,7 @@ sr_ioctl_installboot(struct sr_softc *sc
> > struct bioc_installboot *bb)
> > {
> > void *bootblk = NULL, *bootldr = NULL;
> >- struct sr_chunk *chunk;
> >+ struct sr_chunk *chunk = NULL;
> > struct sr_meta_opt_item *omi;
> > struct sr_meta_boot *sbm;
> > struct disk *dk;
> >@@ -3786,7 +3787,7 @@ sr_ioctl_installboot(struct sr_softc *sc
> > sd->sd_meta->ssdi.ssd_vol_flags |= BIOC_SCBOOTABLE;
> > if (sr_meta_save(sd, SR_META_DIRTY)) {
> > sr_error(sc, "could not save metadata to %s",
> >- chunk->src_devname);
> >+ chunk ? chunk->src_devname : "disk");
> > goto done;
> > }
> >
> >
>
> Do I put this on top of the previous diff you sent, or do I revert that one
> and just
> use this one?
>
> thanks,
>
> .jh
>
Oops. And here is a version w/o an errant line deletion.
.... Ken
Index: softraid.c
===================================================================
RCS file: /cvs/src/sys/dev/softraid.c,v
retrieving revision 1.365
diff -u -p -r1.365 softraid.c
--- softraid.c 29 Dec 2015 04:46:28 -0000 1.365
+++ softraid.c 14 Feb 2016 03:06:40 -0000
@@ -3151,6 +3151,7 @@ sr_rebuild_init(struct sr_discipline *sd
}
/* Get coerced size from another online chunk. */
+ csize = 0;
for (i = 0; i < sd->sd_meta->ssdi.ssd_chunk_no; i++) {
if (sd->sd_vol.sv_chunks[i]->src_meta.scm_status ==
BIOC_SDONLINE) {
@@ -3159,6 +3160,10 @@ sr_rebuild_init(struct sr_discipline *sd
break;
}
}
+ if (csize == 0) {
+ sr_error(sc, "no online chunks available for rebuild");
+ goto done;
+ }
sr_meta_getdevname(sc, dev, devname, sizeof(devname));
if (bdevvp(dev, &vn)) {
@@ -3777,7 +3782,6 @@ sr_ioctl_installboot(struct sr_softc *sc
sr_error(sc, "failed to write boot loader");
goto done;
}
-
}
/* XXX - Install boot block on disk - MD code. */
@@ -3785,8 +3789,7 @@ sr_ioctl_installboot(struct sr_softc *sc
/* Mark volume as bootable and save metadata. */
sd->sd_meta->ssdi.ssd_vol_flags |= BIOC_SCBOOTABLE;
if (sr_meta_save(sd, SR_META_DIRTY)) {
- sr_error(sc, "could not save metadata to %s",
- chunk->src_devname);
+ sr_error(sc, "could not save metadata to %s", DEVNAME(sc));
goto done;
}