>Number:         144824
>Category:       kern
>Synopsis:       boot problem on USB (root partition mounting)
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Mar 17 16:50:01 UTC 2010
>Closed-Date:
>Last-Modified:
>Originator:     Gilles Blanc
>Release:        8.0-RELEASE (current)
>Organization:
Linagora
>Environment:
FreeBSD freedaemon.par.lng 8.0-RELEASE FreeBSD 8.0-RELEASE #0: Sat Nov 21 
15:02:08 UTC 2009     r...@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  
amd64
>Description:
At boot, the kernel (file /sys/kern/vfs_mount.c) uses a queue to wait for 
devices to be initialized before it tries to mount root. This queue is filled, 
for instance, by the usb driver (using the "root_mount_hold" function), so when 
we boot from a USB key, the "root_mount_prepare" function delays the root mount 
until USB is available (that is, until the queue has been emptied by calling 
"root_mount_rel" on every identifier registered by the usb driver).

In fact, it only waits for USB to be "physically" available, not necessarily 
for umass or scsi (scsi-da). More precisely, the behavior is not 
deterministic: to be mounted, a root partition on a USB key needs umass and 
then scsi to be initialized. If the mount works most of the time, it is only 
because the 'root_holds' list is still not empty while other threads run 
concurrently (for example, with a USB key plugged into usb0, the system 
initializes usb0 to usb7 sequentially, and umass0 and da0 are initialized 
during that time too).

Unfortunately, some servers are not so forgiving, and root mounting simply 
fails (the 'vfs_mountroot' function asks 'vfs_mountroot_try' to mount the USB 
root partition, which is not yet available). We then end up at the 
"ROOT MOUNT ERROR" prompt and have to mount the partition by hand, which is 
hardly acceptable on production servers (we would have to travel several 
kilometers just to type "ufs:/dev/da0s1a" each time we reboot...).

The problem is not a blocker for most FreeBSD users, but it prevents us from 
migrating our systems (which is quite a big problem).
>How-To-Repeat:
If you have a machine that exhibits this problem, you can reproduce it easily 
(it fails 95% of the time); if not (as on my development laptop), you will 
never manage to make it fail.
>Fix:
I first tried to add locks in the umass and scsi drivers. In the umass driver, 
the lock is in the /sys/dev/usb/storage/umass.c file, in the 'umass_attach' 
function (on our Supermicro server, umass has enough time to initialize, but I 
wanted to be rigorous). In the scsi driver, it is in the 
/sys/cam/scsi/scsi_da.c file, in the 'dastart' function, in the 
"DA_STATE_PROBE2" case of the switch. Unfortunately, between these two 
lock/unlock pairs the root mounting thread preempts, and as the list is empty 
during this very short time, it tries to mount the root partition and fails as 
usual. It is not possible to add a lock in umass and remove it in scsi, 
because the removal API works with a pointer into the lock list.
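
The window between the two pairs can be illustrated with a small user-space 
model. This is only a sketch: it assumes the "locks" above are 
root_mount_hold/root_mount_rel pairs, replaces the kernel list by a counter, 
and every name in it (hold_model, model_hold, and so on) is hypothetical, not 
kernel API.

```c
#include <stdbool.h>

/* User-space model only; a counter stands in for the kernel hold list. */
struct hold_model {
    int  holds;     /* outstanding root-mount holds */
    bool da_ready;  /* true once the disk behind the USB key is probed */
};

static void model_hold(struct hold_model *m) { m->holds++; }
static void model_rel(struct hold_model *m)  { m->holds--; }

/* In this model the mount thread may proceed once no holds remain. */
static bool
model_may_mount(const struct hold_model *m)
{
    return (m->holds == 0);
}

/*
 * Two separate hold/release pairs (umass, then da). Returns true if the
 * mount thread, had it run between the pairs, would have tried to mount
 * root before the disk was ready.
 */
static bool
model_two_pairs_race(void)
{
    struct hold_model m = { 0, false };
    bool early = false;

    model_hold(&m);             /* umass attach begins */
    model_rel(&m);              /* umass attach ends */
    if (model_may_mount(&m) && !m.da_ready)
        early = true;           /* list is empty, disk not ready yet */
    model_hold(&m);             /* da probe begins */
    m.da_ready = true;
    model_rel(&m);              /* da probe ends */
    return (early);
}

/*
 * Counterfactual: one hold spanning both stages. The report says the
 * removal API makes this impractical; it is shown only for contrast.
 */
static bool
model_single_span_race(void)
{
    struct hold_model m = { 0, false };
    bool early = false;

    model_hold(&m);             /* taken in umass */
    if (model_may_mount(&m) && !m.da_ready)
        early = true;           /* never reached: a hold is outstanding */
    m.da_ready = true;
    model_rel(&m);              /* released after the da probe */
    return (early);
}
```

In this model, model_two_pairs_race() reports the early mount attempt and 
model_single_span_race() does not, which matches the behavior described above.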

So another solution has to be considered, which is what I propose with this 
patch. In vfs_mountroot_try, I simply call the 'kernel_mount' function several 
times, with a short pause between attempts. The number of retries is 3 by 
default, but it can be changed through the new "vfs.root.mounttrymax" tunable 
in /boot/loader.conf (it can even be set to 0 to go back to the original 
behavior). Each time the mount fails and a retry is still allowed, a message is 
printed, the thread sleeps for one second, and then tries again. If it is 
really impossible to mount root, we continue with the normal prompt procedure.
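
To make the counting explicit, here is a user-space sketch of the same control 
flow. It is not the patch itself: mount_with_retries, mount_fn and fake_mount 
are hypothetical names, fake_mount merely stands in for 'kernel_mount', and the 
one-second sleep is left out. It shows that the limit counts retries, so the 
default of 3 allows up to four calls in total.

```c
#include <stddef.h>

/* Hypothetical stand-in for the mount call: returns 0 on success. */
typedef int (*mount_fn)(void *arg);

/*
 * One initial attempt, then at most try_max retries. Stores the number
 * of calls made in *calls and returns the last error value.
 */
static int
mount_with_retries(mount_fn fn, void *arg, int try_max, int *calls)
{
    int error;
    int nbtry = 0;

    *calls = 0;
    for (;;) {
        error = fn(arg);
        (*calls)++;
        if (error == 0 || nbtry >= try_max)
            break;
        nbtry++;        /* the patch sleeps for one second here */
    }
    return (error);
}

/* Test double: fails until the counter behind arg reaches zero. */
static int
fake_mount(void *arg)
{
    int *remaining = arg;

    if (*remaining > 0) {
        (*remaining)--;
        return (1);     /* arbitrary nonzero error for the model */
    }
    return (0);
}
```

With try_max set to 0 the function is called exactly once, which corresponds to 
the original behavior. In /boot/loader.conf the limit would be set with a line 
such as vfs.root.mounttrymax="5" (the value 5 is only an illustration).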

There are still some problems on some USB ports (the other ports on the same 
machine work fine at the first or second mount retry). I suspect a deeper 
problem in 'kernel_mount', because using the prompt does not mount the device 
either, or worse, can lead to a page fault or to the system locking up. But my 
patch is enough to resolve the original problem as far as is possible in the 
current state of things.

I hope it will be reviewed and accepted as soon as possible.

Patch attached with submission follows:

--- vfs_mount.c 2010-03-17 15:30:45.000000000 +0100
+++ vfs_mount.c 2010-03-17 14:49:52.000000000 +0100
@@ -1798,6 +1806,8 @@
        int             error;
        char            patt[32];
        char            errmsg[255];
+       char            nbtry;
+       int             rootmounttrymax;
 
        vfsname = NULL;
        path    = NULL;
@@ -1805,6 +1815,8 @@
        ma      = NULL;
        error   = EINVAL;
        bzero(errmsg, sizeof(errmsg));
+       nbtry   = 0;
+       rootmounttrymax = 3;
 
        if (mountfrom == NULL)
                return (error);         /* don't complain */
@@ -1827,7 +1839,18 @@
        ma = mount_arg(ma, "errmsg", errmsg, sizeof(errmsg));
        ma = mount_arg(ma, "ro", NULL, 0);
        ma = parse_mountroot_options(ma, options);
-       error = kernel_mount(ma, MNT_ROOTFS);
+
+       TUNABLE_INT_FETCH("vfs.root.mounttrymax", &rootmounttrymax);
+       while (1) {
+               error = kernel_mount(ma, MNT_ROOTFS);
+               if (nbtry < rootmounttrymax && error != 0) {
+                       printf("Mount failed, retrying mount root from %s\n", mountfrom);
+                       tsleep(&rootmounttrymax, PZERO | PDROP, "mount", hz);
+                       nbtry++;
+               }
+               else
+                       break;
+       }
 
        if (error == 0) {
                /*


>Release-Note:
>Audit-Trail:
>Unformatted: