Re: mountroot failing on zpools in Azure after upgrade from 10 to 11 due to lack of waiting for da0

2017-03-16 Thread Warner Losh
[[ stupid mouse ]]

On Thu, Mar 16, 2017 at 10:01 AM, Warner Losh  wrote:
> On Thu, Mar 16, 2017 at 6:06 AM, Pete French  
> wrote:
>>> I don't like the delay and retry approach at all.
>>
>> It's not ideal, but it is what we do for UFS after all...
>>
>>> Imagine that you told the kernel that you want to mount your root from a ZFS
>>> pool which is on a USB drive which you have already thrown out.  Should the
>>> kernel just keep waiting for that pool to appear?
>>
>> I'm not talking about an infinite loop here, just making it honour
>> the 'vfs.mountroot.timeout' setting like it does for UFS. So it
>> should wait for the timeout I have set and then proceed as it would if
>> there had been no timeout. Default behaviour is for it to behave as it
>> does now; it's only when you need the retry that you enable it.
>
> Put another way: with UFS it keeps retrying until the timeout expires.
> If the first try succeeds, the boot is immediate.
>
>> Right now this works for UFS, but not for ZFS, which is an inconsistency
>> that I don't like, and also means I am being forced down a UFS root
>> path if I require this.
>
> Yes. ZFS is special, but I don't think the assumptions behind its
> specialness are quite right:
>
> /*
>  * In case of ZFS and NFS we don't have a way to wait for
>  * specific device.  Also do the wait if the user forced that
>  * behaviour by setting vfs.root_mount_always_wait=1.
>  */
> if (strcmp(fs, "zfs") == 0 || strstr(fs, "nfs") != NULL ||
>     dev[0] == '\0' || root_mount_always_wait != 0) {
>         vfs_mountroot_wait();
>         return (0);
> }
>
> So you can make it always succeed by forcing the wait, but that's lame...

Later we check to see if a device by a given name is present. Since
ZFS doesn't present its pool names as devices to the rest of the
system, that's not going to work quite right. That's the real reason
that ZFS is special. It isn't that we can't wait for individual
devices, it's that we can't wait for the 'mount token' that we use for
what to mount to be 'ready'. NFS suffers from the same problem, but
since its device is always ready (it's stateless), it isn't as
noticeable.

Warner
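
For reference, both knobs mentioned above can be set from /boot/loader.conf.
A minimal sketch, assuming they are used as loader tunables; the values are
illustrative, not taken from this thread:

    vfs.root_mount_always_wait="1"   # force the vfs_mountroot_wait() path shown in the code above
    vfs.mountroot.timeout="30"       # seconds to keep retrying; honoured for UFS today, not for ZFS

The first line is the workaround Warner calls lame; the second is the tunable
whose behaviour Pete would like extended to ZFS roots.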


Re: mountroot failing on zpools in Azure after upgrade from 10 to 11 due to lack of waiting for da0

2017-03-16 Thread Warner Losh
On Thu, Mar 16, 2017 at 6:06 AM, Pete French  wrote:
>> I don't like the delay and retry approach at all.
>
> It's not ideal, but it is what we do for UFS after all...
>
>> Imagine that you told the kernel that you want to mount your root from a ZFS
>> pool which is on a USB drive which you have already thrown out.  Should the
>> kernel just keep waiting for that pool to appear?
>
> I'm not talking about an infinite loop here, just making it honour
> the 'vfs.mountroot.timeout' setting like it does for UFS. So it
> should wait for the timeout I have set and then proceed as it would if
> there had been no timeout. Default behaviour is for it to behave as it
> does now; it's only when you need the retry that you enable it.

Put another way: with UFS it keeps retrying until the timeout expires.
If the first try succeeds, the boot is immediate.

> Right now this works for UFS, but not for ZFS, which is an inconsistency
> that I don't like, and also means I am being forced down a UFS root
> path if I require this.

Yes. ZFS is special, but I don't think the assumptions behind its
specialness are quite right:

/*
 * In case of ZFS and NFS we don't have a way to wait for
 * specific device.  Also do the wait if the user forced that
 * behaviour by setting vfs.root_mount_always_wait=1.
 */
if (strcmp(fs, "zfs") == 0 || strstr(fs, "nfs") != NULL ||
    dev[0] == '\0' || root_mount_always_wait != 0) {
        vfs_mountroot_wait();
        return (0);
}

So you can make it always succeed by forcing the wait, but that's lame...


Re: mountroot failing on zpools in Azure after upgrade from 10 to 11 due to lack of waiting for da0

2017-03-16 Thread Pete French
> I don't like the delay and retry approach at all.

It's not ideal, but it is what we do for UFS after all...

> Imagine that you told the kernel that you want to mount your root from a ZFS
> pool which is on a USB drive which you have already thrown out.  Should the
> kernel just keep waiting for that pool to appear?

I'm not talking about an infinite loop here, just making it honour
the 'vfs.mountroot.timeout' setting like it does for UFS. So it
should wait for the timeout I have set and then proceed as it would if
there had been no timeout. Default behaviour is for it to behave as it
does now; it's only when you need the retry that you enable it.

Right now this works for UFS, but not for ZFS, which is an inconsistency
that I don't like, and also means I am being forced down a UFS root
path if I require this.

> Microsoft provides support for FreeBSD Hyper-V drivers.
> Please try to discuss this problem on virtualization@ or with sephe@ directly.

OK, will do, thanks...

-pete.
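
What is being asked for is the bounded retry that UFS already gets: keep
retrying the root mount until it either succeeds or the configured timeout
expires, then carry on as if no timeout had been set. A minimal user-space
sketch of that semantics follows; try_mount() is a stand-in for the kernel's
real mount attempt, and the names are illustrative only, not the actual patch.

#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Stand-in for the real root mount attempt; purely illustrative. */
static bool
try_mount(void)
{
        return (false);
}

int
main(void)
{
        int timeout = 30;       /* cf. the vfs.mountroot.timeout setting */
        int waited = 0;

        for (;;) {
                if (try_mount()) {
                        printf("mounted after %d s\n", waited);
                        return (0);     /* first success: boot continues immediately */
                }
                if (waited >= timeout)
                        break;          /* give up; proceed as if no timeout were set */
                sleep(1);               /* give the device time to appear, then retry */
                waited++;
        }
        printf("root mount still failing after %d s\n", waited);
        return (1);
}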


Re: mountroot failing on zpools in Azure after upgrade from 10 to 11 due to lack of waiting for da0

2017-03-16 Thread Andriy Gapon
On 16/03/2017 13:18, Pete French wrote:
> 
>> So, the kernel attempted to mount the root even before vmbus was attached
>> and, thus, before storvsc appeared and informed the kernel that it might be
>> holding the root.
>> How was ZFS supposed to know that vmbus was ever going to appear?
>> To me this sounds more like a problem with the Hyper-V drivers.
> 
> I am currently running with the patch which waits for a number of seconds and
> retries the mount, and that appears to fix it. However, I don't really like
> running a patched OS. How would I set about reporting this to Microsoft and
> getting it fixed, or getting the timeout patch committed? Preferably both, as
> the timeout patch is generally a useful thing to have working for ZFS, I think.

I don't like the delay and retry approach at all.
Imagine that you told the kernel that you want to mount your root from a ZFS
pool which is on a USB drive which you have already thrown out.  Should the
kernel just keep waiting for that pool to appear?

Microsoft provides support for FreeBSD Hyper-V drivers.
Please try to discuss this problem on virtualization@ or with sephe@ directly.


-- 
Andriy Gapon


Re: mountroot failing on zpools in Azure after upgrade from 10 to 11 due to lack of waiting for da0

2017-03-16 Thread Pete French



So, the kernel attempted to mount the root even before vmbus was attached and,
thus, before storvsc appeared and informed the kernel that it might be holding
the root.
How was ZFS supposed to know that vmbus was ever going to appear?
To me this sounds more like a problem with the Hyper-V drivers.


I am currently running with the patch which waits for a number of
seconds and retries the mount, and that appears to fix it. However, I don't
really like running a patched OS. How would I set about reporting this to
Microsoft and getting it fixed, or getting the timeout patch committed?
Preferably both, as the timeout patch is generally a useful thing to
have working for ZFS, I think.


-pete.


Re: 11-STABLE fails to build with MK_OFED enabled

2017-03-16 Thread Pete French

Thanks - that is a better fix than my hack ;-)

On 03/15/17 20:12, Dimitry Andric wrote:

On 15 Mar 2017, at 13:42, Pete French  wrote:




/usr/src/sys/modules/mlx4ib/../../ofed/drivers/infiniband/hw/mlx4/sysfs.c:90:22: error:
    format specifies type 'unsigned long long *' but the argument has type
    'u64 *' (aka 'unsigned long *') [-Werror,-Wformat]
        sscanf(buf, "%llx", _ag_val);
                     ^~~~
                     %lx

Fairly trivial to fix obviously - I changed it to %lx - but not sure that would
work on non-64-bit platforms.


Hi Pete,

I have merged the fix (r310232) in r315328.

-Dimitry
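
On the portability concern above: rather than picking %lx or %llx by hand,
standard C's <inttypes.h> macros expand to the right conversion for a 64-bit
integer on both 32-bit and 64-bit platforms. A small self-contained sketch of
that idiom (illustrative only, not necessarily what r310232 does):

#include <inttypes.h>
#include <stdio.h>

int
main(void)
{
        const char *buf = "1a2b3c4d5e6f";
        uint64_t val;

        /* SCNx64/PRIx64 expand to the correct hex conversions for uint64_t. */
        if (sscanf(buf, "%" SCNx64, &val) == 1)
                printf("parsed 0x%" PRIx64 "\n", val);
        return (0);
}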




Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

2017-03-16 Thread Mark Millard
On 2017-Mar-15, at 11:07 PM, Scott Bennett  wrote:

> Mark Millard  wrote:
> 
>> [Something strange happened to the automatic CC: fill-in for my original
>> reply. Also I should have mentioned that for my test program if a
>> variant is made that does not fork the swapping works fine.]
>> 
>> On 2017-Mar-15, at 9:37 AM, Mark Millard  wrote:
>> 
>>> On 2017-Mar-15, at 6:15 AM, Scott Bennett  wrote:
>>> 
   On Tue, 14 Mar 2017 18:18:56 -0700 Mark Millard
  wrote:
> On 2017-Mar-14, at 4:44 PM, Bernd Walter  wrote:
> 
>> On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote:
>>> [test_check() between the fork and the wait/sleep prevents the
>>> failure from occurring. Even a small access to the memory at
>>> that stage prevents the failure. Details follow.]
>> 
>> Maybe a stupid question, since you might have written it somewhere.
>> What medium do you swap to?
>> I've seen broken firmware on microSD cards doing silent data
>> corruption for some access patterns.
> 
> The root filesystem is on a USB SSD on a powered hub.
> 
> Only the kernel is from the microSD card.
> 
> I have several examples of the USB SSD model and have
> never observed such problems in any other context.
> 
> [remainder of irrelevant material deleted  --SB]
 
   You gave a very long-winded non-answer to Bernd's question, so I'll
 repeat it here.  What medium do you swap to?
>>> 
>>> My wording of:
>>> 
>>> The root filesystem is on a USB SSD on a powered hub.
>>> 
>>> was definitely poor. It should have explicitly mentioned the
>>> swap partition too:
>>> 
>>> The root filesystem and swap partition are both on the same
>>> USB SSD on a powered hub.
>>> 
>>> More detail from dmesg -a for usb:
>>> 
>>> usbus0: 12Mbps Full Speed USB v1.0
>>> usbus1: 480Mbps High Speed USB v2.0
>>> usbus2: 12Mbps Full Speed USB v1.0
>>> usbus3: 480Mbps High Speed USB v2.0
>>> ugen0.1:  at usbus0
>>> uhub0:  on usbus0
>>> ugen1.1:  at usbus1
>>> uhub1:  on usbus1
>>> ugen2.1:  at usbus2
>>> uhub2:  on usbus2
>>> ugen3.1:  at usbus3
>>> uhub3:  on usbus3
>>> . . .
>>> uhub0: 1 port with 1 removable, self powered
>>> uhub2: 1 port with 1 removable, self powered
>>> uhub1: 1 port with 1 removable, self powered
>>> uhub3: 1 port with 1 removable, self powered
>>> ugen3.2:  at usbus3
>>> uhub4 on uhub3
>>> uhub4:  on 
>>> usbus3
>>> uhub4: MTT enabled
>>> uhub4: 4 ports with 4 removable, self powered
>>> ugen3.3:  at usbus3
>>> umass0 on uhub4
>>> umass0:  on usbus3
>>> umass0:  SCSI over Bulk-Only; quirks = 0x0100
>>> umass0:0:0: Attached to scbus0
>>> . . .
>>> da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
>>> da0:  Fixed Direct Access SPC-4 SCSI device
>>> da0: Serial Number 
>>> da0: 40.000MB/s transfers
>>> 
>>> (Edited a bit because there is other material interlaced, even
>>> internal to some lines. Also: I removed the serial number of the
>>> specific example device.)
> 
> Thank you.  That presents a much clearer picture.
>>> 
   I will further note that any kind of USB device cannot automatically
 be trusted to behave properly.  USB devices are notorious, for example,
 
  [reasons why deleted  --SB]
 
   You should identify where you page/swap to and then try substituting
 a different device for that function as a test to eliminate the possibility
 of a bad storage device/controller.  If the problem still occurs, that
 means there still remains the possibility that another controller or its
 firmware is defective instead.  It could be a kernel bug, it is true, but
 making sure there is no hardware or firmware error occurring is important,
 and as I say, USB devices should always be considered suspect unless and
 until proven innocent.
>>> 
>>> [FYI: This is a ufs context, not a zfs one.]
> 
> Right.  It's only a Pi, after all. :-)

It is a Pine64+ 2GB, not an rpi3.

>>> 
>>> I'm aware of such  things. There is no evidence that has resulted in
>>> suggesting the USB devices that I can replace are a problem. Otherwise
>>> I'd not be going down this path. I only have access to the one arm64
>>> device (a Pine64+ 2GB) so I've no ability to substitution-test what
>>> is on that board.
> 
> There isn't even one open port on that hub that you could plug a
> flash drive into temporarily to be the paging device?

Why do you think that I've never tried alternative devices? It is
just that those tests produced no evidence that my usually-in-use
SSD has a special/local problem: the behavior continues across all
such contexts when the Pine64+ 2GB is involved. (Again, I have not
had access to an alternative to the one arm64 board, which limits
my substitution-testing possibilities.)

Why would you expect a Flash drive to be better than another SSD
for such testing? (The SSD that I usually use even happens to be
a USB 3.0 SSD, capable of USB 3.0 speeds in USB 3.0 contexts. So
is 

Re: mountroot failing on zpools in Azure after upgrade from 10 to 11 due to lack of waiting for da0

2017-03-16 Thread Andriy Gapon
On 13/03/2017 21:07, Edward Tomasz NapieraƂa wrote:
> Are you sure the above transcript is right?  There are three reasons
> I'm asking.  First, you'll see the "Root mount waiting" message,
> which means the root mount code is, well, waiting for storvsc, exactly
> as expected.  Second - there is no "Trying to mount root".  But most
> of all - for some reason the "Mounting failed" is shown _before_ the
> "Root mount waiting", and I have no idea how this could ever happen.

Edward, your observation is not completely correct.
https://www.twisted.org.uk/~pete/914893a3-249e-4a91-851c-f467fc185eec.txt

We have:

Trying to mount root from zfs:rpool/ROOT/default []... <===
vmbus0: version 3.0
...
storvsc0:  on vmbus0
Solaris: NOTICE: Cannot find the pool label for 'rpool'
Mounting from zfs:rpool/ROOT/default failed with error 5. <===
Root mount waiting for: storvsc <===
...

So, the kernel attempted to mount the root even before vmbus was attached and,
thus, before storvsc appeared and informed the kernel that it might be holding
the root.
How was ZFS supposed to know that vmbus was ever going to appear?
To me this sounds more like a problem with the Hyper-V drivers.


-- 
Andriy Gapon

Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

2017-03-16 Thread Scott Bennett
Mark Millard  wrote:

> [Something strange happened to the automatic CC: fill-in for my original
> reply. Also I should have mentioned that for my test program if a
> variant is made that does not fork the swapping works fine.]
>
> On 2017-Mar-15, at 9:37 AM, Mark Millard  wrote:
>
> > On 2017-Mar-15, at 6:15 AM, Scott Bennett  wrote:
> > 
> >>On Tue, 14 Mar 2017 18:18:56 -0700 Mark Millard
> >>  wrote:
> >>> On 2017-Mar-14, at 4:44 PM, Bernd Walter  wrote:
> >>> 
>  On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote:
> > [test_check() between the fork and the wait/sleep prevents the
> > failure from occurring. Even a small access to the memory at
> > that stage prevents the failure. Details follow.]
>  
>  Maybe a stupid question, since you might have written it somewhere.
>  What medium do you swap to?
>  I've seen broken firmware on microSD cards doing silent data
>  corruption for some access patterns.
> >>> 
> >>> The root filesystem is on a USB SSD on a powered hub.
> >>> 
> >>> Only the kernel is from the microSD card.
> >>> 
> >>> I have several examples of the USB SSD model and have
> >>> never observed such problems in any other context.
> >>> 
> >>> [remainder of irrelevant material deleted  --SB]
> >> 
> >>You gave a very long-winded non-answer to Bernd's question, so I'll
> >> repeat it here.  What medium do you swap to?
> > 
> > My wording of:
> > 
> > The root filesystem is on a USB SSD on a powered hub.
> > 
> > was definitely poor. It should have explicitly mentioned the
> > swap partition too:
> > 
> > The root filesystem and swap partition are both on the same
> > USB SSD on a powered hub.
> > 
> > More detail from dmesg -a for usb:
> > 
> > usbus0: 12Mbps Full Speed USB v1.0
> > usbus1: 480Mbps High Speed USB v2.0
> > usbus2: 12Mbps Full Speed USB v1.0
> > usbus3: 480Mbps High Speed USB v2.0
> > ugen0.1:  at usbus0
> > uhub0:  on usbus0
> > ugen1.1:  at usbus1
> > uhub1:  on usbus1
> > ugen2.1:  at usbus2
> > uhub2:  on usbus2
> > ugen3.1:  at usbus3
> > uhub3:  on usbus3
> > . . .
> > uhub0: 1 port with 1 removable, self powered
> > uhub2: 1 port with 1 removable, self powered
> > uhub1: 1 port with 1 removable, self powered
> > uhub3: 1 port with 1 removable, self powered
> > ugen3.2:  at usbus3
> > uhub4 on uhub3
> > uhub4:  on 
> > usbus3
> > uhub4: MTT enabled
> > uhub4: 4 ports with 4 removable, self powered
> > ugen3.3:  at usbus3
> > umass0 on uhub4
> > umass0:  on usbus3
> > umass0:  SCSI over Bulk-Only; quirks = 0x0100
> > umass0:0:0: Attached to scbus0
> > . . .
> > da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
> > da0:  Fixed Direct Access SPC-4 SCSI device
> > da0: Serial Number 
> > da0: 40.000MB/s transfers
> > 
> > (Edited a bit because there is other material interlaced, even
> > internal to some lines. Also: I removed the serial number of the
> > specific example device.)

 Thank you.  That presents a much clearer picture.
> > 
> >>I will further note that any kind of USB device cannot automatically
> >> be trusted to behave properly.  USB devices are notorious, for example,
> >> 
> >>   [reasons why deleted  --SB]
> >> 
> >>You should identify where you page/swap to and then try substituting
> >> a different device for that function as a test to eliminate the possibility
> >> of a bad storage device/controller.  If the problem still occurs, that
> >> means there still remains the possibility that another controller or its
> >> firmware is defective instead.  It could be a kernel bug, it is true, but
> >> making sure there is no hardware or firmware error occurring is important,
> >> and as I say, USB devices should always be considered suspect unless and
> >> until proven innocent.
> > 
> > [FYI: This is a ufs context, not a zfs one.]

 Right.  It's only a Pi, after all. :-)
> > 
> > I'm aware of such  things. There is no evidence that has resulted in
> > suggesting the USB devices that I can replace are a problem. Otherwise
> > I'd not be going down this path. I only have access to the one arm64
> > device (a Pine64+ 2GB) so I've no ability to substitution-test what
> > is on that board.

 There isn't even one open port on that hub that you could plug a
flash drive into temporarily to be the paging device?  You could then
try your tests before returning to the normal configuration.  If there
isn't an open port, then how about plugging a second hub into one of
the first hub's ports and moving the displaced device to the second
hub?  A flash drive could then be plugged in.  That kind of configuration
is obviously a bad idea for the long run, but just to try your tests it
ought to work well enough.  (BTW, if a USB storage device containing a
paging area drops off-line even momentarily and the system needs to use
it, that is the beginning of the end, even though it may take up to a few
minutes for everything to lock up.  You probably won't be able to do an
orderly