I did compile it. It seems to have fixed the deadlocks, but we're still having 
performance issues that we were not having before. IMHO this is some type of 
regression.

Here is the binary location:

http://download.txtelecom.net/zfs

Install procedure, as root (props to Dan McDonald for helping me with this 
process). A quick post-reboot sanity check is sketched right after the list:


1. beadm list (note your current BE; you'll need its name if you have to roll back)
2. zfs snap rpool@WHATEVER
3. beadm create OmniOS-r151020-cstm
4. beadm mount OmniOS-r151020-cstm /mnt
5. cd /mnt/kernel/fs/amd64/
6. mv zfs zfs.old
7. wget download.txtelecom.net/zfs
8. cd ~
9. beadm unmount OmniOS-r151020-cstm
10. beadm activate OmniOS-r151020-cstm
11. shutdown -i6 -g0
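
Once the box is back up, here's a quick sanity check that you actually booted 
the patched BE and module. This is just what I'd look at, not an official 
procedure:

    beadm list                              # the -cstm BE should carry the N (booted now) flag
    modinfo | grep -w zfs                   # confirm the zfs kernel module is loaded
    digest -a sha256 /kernel/fs/amd64/zfs   # compare against a checksum of the downloaded binary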


The system will reboot twice. The first SAN I did this on rebooted both times 
flawlessly. On the second system it froze while shutting down and we had to 
power cycle it, but it still came back online using the new build. Use this at 
your own risk.

If it does fail completely, simply choose your old BE in GRUB on the next 
reboot (step 1 gave you the list), or reactivate it from a shell as sketched below.
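
If you would rather roll back from a shell than pick the BE in GRUB, something 
like this should do it (substitute the BE name you noted in step 1):

    beadm activate <your-old-BE>
    shutdown -i6 -g0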

However, please see the following comments from my customer about issues that 
are still occurring:

John,
I've copied 6GB, 37GB, and 61GB files from the thor /mnt local disk to a 
location on the SAN.
Monitoring only shell activity on a few machines, nothing locked up (except 
thor), although an ssh to thor took a minute to connect.
While the 'cp' command was running, I monitored thor using the 'top' command.
* Of course, the load factor on thor started to rise from 1 to over 5.
* As the file was copying, I could see the destination file size increase.
* However, when the destination file size reached 6, 37, or 61GB, thor's load 
factor was still very high and remained high for a while.
* If logged into thor, no work could be done while the 'cp' was ongoing.
* Only after the 'cp' command completed did thor's load factor start to drop 
and work could be done when logged into thor.
* Other machines, e.g. flash and linux8, had no issues... as far as I could tell.
* I did not invoke any of the EDA tools.
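
If we re-run that copy test, it would be worth capturing pool-level and 
per-thread stats on thor alongside 'top'. A minimal sketch of what I'd run (the 
pool name 'tank' and the file paths are placeholders):

    zpool iostat -v tank 5               # per-vdev throughput and latency, sampled every 5 seconds
    prstat -mL 5                         # per-thread microstates: sys time, lock waits, I/O waits
    cp /mnt/bigfile /tank/test/bigfile   # the copy under test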

Additionally, the VMware-datastore-on-NFS (ZFS) bug that I was hoping to fix 
was only half resolved as well.

We can delete large files now, but doing so brings the SAN to a crawl. That's 
better than before, when it would come to a halt entirely.

I’d love to see if I could get a kernel hacker to look at both of these issues 
with us.

John Barfield
Engineering and Stuff

M: +1 (214) 425-0783  O: +1 (214) 506-8354
john.barfi...@bissinc.com

4925 Greenville Ave, Ste 900
Dallas, TX 75206

For Support Requests:
http://support.bissinc.com or supp...@bissinc.com

Follow us on Twitter for Network Status & Updates!
https://twitter.com/johnbarfield

From: wuffers <m...@wuffers.net>
Date: Tuesday, April 18, 2017 at 9:38 AM
To: Dan McDonald <dan...@omniti.com>
Cc: John Barfield <john.barfi...@bissinc.com>, 
"omnios-discuss@lists.omniti.com" <omnios-discuss@lists.omniti.com>
Subject: Re: [OmniOS-discuss] ZPOOL bug after upgrade to r151020

I upgraded to r151020 in late January and saw some strangeness with arcstat 
(l2size and l2asize were huge) before I did a reboot due to some instability a 
few weeks ago. I thought it was just a case of not using the latest arcstat, 
and things were running fine after the reboot, so I didn't pursue it.

I saw this post last week and confirmed the issue was present in my environment 
as well, so I did the remove/re-add of the cache devices, then a complete 
reboot. The cache devices reported their actual size (400GB) via "zpool iostat 
-v". Today I checked again and this is what I see:

# arcstat
read  hits  miss  hit%  l2read  l2hits  l2miss  l2hit%  arcsz  l2size  l2asize
 465   412    53    88      53      50       3      94   230G    4.4T     3.2T

# zpool iostat -v

(other info snipped for brevity)

cache                          -      -      -      -      -      -
  c2t500117310015D579d0     816G  16.0E     54     23  2.32M  1.46M
  c2t50011731001631FDd0     816G  16.0E     54     23  2.32M  1.46M
  c12t500117310015D59Ed0    815G  16.0E     55     23  2.35M  1.46M
  c12t500117310015D54Ed0    816G  16.0E     55     23  2.36M  1.46M
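
The 16.0E in the free column looks suspicious on its own: 16 EiB is roughly 
2^64 bytes, which is what you'd expect to see if the cache devices' space 
accounting went negative and wrapped, consistent with the inflated 
l2size/l2asize above. To rule out a stale arcstat script, the raw kernel 
counters can be read directly; a minimal sketch (substitute your pool name for 
'tank'):

    kstat -p zfs:0:arcstats:l2_size zfs:0:arcstats:l2_asize   # raw ARC kstats, in bytes
    zpool list -v tank                                        # physical size of the cache devices

If the kstats themselves report terabytes against 400GB devices, the inflation 
is in the kernel's accounting rather than in arcstat's reporting.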


I'm just waiting for the next lockup/crash...

John, were you able to compile the fix, and if so, would you be able to send me 
a copy?

Thanks.


On Mon, Apr 10, 2017 at 10:00 AM, Dan McDonald <dan...@omniti.com> wrote:

> On Apr 9, 2017, at 10:27 PM, John Barfield <john.barfi...@bissinc.com> wrote:
>
> Thank you Dan.
>
> Do you happen to have the process or know the location of a process document 
> for only building ZFS?
>
> I've re-built only nfs from illumos-gate in the past to resolve a bug, but I'm 
> wondering how I would build and install only zfs (if it's even possible).
>
> There are 2 bugs that we're suffering with at two different customer sites 
> that didn't get into r151020, and I'm not sure that we can make it until 
> r151022 is released.
>
> Thanks for any advice

You can build zfs the way you likely built NFS.  Build it, replace it on an 
alternate BE (in zfs's case:  /kernel/fs/amd64/zfs), and reboot.
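
Roughly, and assuming you already have an illumos-gate workspace with the onbld 
tools installed (the env file name and paths below are placeholders, not the 
one true way):

    /opt/onbld/bin/bldenv -d illumos.sh     # enter the (debug) build environment
    cd usr/src/uts/intel/zfs                # the x86 zfs kernel module
    dmake all
    # the amd64 module lands under debug64/ (obj64/ for non-debug builds);
    # copy it over the one in your mounted alternate BE, then reboot:
    cp debug64/zfs /mnt/kernel/fs/amd64/zfs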

The only gotcha might be if a bugfix covers more than just ZFS itself... but 
for 7504, that's NOT the case.  :)

Dan

_______________________________________________
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss
