Hello again,

Sorry for the delay in replying; I spent the weekend monitoring and
troubleshooting the box. Ever since the panic/reboot, its performance has
been stellar, so I think we can rule out the "new hardware" angle.

I have been going over the maintenance procedure that the juniors did on
Sunday the 12th, the week the spikes and issues started.

The maintenance was to replace the old hardware with this new hardware.

We built a completely new system and created the same exported file systems
as live, for testing purposes.

The procedure was roughly:

* create datasets with mountpoints "/export/www", "/export/cgi" (and 4 more)
and NFS export them (sketched below)
* test the system (NFS server, clients, etc.)
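
For reference, that setup step would have been something like this (the
zpool1 dataset names are my guess from the later rename command, and the
exact sharenfs options are from memory):

  # create each test dataset with an explicit mountpoint, then NFS export it
  zfs create -o mountpoint=/export/www zpool1/www
  zfs create -o mountpoint=/export/cgi zpool1/cgi
  # ... and the other 4 ...
  zfs set sharenfs=on zpool1/www
  zfs set sharenfs=on zpool1/cgi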

The week before maintenance:

* zfs send the real live data to the NFS server as /export/REAL_www,
/export/REAL_cgi (and the rest ...), roughly as sketched below
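
Per dataset, that would be something like the following (the snapshot name
and the old server's pool name are made up here):

  # on the old live server: snapshot the data and stream it across
  zfs snapshot oldpool/www@migrate
  zfs send oldpool/www@migrate | ssh nfs02 zfs receive zpool1/REAL_www

  # on nfs02: a plain send does not carry the mountpoint property, so set it
  zfs set mountpoint=/export/REAL_www zpool1/REAL_www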

On the night of maintenance:

* rename test /export/www to /export/deleteme-www (and the rest ... )
* rename live /export/REAL_www to /export/www (and the rest ... )

Here they made their first mistake: they forgot about the "mountpoint"
setting, so on the first reboot (the panic) boot stopped at
"filesystem/local", because both /export/deleteme-www and the new /export/www
had the same mountpoint. I fixed this at boot time after the panic.
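
The fix itself was just giving the old test copy its own mountpoint again,
roughly:

  # zfs rename keeps an explicitly-set mountpoint, so both datasets were
  # still claiming /export/www; point the renamed test copy elsewhere
  zfs set mountpoint=/export/deleteme-www zpool1/deleteme-www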

* NFS export new /export/www (and the rest ... )
* remount NFS on clients
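
Those two steps would look roughly like this (the client mountpoint /mnt/www
is made up, and I'm assuming illumos-style mount syntax; Linux clients would
use "mount -t nfs" instead):

  # on the server: export the new live dataset
  zfs set sharenfs=on zpool1/www

  # on each client: drop the stale mount and remount the new share
  umount /mnt/www
  mount -F nfs -o vers=4 nfs02:/export/www /mnt/www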

My best guess here is that they never "unshared" the NFS exports, nor
unmounted them on the clients, and simply shared the new /export/www (etc.),
ending up in a situation where two shares were named "/export/www". This is
confirmed by the bash history and the zpool history.
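
For the record, that is easy to double-check after the fact with something
like:

  # zpool history keeps an audit trail of the zfs commands run on the pool
  zpool history zpool1 | egrep 'create|rename|set'

  # and what is actually exported right now
  cat /etc/dfs/sharetab
  showmount -e nfs02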

I would have expected the "zfs rename zpool1/www zpool1/delete_www" command
to fail, since the dataset was shared (and could not be re-shared, because
NFS clients were still mounting it).

It will be trivial to enhance the maintenance procedure to avoid this
problem in future. It was the juniors' first maintenance, so I don't think
"unsharing" is part of their muscle memory yet.

I have a feeling the managers will ask me to replicate this issue on test
hardware and confirm that renames can confuse NFS sharing in this way. Or,
at the very least, find the real problem if this isn't it.
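
If it comes to that, the reproduction should be fairly mechanical, something
along these lines (pool and dataset names made up for the test box):

  # two datasets; share the first, and keep a client mounted on it with
  # I/O running
  zfs create -o mountpoint=/export/test -o sharenfs=on testpool/a
  zfs create -o mountpoint=/export/test-new testpool/b

  # repeat the maintenance mistake: rename under the live share without
  # unsharing or unmounting, then point the second dataset at the same path
  zfs rename testpool/a testpool/deleteme-a
  zfs set mountpoint=/export/test testpool/b
  zfs set sharenfs=on testpool/b

  # then watch the clients for NR_BAD_SEQID / NFS4ERR_STALE and the server
  # for load spikes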

But since the server is running just as well as its many siblings, the
pressure is off, at least.

Lund


Jorgen Lundman wrote:
> 
> Hello experts,
> 
> We have quite a number of NFS servers running OmniOS, but the very latest
> hardware is giving us some grief and I was hoping to get some assistance in
> finding out why.
> 
> 
> SunOS nfs02 5.11 omnios-b5093df i86pc i386 i86pc
>   OmniOS v11 r151016
> 
> MB : MBD-X10DRH- iT  (Xeon Supermicro)
> CPU: E5-2650V4 2.2GHz x 2 (48 cores)
> Mem: Intel Xeon BDW-EP DDR4-2400 ECC REG 32GB x 12 (384G)
> 
> ZFS pool of 24 HDDs, serving NFSv4 clients.
> 
> This is the first server of this hardware type, and the first we are
> experiencing troubles with. The older servers are generally 32 core.
> 
> 
> 
> 
> In normal situations, the load is higher than expected (at least compared
> to what the load was on the Solaris 10 systems we are replacing). But
> possibly it is just that the loadavg math has changed.
> 
> last pid:  3198;  load avg:  5.83,  7.31,  8.29;  up 0+00:40:39  15:10:30
> 63 processes: 62 sleeping, 1 on cpu
> CPU states: 89.3% idle,  0.0% user, 10.7% kernel,  0.0% iowait,  0.0% swap
> Kernel: 97979 ctxsw, 20 trap, 96827 intr, 208 syscall
> Memory: 384G phys mem, 337G free mem, 4096M total swap, 4096M free swap
> 
>    PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
>    935 daemon    666  60  -20 9544K 8740K sleep  221:20  7.52% nfsd
>    933 root       15  59    0 5328K 3476K sleep    0:02  0.00% mountd
>   1666 root        1  59    0   73M   71M cpu/24   0:02  0.00% top
>    318 root       34  59    0 8812K 5448K sleep    0:01  0.00% nscd
> 
> (Recently rebooted as it panicked. Alas, no information on why in logs nor
> dump - looks especially evil at the moment).
> 
> 
> Then from time to time, it goes crazy: load goes over 50, nfsd threads
> drop to about 120. All NFS clients spew messages regarding NR_BAD_SEQID and
> NFS4ERR_STALE.
> 
> Sometimes it recovers, sometimes it reboots. It has been armed with dump
> now, in case it crashes again.
> 
> During idle time, flamegraph stacks are mostly in unix`acpi_cpu_cstate and
> i86_mwait.
> 
> ( flamegraph here: http://www.lundman.net/nfs02-idle.svg )
> 
> During the last load 50, flamegraph showed it to be busy in
> rfs4_findstate_by_owner_file > rfs4_dbsearch > vmem_nextfit_alloc.
> 
> ( flamegraph here: http://www.lundman.net/nfs02-busy.svg )
> 
> Although, considering how much memory is free (337G) should it be blocking
> there?
> 
> I've been trying to find anything of interest on the server, but I'm unsure
> what is going on. I have gone through many of the DTraceToolkit tools as
> well. Request any output wanted!
> 
> r151016 is a bit old, especially on newest hardware, but going through the
> illumos commit log, I only found "7912 nfs_rwlock readers are running wild
> waiting" in the nfs area.
> 
> 
> Cheers,
> 
> Lund
> 
> 

-- 
Jorgen Lundman       | <[email protected]>
Unix Administrator   | +81 (0)90-5578-8500
Shibuya-ku, Tokyo    | Japan

