Re: Strange ARC/Swap/CPU on yesterday's -CURRENT

Jeff Roberson Sun, 11 Mar 2018 13:48:27 -0700

On Sun, 11 Mar 2018, O. Hartmann wrote:

On Wed, 7 Mar 2018 14:39:13 +0400
Roman Bogorodskiy <[email protected]> wrote:

  Danilo G. Baio wrote:

On Tue, Mar 06, 2018 at 01:36:45PM -0600, Larry Rosenman wrote:

On Tue, Mar 06, 2018 at 10:16:36AM -0800, Rodney W. Grimes wrote:

On Tue, Mar 06, 2018 at 08:40:10AM -0800, Rodney W. Grimes wrote:

On Mon, 5 Mar 2018 14:39-0600, Larry Rosenman wrote:

Upgraded to:

FreeBSD borg.lerctr.org 12.0-CURRENT FreeBSD 12.0-CURRENT #11 r330385:
Sun Mar  4 12:48:52 CST 2018
[email protected]:/usr/obj/usr/src/amd64.amd64/sys/VT-LER  amd64
+1200060 1200060

Yesterday, and I'm seeing really strange slowness, ARC use, and SWAP use
and swapping.

See http://www.lerctr.org/~ler/FreeBSD/Swapuse.png


I see these symptoms on stable/11. One of my servers has 32 GiB of
RAM. After a reboot all is well. ARC starts to fill up, and I still
have more than half of the memory available for user processes.

After running the periodic jobs at night, the amount of wired memory
goes sky high. /etc/periodic/weekly/310.locate is a particularly nasty
one.


I would like to find out if this is the same person I have
reporting this problem from another source, or if this is
a confirmation of a bug I was helping someone else with.

Have you been in contact with Michael Dexter about this
issue, or any other forum/mailing list/etc?

Just IRC/Slack, with no response.


If not, then we have at least 2 reports of this unbounded
wired memory growth; if so, hopefully someone here can
take you further in the debug than we have been able
to get.

What can I provide?  The system is still in this state as the full backup is
slow.


One place to look is to see if this is the recently fixed:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=222288
g_bio leak.

vmstat -z | egrep 'ITEM|g_bio|UMA'

would be a good first look

borg.lerctr.org /home/ler $ vmstat -z | egrep 'ITEM|g_bio|UMA'
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
UMA Kegs:               280,      0,     346,       5,     560,   0,   0
UMA Zones:             1928,      0,     363,       1,     577,   0,   0
UMA Slabs:              112,      0,25384098,  977762,102033225,   0,   0
UMA Hash:               256,      0,      59,      16,     105,   0,   0
g_bio:                  384,      0,      33,    1627,542482056,   0,   0
borg.lerctr.org /home/ler $
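
To put the vmstat -z numbers above in terms of memory, each zone's SIZE can
be multiplied by its USED count. A minimal sketch, assuming SIZE is bytes per
item, USED is the number of allocated items, and zone names contain no colon;
the helper name zone_mib is made up for illustration:

```shell
# zone_mib: hypothetical helper; reads `vmstat -z` output on stdin and prints
# "<zone name><TAB><MiB in use>" per zone, computed as SIZE * USED.
zone_mib() {
    awk -F: 'NF >= 2 {
        n = split($2, f, ",")      # f[1]=SIZE, f[2]=LIMIT, f[3]=USED, ...
        if (n < 3) next            # skip anything that is not a zone row
        printf "%s\t%.1f\n", $1, (f[1] + 0) * (f[3] + 0) / 1048576
    }'
}

# Usage on a live system, largest zones first:
#   vmstat -z | zone_mib | sort -t "$(printf '\t')" -k2,2 -rn | head
```

Applied to the output above, UMA Slabs comes to about 2711 MiB (112 bytes x
25384098 items), while g_bio accounts for only about 12 KiB (384 bytes x 33
items).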

Limiting the ARC to, say, 16 GiB has no effect on the high amount of
wired memory. After a few more days, the kernel consumes virtually all
memory, forcing processes in and out of the swap device.


Our experience as well.

...

Thanks,
Rod Grimes
[email protected]

Larry Rosenman                     http://www.lerctr.org/~ler


--
Rod Grimes                                                 [email protected]


--
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 214-642-9640                 E-Mail: [email protected]
US Mail: 5708 Sabbia Drive, Round Rock, TX 78665-2106



Hi.

I noticed this behavior as well and changed vfs.zfs.arc_max to a smaller size.

For me it started when I upgraded to 1200058; on this box I'm only using
poudriere for building tests.


I've noticed that as well.

I have 16G of RAM and two disks; the first one is UFS with the system
installation and the second one is ZFS, which I use to store media and
data files and for poudriere.

I don't recall the exact date, but it started fairly recently. The system
would swap like crazy, to the point where I could not even ssh into it and
could hardly log in through a tty: it might take 10-15 minutes to see a
command typed in the shell.

I've updated loader.conf to have the following:

vfs.zfs.arc_max="4G"
vfs.zfs.prefetch_disable="1"

It fixed the problem, but introduced a new one. When I'm building stuff
with poudriere with ccache enabled, it takes hours to build even small
projects like curl or gnutls.

For example, current build:

[10i386-default] [2018-03-07_07h44m45s] [parallel_build:] Queued: 3  Built: 1  Failed: 0  Skipped: 0  Ignored: 0  Tobuild: 2   Time: 06:48:35
[02]: security/gnutls | gnutls-3.5.18             build           (06:47:51)

Almost 7 hours already and still going!

gstat output looks like this:

dT: 1.002s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0      0      0      0    0.0      0      0    0.0    0.0  da0
    0      1      0      0    0.0      1    128    0.7    0.1  ada0
    1    106    106    439   64.6      0      0    0.0   98.8  ada1
    0      1      0      0    0.0      1    128    0.7    0.1  ada0s1
    0      0      0      0    0.0      0      0    0.0    0.0  ada0s1a
    0      0      0      0    0.0      0      0    0.0    0.0  ada0s1b
    0      1      0      0    0.0      1    128    0.7    0.1  ada0s1d

ada0 here is the UFS drive, and ada1 is ZFS.

Regards.
--
Danilo G. Baio (dbaio)




Roman Bogorodskiy



This is from an APU with no ZFS, UFS on a small mSATA device (the PC Engines
APU works as a firewall, router, and PBX):

last pid:  9665;  load averages:  0.13,  0.13,  0.11  up 3+06:53:55  00:26:26
19 processes:  1 running, 18 sleeping
CPU:  0.3% user,  0.0% nice,  0.2% system,  0.0% interrupt, 99.5% idle
Mem: 27M Active, 6200K Inact, 83M Laundry, 185M Wired, 128K Buf, 675M Free
Swap: 7808M Total, 2856K Used, 7805M Free
[...]

The APU is running CURRENT (FreeBSD 12.0-CURRENT #42 r330608: Wed Mar  7
16:55:59 CET 2018 amd64). Usually, the APU never(!) uses swap; now it has
been swapping like hell for a couple of days, and I have to reboot it fairly
often.

Another box (16 GB RAM, ZFS, poudriere; the packaging box) is unresponsive
right now: after hours of building packages, I tried to copy the repository
from one location on the same ZFS volume to another. Usually this task takes
a couple of minutes for ~2200 ports. Now it has taken 2 1/2 hours and the box
got stuck. Ctrl-T on the console delivers:
load: 0.00  cmd: make 91199 [pfault] 7239.56r 0.03u 0.04s 0% 740k

No response from the box anymore.


The swapping and slowness did not start in just the past few days; the
problem has been present for at least 1 1/2 weeks, possibly longer. Since I
build ports fairly often, I can say the time taken on that specific box has
increased from 2 to 3 days for all ~2200 ports. The system has 16 GB of RAM
and an Ivy Bridge 4-core Xeon at 3.4 GHz, if this information matters. The
box is consuming swap really fast.

Today is the first time the machine became unresponsive (no ssh, no console
login so far). I need to cold-start it. The OS is CURRENT as well.

Regards,

O. Hartmann


Hi Folks,

This could be my fault from recent NUMA and concurrency related work. I
did touch some of the arc back-pressure mechanisms. First, I would like
to identify whether the wired memory is in the buffer cache. Can those of
you that have a repro look at sysctl vfs.bufspace and tell me if that
accounts for the bulk of your wired memory usage? I'm wondering if a job
ran that pulled in all of the bufs from your root disk and filled up the
buffer cache which doesn't have a back-pressure mechanism. Then arc
didn't respond appropriately to lower its usage.

Also, if you could try going back to r328953 or r326346 and let me know if
the problem exists in either. That would be very helpful. If anyone is
willing to debug this with me contact me directly and I will send some
test patches or debugging info after you have done the above steps.
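
The vfs.bufspace check requested above can be sketched as a comparison
against total wired memory. This is only an illustration: it assumes wired
memory can be read as vm.stats.vm.v_wire_count pages of hw.pagesize bytes
each, and the function name bufspace_share is invented here.

```shell
# bufspace_share BUFSPACE_BYTES WIRED_PAGES PAGE_SIZE
# Hypothetical helper: prints buffer-cache space, wired memory, and the
# percentage of wired memory that the buffer cache accounts for.
bufspace_share() {
    bufspace=$1
    wired=$(( $2 * $3 ))           # wired pages * bytes per page
    if [ "$wired" -le 0 ]; then
        echo "wired memory is zero; nothing to compare" >&2
        return 1
    fi
    echo "bufspace=$(( bufspace / 1048576 ))M wired=$(( wired / 1048576 ))M share=$(( bufspace * 100 / wired ))%"
}

# Usage on a live system (sysctl names for wired memory are an assumption):
#   bufspace_share "$(sysctl -n vfs.bufspace)" \
#                  "$(sysctl -n vm.stats.vm.v_wire_count)" \
#                  "$(sysctl -n hw.pagesize)"
```

A share close to 100% would point at the buffer cache; a small share would
suggest the wired memory is being held elsewhere.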


Thank you for the reports.

Jeff



--
O. Hartmann

I object to the use or transfer of my data for advertising purposes or
for market or opinion research (§ 28 Abs. 4 BDSG).

_______________________________________________
[email protected] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[email protected]"

Re: Strange ARC/Swap/CPU on yesterday's -CURRENT
