Re: Filesystem operations slower in 13.0 than 12.2

Kevin Oberman Mon, 15 Mar 2021 14:58:19 -0700

Responses in-line.

On Sun, Mar 14, 2021 at 3:09 PM Mark Millard <mark...@yahoo.com> wrote:


>
>
> On 2021-Mar-14, at 11:09, Kevin Oberman <rkoberman at gmail.com> wrote:
>
> > . . .
> >
> > Seems to only occur on large r/w operations from/to the same disk. "sp
> > big-file /other/file/on/same/disk" or tar/untar operations on large
> files.
> > Hit this today updating firefox.
> >
> > I/O starts at >40MB/s. Dropped to about 1.5MB/s. If I tried doing other
> > things while it was running slowly, the disk would appear to lock up.
> E.g.
> > pwd(1) seemed to completely lock up the system, but I could still ping it
> > and, after about 30 seconds, things came back to life. It was also not
> > instantaneous. Disc activity dropped to <1MB/s for a few seconds before
> > everything froze.
> >
> > During the untar of firefox, I saw; this several times. I also looked at
> my
> > console where I found these errors during :
> > swap_pager: indefinite wait buffer: bufobj: 0, blkno: 55043, size: 8192
> > swap_pager: indefinite wait buffer: bufobj: 0, blkno: 51572, size: 4096
>
> Does anyone know:
> Are those messages normal "reading is taking a rather long
> time" notices or is their presence more useful information
> in some way about the type of problem or context for the
> problem?
>
> As for the tests:
> Are these messages always present when near a time frame
> when the problem occurs? Never present in a near time
> frame to a period when the problem does not occur?
>
In a large number of test, these errors have not repeated. They baffle me
for another reason. This system has 20G or RAM. Tyically, all swap space is
unused. ATM I see 16384M free out of 16384. Not sure that I have ever seen
it used, though it might have been while building rust. I have not built
rust for a month.

>
> It appears that the messages are associated with reading
> the disk(s), not directly with writing them, where the
> reads take more than "hz * 20" time units to complete.
> (I'm looking at main (14) code.) What might contribute
> to the time taken for the pending read(s)?
>
The reference to hz * 20 woke up a few sleeping memory cells. I forgot that
I cleaned up my loader.conf. It was largely a copy of the one on my
decade-old T520. I commented out "kern.hz=100". I don't recall the details,
but I think it was actually from an even older system, my T42 from before I
retired.

In any case, restoring this setting has greatly improved the situation. I
now have really bad disk I/O performance on large disk to disk activity
(untarring the firefox distro) instead of terrible performance and the
system freezes have vanished, though I do see pauses in response to clicks
or text entry, but the display remains active and the pauses are short... 1
to 15 seconds, I'd guess. No, I have no idea what this indicates.

I'm still not seeing the performance I was seeing back in February when 40
MB/s for extended intervals was common and I once untarred firefox.tar.gz2
in under a minute and performance seldom dropped below 1.4 MB/s.


>
> /*
>  * swap_pager_getpages() - bring pages in from swap
>  *
>  *      Attempt to page in the pages in array "ma" of length "count".  The
>  *      caller may optionally specify that additional pages preceding and
>  *      succeeding the specified range be paged in.  The number of such
> pages
>  *      is returned in the "rbehind" and "rahead" parameters, and they will
>  *      be in the inactive queue upon return.
>  *
>  *      The pages in "ma" must be busied and will remain busied upon
> return.
>  */
> static int
> swap_pager_getpages_locked(vm_object_t object, vm_page_t *ma, int count,
>     int *rbehind, int *rahead)
> {
> . . .
>         /*
>          * Wait for the pages we want to complete.  VPO_SWAPINPROG is
> always
>          * cleared on completion.  If an I/O error occurs, SWAPBLK_NONE
>          * is set in the metadata for each page in the request.
>          */
>         VM_OBJECT_WLOCK(object);
>         /* This could be implemented more efficiently with aflags */
>         while ((ma[0]->oflags & VPO_SWAPINPROG) != 0) {
>                 ma[0]->oflags |= VPO_SWAPSLEEP;
>                 VM_CNT_INC(v_intrans);
>                 if (VM_OBJECT_SLEEP(object, &object->handle, PSWP,
>                     "swread", hz * 20)) {
>                         printf(
> "swap_pager: indefinite wait buffer: bufobj: %p, blkno: %jd, size: %ld\n",
>                             bp->b_bufobj, (intmax_t)bp->b_blkno,
> bp->b_bcount);
>                 }
>         }
>         VM_OBJECT_WUNLOCK(object);
> . . .
>
> where:
>
> #define VM_OBJECT_SLEEP(object, wchan, pri, wmesg, timo)                \
>         rw_sleep((wchan), &(object)->lock, (pri), (wmesg), (timo))
>
> and:
>
> #define rw_sleep(chan, rw, pri, wmesg, timo)                            \
>         _sleep((chan), &(rw)->lock_object, (pri), (wmesg),              \
>             tick_sbt * (timo), 0, C_HARDCLOCK)
>
> (I do not claim to be able to interpret the implications
> of the code that leads to the messages. But seeing some
> of the code might prompt a thought by someone that knows
> the code's context and operation.)
>
> > . . .
> > Backing off to Mar. 4 was not an improvement. My untar did seem better
> for a couple of minutes, but then the display froze again for 30 seconds
> and disk performance dropped to <1M.
>
> You were able to see the disk performance drop while
> the display was frozen?
>

No, but it refreshed the display when it un-froze and the refreshed display
showed a one-minute history that showed that data was still transferring
during the pause.

>
> It might not be the best for monitoring but I'll ask
> this in terms of top output: Does Inact, Laundry,
> Wired, Free, or other such show anything fairly unique
> for around the problematical time frame(s)?
>

These all look pretty normal. Free memory stays at 13G throughout hte
operatioin.

>
> > then things got really bad and behaved in a manner that was baffling to
> me. The screen froze again, but stayed frozen after half a minute. I
> clicked on a couple of buttons in Firefox to no effect and then hit ctrl-q
> to quit. After the long pause, I pressed the power button to try to force a
> shutdown. Suddenly, it started unwinding everything I had done during the
> freeze. My browser did the updates from my mouse clicks including quitting.
> It then switched to a different workspace from ctrl-alt-right and did a
> clean shutdown. ????
> >
> > Do I also have a graphics issue? Examining log files show no indication
> that anything was happening. SMART shows no errors and reasonable values
> for everything. No indication of a HW problem. The system performs well
> unless I do something that tries a bulk disk data move. Building world
> takes about 75 minutes. I just have a very hard time building big ports.
>
> Almost like things were stuck-sleeping and then the
> sleep(s) finished?
>
Exactly!


>
> ===
> Mark Millard
> marklmi at yahoo.com
> ( dsl-only.net went
> away in early 2018-Mar)
>
--
Kevin Oberman, Part time kid herder and retired Network Engineer
E-mail: rkober...@gmail.com
PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683
_______________________________________________
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: Filesystem operations slower in 13.0 than 12.2

Reply via email to