[A normal multi-user boot's automatic fsck can run
fsck -B and hit the same problem.]
On 2017-Jul-8, at 9:45 AM, Mark Millard <markmi at dsl-only.net> wrote:
> [I add notes about a problem that happens after the
> "fsck -B". Also forgot to mention: production style
> kernel and world builds were in use. And I tried a
> powerpc64 build and it works the same.]
> On 2017-Jul-7, at 11:09 PM, Mark Millard <markmi at dsl-only.net> wrote:
>> [This note has more information than one sent with extra text
>> in the subject but with a partially different "to" list.]
>> Peter Jeremy peter at rulingia.com wrote on
>> Sat Jul 8 02:00:47 UTC 2017 :
>>> When did you first notice this (what SVN revision)?
>>> Do you know what the last good SVN revision was?
>>> Is this a new or old filesystem?
>>> Is the filesystem mounted/active or not when you dump it?
>>> What are the relevant parameters for the filesystem on ada0s3a?
>>> Are you running softupdates, journalling etc?
>>> Which dump(8) phase is reporting the errors?
>>> What are the exact dump and fsck commands you ran?
>> I can add a little information with some contrast
>> and only "fsck -B" in use (with an unclean file
>> system from a prior crash), no dump use. Still:
>> a snapshot is involved in the below.
>> Unfortunately two problems with major consequences
>> for my involved context limit the svn range that I
>> can cover for the activity, the problem version
>> ranges being:
>> -r319722 through -r320651 (fixed by -r320652)
>> (actually this is why I had used "boot -s"
>> in what I report later: I could get to a
>> shell prompt that way instead of crashing
>> before any login prompt; the crashes left
>> the file system in need of repair)
>> -r320509 through -r320561 (fixed by -r320570)
>> So I was using -r320570 to avoid one of the
>> two problems.
>> Context: 32-bit powerpc FreeBSD used on PowerMac G5
>> so-called "Quad-core". (So big-endian as well.)
>> Softupdates, no journalling. Long-in-use file
>> system having lots of FreeBSD version updates
>> and port rebuilds over time.
>> The following is from now, not from the time of the
>> example messages:
>> # dumpfs / | more
>> magic 19540119 (UFS2) time Fri Jul 7 22:53:34 2017
>> superblock location 65536 id [ <OMITTED> ]
>> ncg 158 size 25165823 blocks 24372006
>> bsize 32768 shift 15 mask 0xffff8000
>> fsize 4096 shift 12 mask 0xfffff000
>> frag 8 shift 3 fsbtodb 3
>> minfree 8% optim time symlinklen 120
>> maxbsize 32768 maxbpg 4096 maxcontig 4 contigsumsize 4
>> nbfree 2130375 ndir 65518 nifree 11769796 nffree 425065
>> bpg 20032 fpg 160256 ipg 80128 unrefs 0
>> nindir 4096 inopb 128 maxfilesize 2252349704110079
>> sbsize 4096 cgsize 32768 csaddr 5048 cssize 4096
>> sblkno 24 cblkno 32 iblkno 40 dblkno 5048
>> cgrotor 127 fmod 0 ronly 0 clean 0
>> metaspace 6408 avgfpdir 64 avgfilesize 16384
>> flags soft-updates trim
>> fsmnt /
>> volname FBSDG4Srootfs swuid 0 providersize 25165823
>> . . .
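
The geometry fields in the dumpfs output above are related to one another, and a short sketch can confirm that the quoted values are self-consistent. This is only a cross-check under stated assumptions: the numbers are copied from the output above, while the 256-byte UFS2 inode size, the 8-byte UFS2 block pointer, and the 12 direct block pointers per inode are values I am supplying, not something the dumpfs output states.

```python
# Values copied from the dumpfs output quoted above.
bsize, bshift, bmask = 32768, 15, 0xFFFF8000
fsize, fshift, fmask = 4096, 12, 0xFFFFF000
frag, fragshift, fsbtodb = 8, 3, 3
bpg, fpg = 20032, 160256
nindir, inopb = 4096, 128
maxfilesize = 2252349704110079
size_frags = 25165823  # "size" field, counted in fragments

# Assumptions not shown in the dumpfs output itself.
UFS2_INODE_BYTES = 256   # on-disk UFS2 inode size
UFS2_BLKPTR_BYTES = 8    # UFS2 block pointer size
DIRECT_BLOCKS = 12       # direct block pointers per inode


def round_down_mask(nbytes):
    """32-bit mask that clears the low bits below a power-of-two size."""
    return ~(nbytes - 1) & 0xFFFFFFFF


# Blocks reachable through direct, single, double and triple indirection.
reachable_blocks = DIRECT_BLOCKS + nindir + nindir**2 + nindir**3

checks = {
    "bsize == fsize * frag": bsize == fsize * frag,
    "bsize == 1 << bshift": bsize == 1 << bshift,
    "fsize == 1 << fshift": fsize == 1 << fshift,
    "frag == 1 << fragshift": frag == 1 << fragshift,
    "bmask clears low bsize bits": bmask == round_down_mask(bsize),
    "fmask clears low fsize bits": fmask == round_down_mask(fsize),
    "fpg == bpg * frag": fpg == bpg * frag,
    "nindir == bsize / pointer size": nindir == bsize // UFS2_BLKPTR_BYTES,
    "inopb == bsize / inode size": inopb == bsize // UFS2_INODE_BYTES,
    "maxfilesize matches": maxfilesize == bsize * reachable_blocks - 1,
}
for name, ok in checks.items():
    print(f"{'ok ' if ok else 'BAD'} {name}")

# fsbtodb is the shift from fragment numbers to device block numbers.
sector_bytes = fsize >> fsbtodb
total_bytes = size_frags * fsize
print(f"device block size: {sector_bytes} bytes")
print(f"file system size:  {total_bytes} bytes ({total_bytes / 2**30:.2f} GiB)")
```

Nothing here bears on the snapshot problem itself; it only shows that the reported parameters are an ordinary 32 KiByte block / 4 KiByte fragment UFS2 layout.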
>> What I had done that produced the messages was:
>> <Prior failed multi-user boot from system problem
>> leaves root (only) file system not marked clean
>> so fsck -B will actually do something below>
>> boot -s (so: single user mode)
>> # The next 3 lines are the content of a generic, manually-run script.
>> mount -u /
>> mount -a -t ufs (but there is no other file system)
>> swapon -a (there is a swap partition)
>> fsck -B
>> That "fsck -B" caused the same kinds of lines
>> reported by Michael Butler, happening as fsck
>> makes a snapshot for the background processing
>> to use. (I have camera pictures and could type
>> in some of the lines if needed.)
>> After those lines was text like (typed in from
>> an example camera picture):
>> ** //.snap/fsck_snapshot
>> ** Last Mount on /
>> ** Root file system
>> ** Phase 1 - Check Blocks and Sizes
>> ** Phase 2 - Check Pathnames
>> ** Phase 3 - Check Connectivity
>> ** Phase 4 - Check Reference Counts
>> ** Phase 5 - Check Cyl groups
>> Reclaimed: 0 directories, 1 files, 22680 fragments
>> 780914 files, 4797127 used, 19552199 free (443479 frags, 3288590 blocks,
>> 1.8% fragmentation)
>> ***** FILE SYSTEM MARKED CLEAN *****
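
The summary numbers above can be sanity-checked against the dumpfs "blocks" figure. The sketch below assumes that the fragmentation percentage is free fragments divided by the total data fragments (my reading of how fsck reports it, not something stated in this thread), and it uses the 24372006 "blocks" value from the later dumpfs run on the assumption that this count had not changed.

```python
# Numbers typed in from the fsck -B summary quoted above.
used = 4797127
free = 19552199
free_frags = 443479
reclaimed_frags = 22680

# "blocks" field from the dumpfs output (assumed unchanged between runs).
data_frags = 24372006

# Assumed formula: free fragments as a percentage of all data fragments.
fragmentation_pct = free_frags * 100.0 / data_frags
print(f"{fragmentation_pct:.1f}% fragmentation")

# used + free falls short of the data area by exactly the reclaimed count.
accounted = used + free + reclaimed_frags
print(f"used + free + reclaimed = {accounted}")
print(f"matches dumpfs blocks:    {accounted == data_frags}")
```

Under these assumptions the percentage agrees with the 1.8% in the report, and the used and free totals appear to differ from the data area by exactly the 22680 reclaimed fragments.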
> [I forgot to mention that the context was a
> production style kernel and world build,
> no invariants or other such.]
> Since I'm running a patched -r320570 to avoid the
> -r319722 through -r320651 problem (fixed by -r320652),
> I went back and forced a power-off without
> shutdown and did the sequence:
> boot -s (so: single user mode)
> # The next 3 lines are the content of a generic, manually-run script.
> mount -u /
> mount -a -t ufs (but there is no other file system)
> swapon -a (there is a swap partition)
> fsck -B
> but always waited briefly after the fsck -B finished.
> As before, the following happens as it tries to trim:
> (typed in from camera picture)
> panic: ffs_blkfree_cg: freeing free block
> cpuid = 2 (varies, of course)
> time = (varies)
> KDB: stack backtrace
> (stack addresses can vary: just an example here)
> 0xd23b17e0: at kdb_backtrace+0x5c
> 0xd23b1850: at vpanic+0x1e8
> 0xd23b18c0: at panic+0x54
> 0xd23b1910: at ffs_blkfree_cg+0x278
> 0xd23b1980: at ffs_blkfree_trim_task+0x60
> 0xd23b19b0: at taskqueue_run_locked+0x10
> 0xd23b1a10: at taskqueue_thread_loop+0x174
> 0xd23b1a50: at fork_exit+0xf4
> 0xd23b1a80: at fork_trampoline+0xc
> KDB: enter: panic
> [ thread pid 0 tid 1000082 ]
> Stopped at kdb_enter+0x70: addi r0,r0,0x0
> I've tried this on a powerpc64 and it works
> the same, complete with the "freeing free
> block" issue.
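
For readers unfamiliar with the "freeing free block" message: it is the kind of check a block allocator makes when asked to release a block that its free map already shows as free. The toy model below illustrates only that invariant. The class and method names are hypothetical, it is not the kernel code, and it makes no claim about what actually leads to the second free in the sequence reported above.

```python
class FreeingFreeBlock(RuntimeError):
    """Raised when a block that is already free is freed again."""


class ToyFreeMap:
    """Toy stand-in for a free-block bitmap (hypothetical, not kernel code)."""

    def __init__(self, nblocks):
        # True means the block is free, False means it is allocated.
        self.is_free = [True] * nblocks

    def allocate(self, blkno):
        if not self.is_free[blkno]:
            raise ValueError(f"block {blkno} is already allocated")
        self.is_free[blkno] = False

    def blkfree(self, blkno):
        # The invariant: only an allocated block may be released.
        if self.is_free[blkno]:
            raise FreeingFreeBlock(f"freeing free block {blkno}")
        self.is_free[blkno] = True


fsmap = ToyFreeMap(nblocks=16)
fsmap.allocate(5)
fsmap.blkfree(5)      # first release: the block becomes free

try:
    fsmap.blkfree(5)  # second release of the same block trips the check
    outcome = "no error"
except FreeingFreeBlock as exc:
    outcome = str(exc)

print(outcome)
```

In the toy model the second release raises an exception; in the kernel the corresponding check panics, which is consistent with the backtrace showing the panic under the deferred trim task.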
I tried a sequence using a normal boot to multi-user
with a file system that was not clean. The boot did an
automatic fsck -B and I got the messages and the later
"freeing free block" panic.
It appears that having mksnap_ffs (and code equivalents
in other programs) broken in turn breaks fsck -B fairly
majorly. (Michael Butler did the mksnap_ffs test at
Rodney W. Grimes' request.)
I've been using the following to clean things up
when I'm done with an experimental sequence that
leaves things needing a fsck:
boot -s (a single user boot)
So far it has resulted in a clean file system.
With that status fsck -B then has no such
problem: apparently it then does not create
a snapshot by default. So then a multi-user boot
works okay for its automatic fsck use.
markmi at dsl-only.net