> What I don't like is that you never have given details (even when
> requested) on your extremely slow original fsck which started this
> thread. The last couple of years I tested fsck on many different
> setups, but I never saw fsck times of 4 hours and not even finished.
> So there's something special about your setup. It's likely that bigmem
> plays a role, but you only mention it now. That's not the way to do
> proper problem analysis.
>
> And jumping up and down after a first successful test is not a sound
> engineering principle either.
>
>        -Otto

Otto,

I am sorry that I overlooked giving any details. Here goes.

My fs layout is as follows. Some filesystems were created with newfs
-b 64K -f 8k and others with -b 16k -f 2k (I switched /tmp, /usr/obj,
/usr/xobj and /personal over to the bumped-up newfs values). I will
change them all back to the default values this weekend; I prefer
defaults.

# df -h
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/wd0a     1005M   73.4M    881M     8%    /
/dev/wd0d      4.0G   40.0K    3.8G     0%    /tmp
/dev/wd0e      9.3G   16.9M    8.9G     0%    /var
/dev/wd0f      5.9G    1.6G    4.0G    29%    /usr
/dev/wd0g     1008M    169M    789M    18%    /usr/X11R6
/dev/wd0h     11.8G    1.3G    9.9G    12%    /usr/local
/dev/wd0i      5.9G    770M    4.9G    13%    /usr/src
/dev/wd0j      4.0G    1.5M    3.8G     0%    /usr/obj
/dev/wd0k      4.0G    8.0K    3.8G     0%    /usr/xobj
/dev/wd0l     19.7G    110M   18.6G     1%    /home
/dev/wd0m     39.4G    1.3G   36.1G     4%    /extra
/dev/wd0n     39.8G    9.2G   28.6G    24%    /personal
/dev/wd0o     81.1G   94.0M   76.9G     0%    /downloads
/dev/sd0a      231G   13.3G    206G     6%    /datamir


The original problem I faced is described here:
http://marc.info/?l=openbsd-misc&m=129900971428196&w=2

I had turned on bigmem shortly before this debacle happened.

It turns out that I had extracted all kinds of files into the default
Firefox download location (/home/amit/downloads; I forget exactly
where). There were sources for gdb 6.3, 6.6, 6.7 and 6.8, GCC 3.3,
3.4.6 and 4.5, LLVM + Clang 2.8, and more that I forget. This is a
20 GB fs, and I was totally unaware that I was abusing it so much. The
day this happened, I had updated src, ports, xenocara and www from
cvs. I ran a plain "fsck" right after that, typing it in the same
window while ports was still updating. In hindsight, I should have
waited until the cache had been written to disk. The fsck proceeded
well until it hit the gazillion files in /home.
Naively, I expected it to complete in an hour at most. There I was,
staring at the screen, with the machine completely unresponsive; each
keypress took a long time to register. During this unfortunate time I
found out that the OpenBSD kernel is not preemptive, so the I/O goes on
uninterrupted. I tried pkill and kill -9, to no avail. I tried logging
into virtual terminal 2 before giving up after 4-5 hours. The FAQ and
my googling said that fsck needs more memory, but I didn't think that
applied to my case, and I didn't even think that bigmem had anything
to do with fsck. This machine has 8GB of RAM and two dual-core
Opterons, so I figured there was no memory issue...

The machine was under heavy I/O load; I could tell that much, with the
hard disk spinning like crazy. (Btw, I now know how to do a cleaner
shutdown when hitting the power button on OpenBSD.) The next day or
so, I went into single-user mode and marked all filesystems clean
except /home. After some more time in fsck, it struck me that I might
have lots of files there, so I power-cycled again, marked /home clean
and then ran rm -rf. I learnt so many things from this experience,
such as that after heavy I/O you have to give softupdates about 30
seconds to flush its cache to disk.

Anyway, after the rm -rf in /home, I experimented with fsck'ing the /
fs; it was quick. Then I tried fsck on /usr/xobj, /usr/obj and so on,
basically in increasing order of fs usage, to find where fsck was
hanging. It still took time, but it was reasonable up to 2-4G; beyond
that it just went crazy. For kicks I ran fsck on a huge unused 160 GB
partition, and it also took a long time. I made sure there was nothing
in that 160 GB partition (I did an rm -rf). I trimmed /home to what you
see now and moved my Chrome/Firefox downloads to /downloads. The huge
unused 160 GB partition I broke up into three, /downloads, /personal
and /extra, just as recommended in the FAQ. I removed XFCE and Gnome to
cut the fat and went back to the default FVWM. That is why I stated
that for anything larger than the 12GB /usr/local it was crazy to
contemplate running fsck: it just wouldn't make progress.

This story is from memory, so it may be inconsistent in places. Anyway,
art@ fixed it. I guess that if I ran fsck on a heavily loaded fs now,
with his fix it would run much faster, and with your fixes it will be
blindingly fast. I am willing to help with any testing needed, so that
the fix gets in for everybody. It isn't good to have patches lying
around; they get forgotten. I understand this is risky, but this is
really, really great.

Jumping up and down is a great engineering principle, especially when
you know the pain is gone :-)

Thanks,
amit
