On Thu, Jul 08, 2010 at 01:26:39PM +0200, Benny Löfgren wrote:
> Hi,
>
> Sorry about the tab mangling. How should I best send in diffs and
> communicate with developers in the future? (Tried to find something
> about best practice in the usual places, but failed so I winged it
> this time.)
Inline unified diffs using a mail agent that does not mangle tabs...
>
> Tested your diff, afaict it works fine. Thanks!
>
> Regarding your comment about large filesystems - yes I usually do
> this as well. (Although the time to check is still negligible
> compared to the time it takes to reconstruct parity on a 10 TB
> RAIDframe partition - we're talking days... :-) ) A more serious
> problem is of course that fsck_ffs might run out of address space,
> but since I'm only running amd64 these days, that problem has been
> postponed to some distant future. :-)
Recent newfs warns if its estimate of the memory needed to fsck the
filesystem is larger than MAXDSIZ or the amount of physical memory.
>
> However, when I tried to newfs an even larger partition (8.2 TB)
> than the one I used for this bug (which, by the way, wasn't >4 TB
> but 3 TB, so the bug heading should probably have stated >2 TB
> instead. Sorry about that!), I managed to combine fragment and block
> sizes in such a way that I got a panic the first time I was creating
> a directory on the new file system.
>
> I'll try to reproduce that problem and send a PR about that too. I
> probably did something stupid, but either newfs should warn about
> that or there is actually a bug in the newfs or ffs2 code, in which
> case it needs to be more resilient.
Yes, please try to gather more info on this. You might be creating an
fs with more than 2^32 inodes. newfs should catch that.
>
> Still have some time to play with this particular system until it is
> supposed to go into production, so I'll try to weed out as many
> problems with very large arrays/partitions/file systems/files, if
> any, that I can.
In the meantime I also confirmed this on a smaller system (2T) but
with 4k blocks and 512 bytes fragments.
I suspect the hang you mentioned, which I also saw on an unpatched
system, is caused by the kernel trying to coredump the giant process,
which hangs the machine. This is of course a completely different bug.
BTW, keep Cc:'ing bugs@, more people are interested in this.
-Otto