Hello Again

Some more details, now that I have scraped together a few minutes at the
weekend to look into this.
Same hardware configuration as in my earlier post, but with a newer BIOS
version on the motherboard; no improvement.

Some long-winded test results, and my conclusions.
[Abridged] output from diff for one flawed copy:
===
linux/fs/cramfs/inflate/inffixed.h
-    {{{84,7}},99}, {{{0,8}},127}, {{{0,8}},63}, {{{0,9}},223},
+    {{{84,7}},99}, {{{0,8}},127}{0,8}},63}, {{{0,9}},223},
linux/fs/jffs/intrep.c
-                       jffs_fmfree_partly(fmc, fm, total_data_size);
-                       jffs_fm_write_unlock(fmc);
+                       jffs_fmfree_partly(fmc, fm, total_data_sizport                  jffs_fm_write_unlock(fmc);
linux/kernel/resource.c
-int allocate_resource(struct resource *root, struct resource *new,
+int allocate_rrce(struct resource *root, struct resource *new,
===
Closer inspection of the three "corrupt" files, using the command
od -tc <file> | less
revealed that in all cases the corruption was the four bytes
immediately preceding an exact multiple of 4096, the block size of the
(ext2) filesystem...
When I wake up, I may well go back and read the "corruption" thread that
gave me the idea for this comparison :(
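The od inspection above can be automated. Below is a minimal sketch in Python, not an existing tool: the script name cmpblocks.py and the helper names are hypothetical. It compares a good and a flawed copy byte by byte, groups the differing bytes into contiguous runs, and reports whether each run ends exactly on a 4096-byte boundary. (`cmp -l good bad` lists the same differing bytes, one per line, but without the grouping.)

```python
import sys

BLOCK_SIZE = 4096  # ext2 block size, as reported in the post


def differing_runs(good: bytes, bad: bytes) -> list:
    """Return (offset, length) for each contiguous run of differing bytes.

    Only the overlapping part of the two buffers is compared; a length
    mismatch would have to be reported separately.
    """
    runs = []
    start = None
    n = min(len(good), len(bad))
    for i in range(n):
        if good[i] != bad[i]:
            if start is None:
                start = i  # a new damaged run begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:  # run extends to the end of the shorter buffer
        runs.append((start, n - start))
    return runs


def ends_on_block_boundary(offset: int, length: int, block: int = BLOCK_SIZE) -> bool:
    """True if the run's last byte is the last byte of a filesystem block."""
    return (offset + length) % block == 0


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 cmpblocks.py <good copy> <flawed copy>
    with open(sys.argv[1], "rb") as f:
        good = f.read()
    with open(sys.argv[2], "rb") as f:
        bad = f.read()
    for offset, length in differing_runs(good, bad):
        print(f"offset {offset} (octal {offset:o}): {length} bytes, "
              f"ends on a {BLOCK_SIZE}-byte boundary: "
              f"{ends_on_block_boundary(offset, length)}")
```

One limitation of this sketch: if a substituted byte happens to equal the original byte, a single damaged run would be reported as two shorter runs.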
For inffixed.h, the correct dump reads:
0017740   {   {   {   8   4   ,   7   }   }   ,   9   9   }   ,       { 
0017760   {   {   0   ,   8   }   }   ,   1   2   7   }   ,       {   { 
0020000   {   0   ,   8   }   }   ,   6   3   }   ,       {   {   {   0 
0020020   ,   9   }   }   ,   2   2   3   }   ,  \n                   { 
and the flawed dump reads:
0017740   {   {   {   8   4   ,   7   }   }   ,   9   9   }   ,       { 
0017760   {   {   0   ,   8   }   }   ,   1   2   7   }  \0  \0  \0  \0 
0020000   {   0   ,   8   }   }   ,   6   3   }   ,       {   {   {   0 
0020020   ,   9   }   }   ,   2   2   3   }   ,  \n                   { 
0020000 (octal, od's default address radix) -> 8192 -> 2 x 4k

Likewise for intrep.c; the correct dump reads:
0177740   r   t   l   y   (   f   m   c   ,       f   m   ,       t   o 
0177760   t   a   l   _   d   a   t   a   _   s   i   z   e   )   ;  \n 
0200000  \t  \t  \t   j   f   f   s   _   f   m   _   w   r   i   t   e 
0200020   _   u   n   l   o   c   k   (   f   m   c   )   ;  \n  \t  \t 
and the flawed dump reads:
0177740   r   t   l   y   (   f   m   c   ,       f   m   ,       t   o 
0177760   t   a   l   _   d   a   t   a   _   s   i   z   p   o   r   t 
0200000  \t  \t  \t   j   f   f   s   _   f   m   _   w   r   i   t   e 
0200020   _   u   n   l   o   c   k   (   f   m   c   )   ;  \n  \t  \t 
0200000 (octal) -> 65536 -> 16 x 4k
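A note on reading these offsets: od prints addresses in octal by default. In the dumps, each line advances by 020 octal (16 bytes), which matches the 16 characters shown per line. Read as octal, the damaged runs end at byte 8192 (2 x 4k) in inffixed.h and at byte 65536 (16 x 4k) in intrep.c. A hex reading (131072 and 2097152) also gives multiples of 4096, so the block-boundary observation holds either way. The helper name below is hypothetical; it is just a quick check of the arithmetic:

```python
BLOCK_SIZE = 4096  # ext2 block size


def od_offset_in_blocks(od_offset: str, radix: int = 8):
    """Convert an od address (octal by default) to (bytes, blocks, remainder)."""
    value = int(od_offset, radix)
    blocks, remainder = divmod(value, BLOCK_SIZE)
    return value, blocks, remainder


# Offsets at which the damaged 4-byte runs end, taken from the two dumps.
for name, addr in [("inffixed.h", "0020000"), ("intrep.c", "0200000")]:
    value, blocks, rem = od_offset_in_blocks(addr)
    print(f"{name}: {addr} (octal) = {value} bytes = {blocks} x 4k, remainder {rem}")
```

This prints 8192 bytes (2 x 4k) and 65536 bytes (16 x 4k), both with remainder 0.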


I have a couple more examples. The corruption is still present after a
reboot, but I have yet to see what fsck makes of it...
[Addendum: the corruption is still present after fsck.]

So, in summary: when using the HPT366 controller with my (claimed) ATA66
drive, and an SMP kernel with two Celeron 466s (running at 466 MHz), I
find intermittent corruption of the ext2 filesystem under load.
The corruption is always four bytes, and commonly (maybe always?) the four
bytes immediately before an exact multiple of 4096 bytes, the filesystem
block size.
The values written in place of the correct data do not appear to be
random, but rather data from the (memory) cache; I've previously noticed
one or two that look "familiar", but that may just be my tired eyes.
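One way to test this pattern against the remaining examples, and to eyeball the substituted bytes for "familiar" cache contents, is to post-process `cmp -l` output. GNU cmp prints one line per differing byte: the 1-based byte number in decimal, then the good and bad byte values in octal. Below is a minimal sketch. The script name check_pattern.py is hypothetical, and the 4-byte run length and 4096-byte boundary are assumptions taken from the examples above, not a known property of the HPT366 driver.

```python
import sys

BLOCK_SIZE = 4096  # ext2 block size from the post
RUN_LEN = 4        # corruption length observed so far (assumption)


def parse_cmp_l(lines):
    """Parse `cmp -l` output into (offset, good_byte, bad_byte) tuples.

    cmp -l prints a 1-based decimal byte number followed by the two
    differing byte values in octal; offsets returned here are 0-based.
    """
    diffs = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3 or not fields[0].isdigit():
            continue  # skip anything that is not a difference line
        diffs.append((int(fields[0]) - 1, int(fields[1], 8), int(fields[2], 8)))
    return diffs


def group_runs(diffs):
    """Group consecutive offsets into runs of (start, substituted bytes)."""
    runs = []
    for offset, _good, bad in diffs:
        if runs and runs[-1][0] + len(runs[-1][1]) == offset:
            runs[-1][1].append(bad)  # continues the current run
        else:
            runs.append((offset, [bad]))  # starts a new run
    return [(start, bytes(vals)) for start, vals in runs]


def fits_pattern(start, data):
    """True if the run is RUN_LEN bytes ending exactly on a block boundary."""
    return len(data) == RUN_LEN and (start + len(data)) % BLOCK_SIZE == 0


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: cmp -l good/file bad/file > file.cmp; python3 check_pattern.py file.cmp
    with open(sys.argv[1]) as f:
        for start, data in group_runs(parse_cmp_l(f)):
            verdict = "matches" if fits_pattern(start, data) else "does NOT match"
            print(f"offset {start}: {len(data)} bytes {data!r} {verdict} the pattern")
```

Printing the substituted bytes with repr() makes runs such as b'port' (from intrep.c) easy to compare against other files that may have been in the page cache at the time.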

That's all I can think (ramble?) of at this time. Awaiting bright ideas,
or more free time to fiddle further. Thanks in advance (for all the bright
ideas :) )


Gerard Sharp 
Two Penguins at 1024x768
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
