I believe I've come across a bug in the disk read error recovery logic
for raid1 check/repair operations in 2.6.20.  The raid1.c file looks
identical in 2.6.21, so the problem should still exist there as well.

This all surfaced when using a variant of CONFIG_FAIL_MAKE_REQUEST to
inject read errors on one of the mirrors of a raid1 array.  I noticed
that while this would ultimately fail the array, it would always seem
to generate

ata1.00: WARNING: zero len r/w req 
ata1.00: WARNING: zero len r/w req 
ata1.00: WARNING: zero len r/w req 
ata1.00: WARNING: zero len r/w req 
ata1.00: WARNING: zero len r/w req 
ata1.00: WARNING: zero len r/w req 

diagnostics at the same time (no clue why there are six of them).
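
For what it's worth, the injection hook I used is along the lines of
the stock should_fail_request() in block/ll_rw_blk.c, restricted to
reads on the target mirror.  A rough sketch only (not the exact code I
ran, and armed through the usual CONFIG_FAIL_MAKE_REQUEST knobs):

#include <linux/bio.h>
#include <linux/genhd.h>
#include <linux/fault-inject.h>

/*
 * Sketch: fail READ bios on a disk that has been flagged via
 * /sys/block/<dev>/make-it-fail (GENHD_FL_FAIL).
 */
static DECLARE_FAULT_ATTR(fail_make_request);

static int should_fail_request(struct bio *bio)
{
        if ((bio->bi_bdev->bd_disk->flags & GENHD_FL_FAIL) &&
            bio_data_dir(bio) == READ)
                return should_fail(&fail_make_request, bio->bi_size);

        return 0;
}
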
Delving into this further, I eventually settled on sync_request_write()
in raid1.c as a likely culprit and added the WARN_ON below:

@@ -1386,6 +1393,7 @@
                atomic_inc(&r1_bio->remaining);
                md_sync_acct(conf->mirrors[i].rdev->bdev, wbio->bi_size >> 9);
 
+               WARN_ON(wbio->bi_size == 0);
                generic_make_request(wbio);
        }


to confirm that this code was indeed sending a zero size bio down to
the device layer in this circumstance.
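
For reference, the "schedule writes" loop that this WARN_ON lands in
looks roughly like the following -- paraphrased from my reading of
2.6.20 raid1.c rather than quoted verbatim; the WARN_ON is the one
from the diff above:

        /*
         * schedule writes
         */
        atomic_set(&r1_bio->remaining, 1);
        for (i = 0; i < disks; i++) {
                wbio = r1_bio->bios[i];
                /* skip empty slots and reads we don't need to rewrite */
                if (wbio->bi_end_io == NULL ||
                    (wbio->bi_end_io == end_sync_read &&
                     (i == r1_bio->read_disk ||
                      !test_bit(MD_RECOVERY_SYNC, &mddev->recovery))))
                        continue;

                /* whatever is left gets written out as-is, even if it
                 * was never prepped above and bi_size is still 0 */
                wbio->bi_rw = WRITE;
                wbio->bi_end_io = end_sync_write;
                atomic_inc(&r1_bio->remaining);
                md_sync_acct(conf->mirrors[i].rdev->bdev, wbio->bi_size >> 9);

                WARN_ON(wbio->bi_size == 0);
                generic_make_request(wbio);
        }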

Looking at the preceding code in sync_request_write(), it appears that
the loop comparing the results of all the reads simply skips a mirror
whose read failed (BIO_UPTODATE is clear) without doing any of the sbio
prep or the memcpy() from the pbio.  Since completion of the read has
already counted the sbio's bi_size down to zero, and only the fixup
path that prepares the sbio for re-writing restores it, the skipped
sbio presumably goes on to be issued as a zero length write.  There is
other read/re-write logic in the following if-clause, but that seems to
apply only when none of the mirrors were readable.  Regardless, the
fact that a zero length bio is being issued in the "schedule writes"
section is compelling evidence that something is wrong somewhere.
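
For reference, the fixup path mentioned above -- the else arm of the
"if (j < 0 || test_bit(MD_RECOVERY_CHECK, ...))" test -- rebuilds the
sbio before it is reused as a write.  Again roughly, from my reading of
the 2.6.20 code rather than a verbatim quote:

                } else {
                        /* fixup the bio for reuse as a write */
                        sbio->bi_vcnt = vcnt;
                        /* read completion left bi_size at 0; restore it */
                        sbio->bi_size = r1_bio->sectors << 9;
                        sbio->bi_idx = 0;
                        /* ... other flag/segment fields reset here ... */
                        sbio->bi_sector = r1_bio->sector +
                                conf->mirrors[i].rdev->data_offset;
                        sbio->bi_bdev = conf->mirrors[i].rdev->bdev;
                        /* copy the good data from the primary */
                        for (j = 0; j < vcnt; j++)
                                memcpy(page_address(sbio->bi_io_vec[j].bv_page),
                                       page_address(pbio->bi_io_vec[j].bv_page),
                                       PAGE_SIZE);
                }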

I tried the following patch to raid1.c, which short-circuits the data
comparison in the read error case but otherwise does the rest of the
sbio prep for the mirror with the error.  The idea is that forcing j
to 0 makes the failed mirror count as a mismatch, so (except on a
check-only pass) it goes through the same fixup and gets rewritten
from the primary.  It seems to have eliminated the ATA warnings at
least.  Is it the correct thing to do?

@@ -1235,17 +1242,20 @@
                        }
                r1_bio->read_disk = primary;
                for (i=0; i<mddev->raid_disks; i++)
-                       if (r1_bio->bios[i]->bi_end_io == end_sync_read &&
-                           test_bit(BIO_UPTODATE, &r1_bio->bios[i]->bi_flags)) {
+                       if (r1_bio->bios[i]->bi_end_io == end_sync_read) {
                                int j;
                                int vcnt = r1_bio->sectors >> (PAGE_SHIFT-9);
                                struct bio *pbio = r1_bio->bios[primary];
                                struct bio *sbio = r1_bio->bios[i];
-                               for (j = vcnt; j-- ; )
-                                       if (memcmp(page_address(pbio->bi_io_vec[j].bv_page),
-                                                  page_address(sbio->bi_io_vec[j].bv_page),
-                                                  PAGE_SIZE))
-                                               break;
+                               if (test_bit(BIO_UPTODATE, &sbio->bi_flags)) {
+                                       for (j = vcnt; j-- ; )
+                                               if (memcmp(page_address(pbio->bi_io_vec[j].bv_page),
+                                                          page_address(sbio->bi_io_vec[j].bv_page),
+                                                          PAGE_SIZE))
+                                                       break;
+                               } else {
+                                       j = 0;
+                               }
                                if (j >= 0)
                                        mddev->resync_mismatches += r1_bio->sectors;
                                if (j < 0 || test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) {
--
Mike Accetta

ECI Telecom Ltd.
Data Networking Division (previously Laurel Networks)