Re: the 1Tb block issue
On Tue, May 18, 2010 at 08:38:22PM +0300, Avi Kivity wrote:
> Yes. Why would Linux post overlapping requests? Makes no sense.
> There may be a guest bug in here too. Christoph?

Overlapping writes are entirely fine from the guest's point of view, although they should be rather unusual. We can update a page and send it out again when it gets redirtied while the first write is still out on the wire.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: the 1Tb block issue
On 05/19/2010 11:57 AM, Christoph Hellwig wrote:
> On Tue, May 18, 2010 at 08:38:22PM +0300, Avi Kivity wrote:
>> Yes. Why would Linux post overlapping requests? Makes no sense.
>> There may be a guest bug in here too. Christoph?
>
> Overlapping writes are entirely fine from the guest's point of view,
> although they should be rather unusual. We can update a page and send
> it out again when it gets redirtied while still out on the wire.

But the device may reorder requests:

    system                                   device

    issue request r1 for sector n, page p
                                             dma into buffer b1
    modify contents of page p
    issue request r2 for sector n, page p
                                             dma into buffer b2
                                             complete r2
                                             complete r1

Is there any guarantee that r2 will complete after r1, or that b1 and b2 are coherent? I'm not aware of any.

What about NFS O_DIRECT backing virtio-blk? Here, requests can definitely be reordered, and the buffers are certainly not coherent (since they don't even exist once the data has left the NIC).

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
the 1Tb block issue
I just re-verified it on the current stable qemu-kvm-0.12.4. The issue is still here and trivial to trigger:

    kvm-img create test.raw 1500G
    kvm ... \
        -drive file=test.raw,if=virtio

It fails right at the mkfs stage:

    mkfs.ext4 /dev/vdb
    Writing inode tables:
    end_request: I/O error, dev vdb, sector 3145727872
    Buffer I/O error on device vdb, logical block 393215984
    lost page write due to I/O error on vdb
    Buffer I/O error on device vdb, logical block 393215985
    ...
    Buffer I/O error on device vdb, logical block 393215993

After that mkfs continues, but I doubt it will produce a good filesystem.

So far, only virtio has this problem. I tested with if=ide: it's slower, but it went much further without any error. It's still running, and at this rate it will run for some hours more ;) At least it does not spew errors like the virtio case.

Unfortunately I don't have enough free space to test fully. Yes, the file is sparse, but it grows quite fast while mkfs is running, and I'm not sure the ~100GB free on the largest filesystem I have will be enough for it... but let's see.

/mjt
Re: the 1Tb block issue
On Tue, May 18, 2010 at 07:52:45PM +0400, Michael Tokarev wrote:
> I just re-verified it on current stable qemu-kvm-0.12.4. The issue
> is still here, trivial to trigger:
>
>     kvm-img create test.raw 1500G
>     kvm ... \
>         -drive file=test.raw,if=virtio
>
> It fails right at the mkfs stage:
>
>     mkfs.ext4 /dev/vdb
>     Writing inode tables:
>     end_request: I/O error, dev vdb, sector 3145727872
>     Buffer I/O error on device vdb, logical block 393215984
>     lost page write due to I/O error on vdb
>     ...
>
> After that mkfs continues, but I doubt it will produce a good
> filesystem.
>
> So far, only virtio has this problem. I tested with if=ide: it's
> slower, but it went much further without any error. It's still
> running, and at this rate it will run for some hours more ;) At
> least it does not spew errors like the virtio case.

FYI, this is a really useful tool for validating correctness of the block layer:

http://people.redhat.com/sct/src/verify-data/

Daniel
--
|: Red Hat, Engineering, London    -o-    http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://deltacloud.org :|
|: http://autobuild.org    -o-    http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
Re: the 1Tb block issue
18.05.2010 19:52, Michael Tokarev wrote:
> I just re-verified it on current stable qemu-kvm-0.12.4. The issue
> is still here, trivial to trigger:
> [...]
> After that mkfs continues, but I doubt it will produce a good
> filesystem.

A few more data points, for what it's worth.

I tried running it under strace, but in that case the issue does not occur: mkfs goes on without errors. That puzzles me: a timing problem?

It always fails at the same place: sector 3145727872. This is, apparently, somewhere at the end of my 1500G file.

If I hit Ctrl+C to stop it, mkfs will sit there forever, waiting for sync_file_pages.

I tried both 32- and 64-bit hosts with a 64-bit guest. The effect is exactly the same.

> So far, only virtio has this problem. I tested with if=ide: it's
> slower, but it went much further without any error. It's still
> running, and at this rate it will run for some hours more ;) At
> least it does not spew errors like the virtio case.

That seems to work; the filesystem looks healthy.

/mjt
Re: the 1Tb block issue
On Tue, May 18, 2010 at 08:51:55PM +0400, Michael Tokarev wrote:
> 18.05.2010 19:52, Michael Tokarev wrote:
>> I just re-verified it on current stable qemu-kvm-0.12.4. The issue
>> is still here, trivial to trigger:
>> [...]
>
> A few more data points, for what it's worth.
>
> I tried running it under strace, but in that case the issue does not
> occur: mkfs goes on without errors. That puzzles me: a timing problem?
>
> It always fails at the same place: sector 3145727872. This is,
> apparently, somewhere at the end of my 1500G file.

Hmmm. 3145727872*512 = 0x

--
Gleb.
Re: the 1Tb block issue
On 05/18/2010 06:52 PM, Michael Tokarev wrote:
> I just re-verified it on current stable qemu-kvm-0.12.4. The issue
> is still here, trivial to trigger.

Can you try the patch I just posted?

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: the 1Tb block issue
On 05/18/2010 08:34 PM, Michael Tokarev wrote:
> 18.05.2010 21:29, Avi Kivity wrote:
>> On 05/18/2010 06:52 PM, Michael Tokarev wrote:
>>> I just re-verified it on current stable qemu-kvm-0.12.4. The issue
>>> is still here, trivial to trigger.
>>
>> Can you try the patch I just posted?
>
> Applied this one:
>   [Qemu-devel] [PATCH +stable] block: don't attempt to merge overlapping requests
> Quick tests show it works correctly so far. At least it went further
> than before, not stopping at the usual sector 3145727872.
>
> Hmm. IDE has no queue, hence no merging; that's why it does not
> occur with ide, right? :)

Yes.

> Interesting...

Yes. Why would Linux post overlapping requests? Makes no sense.

There may be a guest bug in here too. Christoph?

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: the 1Tb block issue
18.05.2010 21:38, Avi Kivity wrote:
>> Applied this one:
>>   [Qemu-devel] [PATCH +stable] block: don't attempt to merge overlapping requests
>> Quick tests show it works correctly so far. At least it went further
>> than before, not stopping at the usual sector 3145727872.
>>
>> Hmm. IDE has no queue, hence no merging; that's why it does not
>> occur with ide, right? :)
>
> Yes.

I tried multiple times to reproduce it with if=scsi (queue_depth=16 for the sym53c8xx driver). I can't. JFYI ;)

(And this kinda explains why the bug does not occur when run under strace, which also indicates that it isn't necessarily easy to trigger.)

>> Interesting...
>
> Yes. Why would Linux post overlapping requests? Makes no sense.

It's mkfs. Not sure why, but yes, maybe it's a guest bug after all. Note that I'm running a 64-bit kernel in the guest (2.6.32.9-amd64).

Note also that it's not the same as in the original bug report; there, the sector# is apparently different:
http://sourceforge.net/tracker/?func=detail&aid=2933400&group_id=180599&atid=893831

> There may be a guest bug in here too. Christoph?

/mjt
Re: the 1Tb block issue
On 05/18/2010 09:03 PM, Michael Tokarev wrote:
> 18.05.2010 21:38, Avi Kivity wrote:
> [...]
> I tried multiple times to reproduce it with if=scsi (queue_depth=16
> for the sym53c8xx driver). I can't. JFYI ;)

Merging needs explicit support in the block device emulation, which scsi lacks.

>> Yes. Why would Linux post overlapping requests? Makes no sense.
>
> It's mkfs.

mkfs simply writes to the block device; even if it did issue overlapping writes, Linux shouldn't pass them down. Either the writes contain the same content in the overlapping section, in which case the overlap is redundant, or they don't, and we have data corruption in the making.

> Not sure why, but yes, maybe it's a guest bug after all.

It's a host bug for sure, with a potential for a guest bug as well.

> Note that I'm running a 64-bit kernel in the guest (2.6.32.9-amd64).
>
> Note also that it's not the same as in the original bug report; there,
> the sector# is apparently different:
> http://sourceforge.net/tracker/?func=detail&aid=2933400&group_id=180599&atid=893831

I don't think it's related to a particular sector number.

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: the 1Tb block issue
18.05.2010 22:09, Avi Kivity wrote:
> On 05/18/2010 09:03 PM, Michael Tokarev wrote:
> [...]
>> Note also that it's not the same as in the original bug report; there,
>> the sector# is apparently different:
>> http://sourceforge.net/tracker/?func=detail&aid=2933400&group_id=180599&atid=893831
>
> I don't think it's related to a particular sector number.

I added a debug printf at the place touched by the patch mentioned above, to print the cases where the requests would have been merged before that patch but are not any more with it applied:

    if (reqs[i].sector == oldreq_last) {
        merge = 1;
    } else if (reqs[i].sector < oldreq_last) {
        fprintf(stderr, "NOT merging:\n"
                " reqs[i].sector=%Ld oldreq_last=%Ld\n"
                " reqs[outidx].sector=%Ld reqs[outidx].nb_sectors=%d\n",
                reqs[i].sector, oldreq_last,
                reqs[outidx].sector, reqs[outidx].nb_sectors);
    }

In a few runs it showed different info (I also modified the printf line twice along the way, hence the varying fields):

    NOT merging: reqs[i].sector=92306456 oldreq_last=3145728000
    NOT merging: reqs[i].sector=92322056 oldreq_last=3145728000 reqs[outidx].sector=3145727872
    NOT merging: reqs[i].sector=0 oldreq_last=3145728000 reqs[outidx].sector=3145727872 reqs[outidx].nb_sectors=128
    NOT merging: reqs[i].sector=0 oldreq_last=3145728000 reqs[outidx].sector=3145727872 reqs[outidx].nb_sectors=128
    NOT merging: reqs[i].sector=92308152 oldreq_last=3145728000 reqs[outidx].sector=3145727872 reqs[outidx].nb_sectors=128
    NOT merging: reqs[i].sector=0 oldreq_last=3145728000 reqs[outidx].sector=3145727872 reqs[outidx].nb_sectors=128

So it's definitely timing-related somehow (especially since it changes when interrupting mkfs and immediately starting it again), and it shows different values, but for me it is, apparently, always reqs[outidx].sector=3145727872 together with some other sector.

/mjt
Re: the 1Tb block issue
18.05.2010 22:38, Michael Tokarev wrote:
> I added a debug printf at the place touched by the patch mentioned
> above, to print the cases where the requests would have been merged
> before that patch but are not any more with it applied:
> [...]
> So it's definitely timing-related somehow (especially since it changes
> when interrupting mkfs and immediately starting it again), and it
> shows different values, but for me it is, apparently, always
> reqs[outidx].sector=3145727872 together with some other sector.

And right after I hit Send, it showed another one:

    NOT merging: reqs[i].sector=760 oldreq_last=3141599488 reqs[outidx].sector=3141597896 reqs[outidx].nb_sectors=1592

so it's not the case here ;)

/mjt