Re: the 1Tb block issue

2010-05-19 Thread Christoph Hellwig
On Tue, May 18, 2010 at 08:38:22PM +0300, Avi Kivity wrote:
 Yes.  Why would Linux post overlapping requests? makes 0x sense.

 There may be a guest bug in here too.  Christoph?

Overlapping writes are entirely fine from the guest POV, although they
should be rather unusual.  We can update a page and send it out again
when it gets redirtied while still out on the wire.



Re: the 1Tb block issue

2010-05-19 Thread Avi Kivity

On 05/19/2010 11:57 AM, Christoph Hellwig wrote:

On Tue, May 18, 2010 at 08:38:22PM +0300, Avi Kivity wrote:

Yes.  Why would Linux post overlapping requests? makes 0x sense.

There may be a guest bug in here too.  Christoph?


Overlapping writes are entirely fine from the guest POV, although they
should be rather unusual.  We can update a page and send it out again
when it gets redirtied while still out on the wire.


But the device may reorder requests:

  system                                  device

  issue request r1 for sector n page p
                                          dma into buffer b1
  modify contents of page p
  issue request r2 for sector n page p
                                          dma into buffer b2
                                          complete r2
                                          complete r1

Is there any guarantee r2 will complete after r1, or that b1 and b2 are 
coherent?  I'm not aware of any.
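To make the hazard concrete, here is a tiny standalone simulation of that
timeline (purely illustrative plain C written for this note, not taken from
any driver; the buffer names follow the table above).  If the device
transfers b2 to the medium before b1, the stale data lands last:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char page[8] = "old";    /* guest page p */
    char b1[8], b2[8];       /* device-side buffers for r1 and r2 */
    char sector_n[8];        /* what finally lands on disk */

    memcpy(b1, page, sizeof(page));    /* r1 issued: dma page p into b1 */
    strcpy(page, "new");               /* guest redirties p while r1 is in flight */
    memcpy(b2, page, sizeof(page));    /* r2 issued: dma page p into b2 */

    memcpy(sector_n, b2, sizeof(b2));  /* device completes r2 first... */
    memcpy(sector_n, b1, sizeof(b1));  /* ...then r1, so the stale copy wins */

    printf("sector n contains \"%s\"\n", sector_n);   /* prints "old" */
    return 0;
}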


What about NFS O_DIRECT backing virtio-blk?  Here, requests can
definitely be reordered, and the buffers are certainly not coherent
(since they don't even exist once the data has left the NIC).


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



the 1Tb block issue

2010-05-18 Thread Michael Tokarev

I just re-verified it on current stable
qemu-kvm-0.12.4.  The issue is still here,
trivial to trigger.

 kvm-img create test.raw 1500G
 kvm ... \
  -drive file=test.raw,if=virtio

it fails right on the mkfs stage:

 mkfs.ext4 /dev/vdb
 Writing inode tables: end_request: I/O error, dev vdb, sector 3145727872
 Buffer I/O error on device vdb, logical block 393215984
 lost page write due to I/O error on vdb
 Buffer I/O error on device vdb, logical block 393215985
 ...
 Buffer I/O error on device vdb, logical block 393215993

After that it continues the mkfs process, but I doubt it will
produce a good filesystem.

So far, only virtio has this problem.  I tested with if=ide, it's
slower but it went much further without any error.  It's still
running, but at this rate it will run for some hours more ;)
At least it does not spew errors like the virtio case.

Unfortunately I don't have enough free space to test.  Yes the file
is sparse, but it grows quite fast when mkfs is running, and I'm
not sure the ~100Gb free space on the largest filesystem I have
will be enough for it...  but let's see.

/mjt


Re: the 1Tb block issue

2010-05-18 Thread Daniel P. Berrange
On Tue, May 18, 2010 at 07:52:45PM +0400, Michael Tokarev wrote:
 I just re-verified it on current stable
 qemu-kvm-0.12.4.  The issue is still here,
 trivial to trigger.
 
  kvm-img create test.raw 1500G
  kvm ... \
   -drive file=test.raw,if=virtio
 
 it fails right on the mkfs stage:
 
  mkfs.ext4 /dev/vdb
  Writing inode tables: end_request: I/O error, dev vdb, sector 3145727872
  Buffer I/O error on device vdb, logical block 393215984
  lost page write due to I/O error on vdb
  Buffer I/O error on device vdb, logical block 393215985
  ...
  Buffer I/O error on device vdb, logical block 393215993
 
 After that it continues the mkfs process, but I doubt it will
 produce a good filesystem.
 
 So far, only virtio has this problem.  I tested with if=ide, it's
 slower but it went much further without any error.  It's still
 running, but at this rate it will run for some hours more ;)
 At least it does not spew errors like the virtio case.

FYI this is a really useful tool for validating correctness
of the block layer.

  http://people.redhat.com/sct/src/verify-data/


Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|


Re: the 1Tb block issue

2010-05-18 Thread Michael Tokarev

18.05.2010 19:52, Michael Tokarev wrote:

I just re-verified it on current stable
qemu-kvm-0.12.4. The issue is still here,
trivial to trigger.

kvm-img create test.raw 1500G
kvm ... \
-drive file=test.raw,if=virtio

it fails right on the mkfs stage:

mkfs.ext4 /dev/vdb
Writing inode tables: end_request: I/O error, dev vdb, sector 3145727872
Buffer I/O error on device vdb, logical block 393215984
lost page write due to I/O error on vdb
Buffer I/O error on device vdb, logical block 393215985
...
Buffer I/O error on device vdb, logical block 393215993

After that it continues the mkfs process, but I doubt it will
produce a good filesystem.


A few more data points, for what it's worth.

I tried running it under strace, but in that case the issue does
not occur: mkfs went on without errors.  That puzzles me: timing
problem?

It always fails at the same place: sector 3145727872. This
is - apparently - somewhere at the end of my 1500Gb file.

If I hit Ctrl+C to stop it, mkfs will sit there forever,
waiting for sync_file_pages.

I tried both 32 and 64bit host with 64bit guest.
The effect is exactly the same.


So far, only virtio has this problem. I tested with if=ide, it's
slower but it went much further without any error. It's still
running, but at this rate it will run for some hours more ;)
At least it does not spew errors like the virtio case.


That seems to work; the filesystem looks healthy.

/mjt


Re: the 1Tb block issue

2010-05-18 Thread Gleb Natapov
On Tue, May 18, 2010 at 08:51:55PM +0400, Michael Tokarev wrote:
 18.05.2010 19:52, Michael Tokarev wrote:
 I just re-verified it on current stable
 qemu-kvm-0.12.4. The issue is still here,
 trivial to trigger.
 
 kvm-img create test.raw 1500G
 kvm ... \
 -drive file=test.raw,if=virtio
 
 it fails right on the mkfs stage:
 
 mkfs.ext4 /dev/vdb
 Writing inode tables: end_request: I/O error, dev vdb, sector 3145727872
 Buffer I/O error on device vdb, logical block 393215984
 lost page write due to I/O error on vdb
 Buffer I/O error on device vdb, logical block 393215985
 ...
 Buffer I/O error on device vdb, logical block 393215993
 
 After that it continues the mkfs process, but I doubt it will
 produce a good filesystem.
 
 A few more data points, for what it's worth.
 
 I tried running it under strace, but in that case the issue does
 not occur: mkfs went on without errors.  That puzzles me: timing
 problem?
 
 It always fails at the same place: sector 3145727872. This
 is - apparently - somewhere at the end of my 1500Gb file.
 
Hmmm. 3145727872*512 = 0x176ffff0000
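Spelling that out (the hex value is recomputed here from the sector
numbers quoted above; 1500G = 1500 * 2^30 bytes):

  3145728000 * 512 = 1610612736000 = 0x17700000000   (end of the 1500G image)
  3145727872 * 512 = 1610612670464 = 0x176ffff0000   (the failing offset)

So the failing sector sits exactly 128 sectors (64 KiB) before the end
of the image, which matches the 128-sector requests at sector 3145727872
in the debug output later in the thread.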

--
Gleb.


Re: the 1Tb block issue

2010-05-18 Thread Avi Kivity

On 05/18/2010 06:52 PM, Michael Tokarev wrote:

I just re-verified it on current stable
qemu-kvm-0.12.4.  The issue is still here,
trivial to trigger.



Can you try the patch I just posted?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: the 1Tb block issue

2010-05-18 Thread Avi Kivity

On 05/18/2010 08:34 PM, Michael Tokarev wrote:

18.05.2010 21:29, Avi Kivity wrote:

On 05/18/2010 06:52 PM, Michael Tokarev wrote:

I just re-verified it on current stable
qemu-kvm-0.12.4. The issue is still here,
trivial to trigger.


Can you try the patch I just posted?


Applied this one:

 [Qemu-devel] [PATCH +stable] block: don't attempt to merge 
overlapping requests


quick tests show it works correctly so far.
At least it went further than before, not
stopping at the usual sector 3145727872.

Hmm.  Ide has no queue, hence no merging,
that's why it does not occur with ide,
right? :)



Yes.


Interesting...


Yes.  Why would Linux post overlapping requests? makes 0x sense.


There may be a guest bug in here too.  Christoph?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: the 1Tb block issue

2010-05-18 Thread Michael Tokarev

18.05.2010 21:38, Avi Kivity wrote:


[Qemu-devel] [PATCH +stable] block: don't attempt to merge overlapping
requests

quick tests show it works correctly so far.
At least it went further than before, not
stopping at the usual sector 3145727872.

Hmm. Ide has no queue, hence no merging,
that's why it does not occur with ide,
right? :)


Yes.


I tried multiple times to reproduce it with if=scsi
(queue_depth=16 for the sym53c8xx driver).  I can't.
JFYI.. ;)

(And this kinda explains why the bug does not occur
when run under strace; which also indicates that it
isn't necessarily easy to trigger, too).


Interesting...


Yes. Why would Linux post overlapping requests? makes 0x sense.


It's mkfs.  Not sure why, but yes, maybe it's a guest
bug after all.  Note that I'm running 64bit kernel on
the guest (2.6.32.9-amd64).

Note also that it's not the same as in the original bug report -
there, the sector# is apparently different:
http://sourceforge.net/tracker/?func=detail&aid=2933400&group_id=180599&atid=893831


There may be a guest bug in here too. Christoph?


/mjt


Re: the 1Tb block issue

2010-05-18 Thread Avi Kivity

On 05/18/2010 09:03 PM, Michael Tokarev wrote:

18.05.2010 21:38, Avi Kivity wrote:


[Qemu-devel] [PATCH +stable] block: don't attempt to merge overlapping
requests

quick tests show it works correctly so far.
At least it went further than before, not
stopping at the usual sector 3145727872.

Hmm. Ide has no queue, hence no merging,
that's why it does not occur with ide,
right? :)


Yes.


I tried multiple times to reproduce it with if=scsi
(queue_depth=16 for the sym53c8xx driver).  I can't.
JFYI.. ;)


Merging needs explicit support in the block device emulation, which scsi 
lacks.
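
For illustration, a minimal standalone sketch of the merge rule under
discussion (an approximation written for this note, not the actual qemu
multiwrite code; the request values are taken from the debug output
quoted later in the thread):

#include <stdio.h>
#include <stdint.h>

struct req { int64_t sector; int nb_sectors; };

/* Pre-patch rule: merge whenever the next request starts at or before
 * the end of the previous one, which folds requests that overlap, or
 * even start before, the previous one into a single write.
 * Post-patch rule: merge only requests that are exactly adjacent. */
static int should_merge(const struct req *prev, const struct req *next,
                        int allow_overlap)
{
    int64_t oldreq_last = prev->sector + prev->nb_sectors;
    return allow_overlap ? next->sector <= oldreq_last
                         : next->sector == oldreq_last;
}

int main(void)
{
    /* a pair from the debug output: the previous request ends at sector
     * 3145728000, yet the next one starts at sector 0 */
    struct req prev = { 3145727872LL, 128 };
    struct req next = { 0, 128 };

    printf("pre-patch merges:  %d\n", should_merge(&prev, &next, 1)); /* 1 */
    printf("post-patch merges: %d\n", should_merge(&prev, &next, 0)); /* 0 */
    return 0;
}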





Interesting...


Yes. Why would Linux post overlapping requests? makes 0x sense.


It's mkfs.


mkfs simply writes to the block device; even if it does issue
overlapping writes, Linux shouldn't.  Either the writes contain the same
content in the overlapping section, in which case it's redundant, or
they don't, and we have data corruption in the making.



Not sure why, but yes, maybe it's a guest
bug after all. 


It's a host bug for sure, with a potential for a guest bug.


Note that I'm running 64bit kernel on
the guest (2.6.32.9-amd64).

Note also that it's not the same as in the original bug report -
there, the sector# is apparently different:
http://sourceforge.net/tracker/?func=detail&aid=2933400&group_id=180599&atid=893831


I don't think it's related to a particular sector number.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: the 1Tb block issue

2010-05-18 Thread Michael Tokarev

18.05.2010 22:09, Avi Kivity wrote:

On 05/18/2010 09:03 PM, Michael Tokarev wrote:

18.05.2010 21:38, Avi Kivity wrote:


[Qemu-devel] [PATCH +stable] block: don't attempt to merge overlapping
requests

quick tests show it works correctly so far.
At least it went further than before, not
stopping at the usual sector 3145727872.

Hmm. Ide has no queue, hence no merging,
that's why it does not occur with ide,
right? :)



[]

Yes. Why would Linux post overlapping requests? makes 0x sense.

[]

Note also that it's not the same as in the original bug report -
there, the sector# is apparently different:
http://sourceforge.net/tracker/?func=detail&aid=2933400&group_id=180599&atid=893831


I don't think it's related to a particular sector number.


I added a debug printf into the place that is touched
by the patch mentioned above, to print the cases where
requests were merged before that patch but are not any
more with it applied:

if (reqs[i].sector == oldreq_last) {
    merge = 1;
}
else if (reqs[i].sector < oldreq_last)
    fprintf(stderr, "NOT merging:\n"
            " reqs[i].sector=%Ld oldreq_last=%Ld\n"
            " reqs[outidx].sector=%Ld reqs[outidx].nb_sectors=%d\n",
            reqs[i].sector, oldreq_last,
            reqs[outidx].sector, reqs[outidx].nb_sectors);
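
(In this snippet oldreq_last is the first sector past the end of the
previous request, so the else branch fires exactly when a new request
starts inside one that is already queued for submission; note that
3145727872 + 128 = 3145728000, which is the oldreq_last value seen
below.)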

In a few runs it showed different info (and I modified the
printf line twice along the way, too):

NOT merging: reqs[i].sector=92306456 oldreq_last=3145728000
NOT merging:
 reqs[i].sector=92322056 oldreq_last=3145728000
 reqs[outidx].sector=3145727872
NOT merging:
 reqs[i].sector=0 oldreq_last=3145728000
 reqs[outidx].sector=3145727872 reqs[outidx].nb_sectors=128
NOT merging:
 reqs[i].sector=0 oldreq_last=3145728000
 reqs[outidx].sector=3145727872 reqs[outidx].nb_sectors=128
NOT merging:
 reqs[i].sector=92308152 oldreq_last=3145728000
 reqs[outidx].sector=3145727872 reqs[outidx].nb_sectors=128
NOT merging:
 reqs[i].sector=0 oldreq_last=3145728000
 reqs[outidx].sector=3145727872 reqs[outidx].nb_sectors=128

So it's definitely timing-related somehow (esp. it changes
when interrupting mkfs and immediately starting again), and
shows different values, but for me it's - apparently - always
reqs[outidx].sector=3145727872 together with some other sector.

/mjt


Re: the 1Tb block issue

2010-05-18 Thread Michael Tokarev

18.05.2010 22:38, Michael Tokarev wrote:

18.05.2010 22:09, Avi Kivity wrote:

On 05/18/2010 09:03 PM, Michael Tokarev wrote:

18.05.2010 21:38, Avi Kivity wrote:


[Qemu-devel] [PATCH +stable] block: don't attempt to merge overlapping
requests

quick tests show it works correctly so far.
At least it went further than before, not
stopping at the usual sector 3145727872.

Hmm. Ide has no queue, hence no merging,
that's why it does not occur with ide,
right? :)



[]

Yes. Why would Linux post overlapping requests? makes 0x sense.

[]

Note also that it's not the same as in the original bug report -
there, the sector# is apparently different:
http://sourceforge.net/tracker/?func=detail&aid=2933400&group_id=180599&atid=893831



I don't think it's related to a particular sector number.


I added a debug printf into the place that is touched
by the patch mentioned above, to print the cases where
requests were merged before that patch but are not any
more with it applied:

if (reqs[i].sector == oldreq_last) {
    merge = 1;
}
else if (reqs[i].sector < oldreq_last)
    fprintf(stderr, "NOT merging:\n"
            " reqs[i].sector=%Ld oldreq_last=%Ld\n"
            " reqs[outidx].sector=%Ld reqs[outidx].nb_sectors=%d\n",
            reqs[i].sector, oldreq_last,
            reqs[outidx].sector, reqs[outidx].nb_sectors);

In a few runs it showed different info (and I modified the
printf line twice along the way, too):

NOT merging: reqs[i].sector=92306456 oldreq_last=3145728000
NOT merging:
reqs[i].sector=92322056 oldreq_last=3145728000
reqs[outidx].sector=3145727872
NOT merging:
reqs[i].sector=0 oldreq_last=3145728000
reqs[outidx].sector=3145727872 reqs[outidx].nb_sectors=128
NOT merging:
reqs[i].sector=0 oldreq_last=3145728000
reqs[outidx].sector=3145727872 reqs[outidx].nb_sectors=128
NOT merging:
reqs[i].sector=92308152 oldreq_last=3145728000
reqs[outidx].sector=3145727872 reqs[outidx].nb_sectors=128
NOT merging:
reqs[i].sector=0 oldreq_last=3145728000
reqs[outidx].sector=3145727872 reqs[outidx].nb_sectors=128



So it's definitely timing-related somehow (esp. it changes
when interrupting mkfs and immediately starting again), and
shows different values, but for me it's - apparently - always
reqs[outidx].sector=3145727872 together with some other sector.


And once I hit Send it showed another:

NOT merging:
 reqs[i].sector=760 oldreq_last=3141599488
 reqs[outidx].sector=3141597896 reqs[outidx].nb_sectors=1592

so it's not the case here ;)


/mjt
