Re: [PATCH 1/2] virtio: Reduce BUG if total_sg > virtqueue size to WARN.

2017-08-11 Thread Paolo Bonzini
On 11/08/2017 19:23, Michael S. Tsirkin wrote:
> On Fri, Aug 11, 2017 at 04:09:26PM +0200, Paolo Bonzini wrote:
>> On 10/08/2017 23:41, Michael S. Tsirkin wrote:
> Then we probably should fail probe if vq size is too small.
 What does this mean?
>>>
>>> We must prevent driver from submitting s/g lists > vq size to device.
>>
>> What is the rationale for the limit?
> 
> So the host knows what it needs to support.
> 
>> both virtio-blk and virtio-scsi transmit their own value for the
>> maximum sg list size (max_seg in virtio-scsi, seg_max in virtio-blk).
> 
> No other device has it, and it seemed like a good idea to
> limit it generally at the time.
> 
> we can fix the spec to relax the requirement for blk and scsi -
> want to submit a proposal? Alternatively, add a generic field
> for that.

Yes, I can submit a proposal.  blk and scsi are the ones that are most
likely to have very long sg lists.

When I was designing scsi I just copied that field from blk. :)

Paolo


Re: [PATCH 1/2] virtio: Reduce BUG if total_sg > virtqueue size to WARN.

2017-08-11 Thread Michael S. Tsirkin
On Fri, Aug 11, 2017 at 04:09:26PM +0200, Paolo Bonzini wrote:
> On 10/08/2017 23:41, Michael S. Tsirkin wrote:
> >>> Then we probably should fail probe if vq size is too small.
> >> What does this mean?
> > 
> > We must prevent driver from submitting s/g lists > vq size to device.
> 
> What is the rationale for the limit?

So the host knows what it needs to support.

>  It makes no sense if indirect
> descriptors are available, especially because...
> 
> > Either tell linux to avoid s/g lists that are too long, or
> > simply fail request if this happens, or refuse to attach driver to device.
> > 
> > Later option would look something like this within probe:
> > 
> > for (i = VIRTIO_SCSI_VQ_BASE; i < num_vqs; i++)
> > if (vqs[i]->num < MAX_SG_USED_BY_LINUX)
> > goto err;
> > 
> > 
> > I don't know what's MAX_SG_USED_BY_LINUX though.
> > 
> 
> ... both virtio-blk and virtio-scsi transmit their own value for the
> maximum sg list size (max_seg in virtio-scsi, seg_max in virtio-blk).
> 
> Paolo

No other device has it, and it seemed like a good idea to
limit it generally at the time.

we can fix the spec to relax the requirement for blk and scsi -
want to submit a proposal? Alternatively, add a generic field
for that.

For a quick fix, make sure vq size is >= max sg.

-- 
MST


Re: [PATCH 1/2] virtio: Reduce BUG if total_sg > virtqueue size to WARN.

2017-08-11 Thread Paolo Bonzini
On 10/08/2017 23:41, Michael S. Tsirkin wrote:
>>> Then we probably should fail probe if vq size is too small.
>> What does this mean?
> 
> We must prevent driver from submitting s/g lists > vq size to device.

What is the rationale for the limit?  It makes no sense if indirect
descriptors are available, especially because...

> Either tell linux to avoid s/g lists that are too long, or
> simply fail request if this happens, or refuse to attach driver to device.
> 
> Later option would look something like this within probe:
> 
> for (i = VIRTIO_SCSI_VQ_BASE; i < num_vqs; i++)
>   if (vqs[i]->num < MAX_SG_USED_BY_LINUX)
>   goto err;
> 
> 
> I don't know what's MAX_SG_USED_BY_LINUX though.
> 

... both virtio-blk and virtio-scsi transmit their own value for the
maximum sg list size (max_seg in virtio-scsi, seg_max in virtio-blk).

Paolo


Re: [PATCH 1/2] virtio: Reduce BUG if total_sg > virtqueue size to WARN.

2017-08-10 Thread Richard W.M. Jones
On Fri, Aug 11, 2017 at 12:41:47AM +0300, Michael S. Tsirkin wrote:
> On Thu, Aug 10, 2017 at 10:35:11PM +0100, Richard W.M. Jones wrote:
> > On Fri, Aug 11, 2017 at 12:31:44AM +0300, Michael S. Tsirkin wrote:
> > > Then we probably should fail probe if vq size is too small.
> > 
> > What does this mean?
> > 
> > Rich.
> 
> We must prevent driver from submitting s/g lists > vq size to device.

Fair point.  I would have thought (not knowing anything about how this
works in the kernel) that the driver should be able to split up
scatter-gather lists if they are larger than what the driver supports?

Anyway the commit message is derived from what Paolo told me on IRC,
so let's see what he says.

Rich.

> Either tell linux to avoid s/g lists that are too long, or
> simply fail request if this happens, or refuse to attach driver to device.
> 
> Later option would look something like this within probe:
> 
> for (i = VIRTIO_SCSI_VQ_BASE; i < num_vqs; i++)
>   if (vqs[i]->num < MAX_SG_USED_BY_LINUX)
>   goto err;
> 
> 
> I don't know what's MAX_SG_USED_BY_LINUX though.
> 
> > -- 
> > Richard Jones, Virtualization Group, Red Hat 
> > http://people.redhat.com/~rjones
> > Read my programming and virtualization blog: http://rwmj.wordpress.com
> > virt-builder quickly builds VMs from scratch
> > http://libguestfs.org/virt-builder.1.html

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html


Re: [PATCH 1/2] virtio: Reduce BUG if total_sg > virtqueue size to WARN.

2017-08-10 Thread Michael S. Tsirkin
On Thu, Aug 10, 2017 at 10:35:11PM +0100, Richard W.M. Jones wrote:
> On Fri, Aug 11, 2017 at 12:31:44AM +0300, Michael S. Tsirkin wrote:
> > Then we probably should fail probe if vq size is too small.
> 
> What does this mean?
> 
> Rich.

We must prevent driver from submitting s/g lists > vq size to device.


Either tell linux to avoid s/g lists that are too long, or
simply fail request if this happens, or refuse to attach driver to device.

Later option would look something like this within probe:

for (i = VIRTIO_SCSI_VQ_BASE; i < num_vqs; i++)
if (vqs[i]->num < MAX_SG_USED_BY_LINUX)
goto err;


I don't know what's MAX_SG_USED_BY_LINUX though.

> -- 
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> virt-builder quickly builds VMs from scratch
> http://libguestfs.org/virt-builder.1.html


Re: [PATCH 1/2] virtio: Reduce BUG if total_sg > virtqueue size to WARN.

2017-08-10 Thread Richard W.M. Jones
On Fri, Aug 11, 2017 at 12:31:44AM +0300, Michael S. Tsirkin wrote:
> Then we probably should fail probe if vq size is too small.

What does this mean?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html


Re: [PATCH 1/2] virtio: Reduce BUG if total_sg > virtqueue size to WARN.

2017-08-10 Thread Michael S. Tsirkin
On Thu, Aug 10, 2017 at 10:30:38PM +0100, Richard W.M. Jones wrote:
> On Fri, Aug 11, 2017 at 12:21:16AM +0300, Michael S. Tsirkin wrote:
> > On Thu, Aug 10, 2017 at 05:40:34PM +0100, Richard W.M. Jones wrote:
> > > If using indirect descriptors, you can make the total_sg as large as
> > > you want.
> > 
> > That would be a spec violation though, even if it happens to
> > work on current QEMU.
> > 
> > The spec says:
> > A driver MUST NOT create a descriptor chain longer than the Queue Size 
> > of the device.
> > 
> > What prompted this patch?
> > Do we ever encounter this situation?
> 
> This patch is needed because the following (2/2) patch will trigger
> that BUG_ON if I set virtqueue_size=64 or any smaller value.
> 
> The precise backtrace is below.
> 
> Rich.
> 
> [4.029510] [ cut here ]
> [4.030127] kernel BUG at drivers/virtio/virtio_ring.c:299!
> [4.030834] invalid opcode:  [#1] SMP
> [4.031340] Modules linked in: libcrc32c crc8 crc7 crc4 crc_itu_t 
> virtio_pci virtio_mmio virtio_input virtio_balloon virtio_scsi nd_pmem nd_btt 
> virtio_net virtio_crypto crypto_engine virtio_console virtio_rng virtio_blk 
> virtio_ring virtio nfit crc32_generic crct10dif_pclmul crc32c_intel 
> crc32_pclmul
> [4.034606] CPU: 0 PID: 1 Comm: init Not tainted 4.13.0-rc4+ #100
> [4.035354] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> [4.036770] task: 9a859e243300 task.stack: a731c00cc000
> [4.037506] RIP: 0010:virtqueue_add_sgs+0x23d/0x460 [virtio_ring]
> [4.038250] RSP: :a731c00cf6e0 EFLAGS: 00010097
> [4.038898] RAX:  RBX: 0011 RCX: 
> dd0680646c40
> [4.039762] RDX:  RSI: a731c00cf7b8 RDI: 
> 9a85945c4d68
> [4.040634] RBP: a731c00cf748 R08: 9a85945c4a78 R09: 
> 01080020
> [4.041508] R10: a731c00cf788 R11: 9a859b3d3120 R12: 
> a731c00cf7d0
> [4.042382] R13: a731c00cf7d0 R14: 0001 R15: 
> 9a859b3f8200
> [4.043248] FS:  () GS:9a859e60() 
> knlGS:
> [4.044232] CS:  0010 DS:  ES:  CR0: 80050033
> [4.044942] CR2: 7fcff02e931c CR3: 1d23b000 CR4: 
> 003406f0
> [4.045815] DR0:  DR1:  DR2: 
> 
> [4.046684] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [4.047559] Call Trace:
> [4.047876]  virtscsi_add_cmd+0x1c9/0x280 [virtio_scsi]
> [4.048528]  virtscsi_kick_cmd+0x38/0x90 [virtio_scsi]
> [4.049161]  virtscsi_queuecommand+0x104/0x280 [virtio_scsi]
> [4.049875]  virtscsi_queuecommand_single+0x38/0x40 [virtio_scsi]
> [4.050628]  scsi_dispatch_cmd+0xf9/0x390
> [4.051128]  scsi_queue_rq+0x5e5/0x6f0
> [4.051602]  blk_mq_dispatch_rq_list+0x1ff/0x3c0
> [4.052175]  blk_mq_sched_dispatch_requests+0x199/0x1f0
> [4.052824]  __blk_mq_run_hw_queue+0x11d/0x1b0
> [4.053382]  __blk_mq_delay_run_hw_queue+0x8d/0xa0
> [4.053972]  blk_mq_run_hw_queue+0x14/0x20
> [4.054485]  blk_mq_sched_insert_requests+0x96/0x120
> [4.055095]  blk_mq_flush_plug_list+0x19d/0x410
> [4.055661]  blk_flush_plug_list+0xf9/0x2b0
> [4.056182]  blk_finish_plug+0x2c/0x40
> [4.056655]  __do_page_cache_readahead+0x32e/0x400
> [4.057261]  filemap_fault+0x2fb/0x890
> [4.057731]  ? filemap_fault+0x2fb/0x890
> [4.058220]  ? find_held_lock+0x3c/0xb0
> [4.058714]  ext4_filemap_fault+0x34/0x50
> [4.059212]  __do_fault+0x1e/0x110
> [4.059644]  __handle_mm_fault+0x6b2/0x1080
> [4.060167]  handle_mm_fault+0x178/0x350
> [4.060662]  __do_page_fault+0x26e/0x510
> [4.061152]  trace_do_page_fault+0x9d/0x290
> [4.061677]  do_async_page_fault+0x51/0xa0
> [4.062189]  async_page_fault+0x28/0x30
> [4.062667] RIP: 0033:0x7fcff030a24f
> [4.063113] RSP: 002b:7ffefc2ad078 EFLAGS: 00010206
> [4.063760] RAX: 7fcff02e931c RBX: 7fcff050f660 RCX: 
> 7fcff02e935c
> [4.064648] RDX: 0664 RSI:  RDI: 
> 7fcff02e931c
> [4.065519] RBP: 7ffefc2ad320 R08: 7fcff02e931c R09: 
> 00027000
> [4.066392] R10: 7fcff02e9980 R11: 0206 R12: 
> 7ffefc2ad0b0
> [4.067263] R13: 7ffefc2ad408 R14: 0002 R15: 
> 87f0
> [4.068135] Code: af 01 c7 45 c8 01 00 00 00 45 31 ed e9 9b fe ff ff 31 db 
> 48 83 7d d0 00 0f 85 3f fe ff ff 0f 0b 48 8b 7d b8 e8 e5 fd 22 de eb 8b <0f> 
> 0b 0f 0b 44 89 6d a8 45 89 f5 48 8b 45 a0 44 8d 70 01 48 83 
> [4.070506] RIP: virtqueue_add_sgs+0x23d/0x460 [virtio_ring] RSP: 
> a731c00cf6e0
> [4.071434] ---[ end trace 02532659840e2a64 ]---

Then we probably should fail probe if vq size is too small.

> 
> -- 
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> 

Re: [PATCH 1/2] virtio: Reduce BUG if total_sg > virtqueue size to WARN.

2017-08-10 Thread Richard W.M. Jones
On Fri, Aug 11, 2017 at 12:21:16AM +0300, Michael S. Tsirkin wrote:
> On Thu, Aug 10, 2017 at 05:40:34PM +0100, Richard W.M. Jones wrote:
> > If using indirect descriptors, you can make the total_sg as large as
> > you want.
> 
> That would be a spec violation though, even if it happens to
> work on current QEMU.
> 
> The spec says:
>   A driver MUST NOT create a descriptor chain longer than the Queue Size 
> of the device.
> 
> What prompted this patch?
> Do we ever encounter this situation?

This patch is needed because the following (2/2) patch will trigger
that BUG_ON if I set virtqueue_size=64 or any smaller value.

The precise backtrace is below.

Rich.

[4.029510] [ cut here ]
[4.030127] kernel BUG at drivers/virtio/virtio_ring.c:299!
[4.030834] invalid opcode:  [#1] SMP
[4.031340] Modules linked in: libcrc32c crc8 crc7 crc4 crc_itu_t virtio_pci 
virtio_mmio virtio_input virtio_balloon virtio_scsi nd_pmem nd_btt virtio_net 
virtio_crypto crypto_engine virtio_console virtio_rng virtio_blk virtio_ring 
virtio nfit crc32_generic crct10dif_pclmul crc32c_intel crc32_pclmul
[4.034606] CPU: 0 PID: 1 Comm: init Not tainted 4.13.0-rc4+ #100
[4.035354] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[4.036770] task: 9a859e243300 task.stack: a731c00cc000
[4.037506] RIP: 0010:virtqueue_add_sgs+0x23d/0x460 [virtio_ring]
[4.038250] RSP: :a731c00cf6e0 EFLAGS: 00010097
[4.038898] RAX:  RBX: 0011 RCX: dd0680646c40
[4.039762] RDX:  RSI: a731c00cf7b8 RDI: 9a85945c4d68
[4.040634] RBP: a731c00cf748 R08: 9a85945c4a78 R09: 01080020
[4.041508] R10: a731c00cf788 R11: 9a859b3d3120 R12: a731c00cf7d0
[4.042382] R13: a731c00cf7d0 R14: 0001 R15: 9a859b3f8200
[4.043248] FS:  () GS:9a859e60() 
knlGS:
[4.044232] CS:  0010 DS:  ES:  CR0: 80050033
[4.044942] CR2: 7fcff02e931c CR3: 1d23b000 CR4: 003406f0
[4.045815] DR0:  DR1:  DR2: 
[4.046684] DR3:  DR6: fffe0ff0 DR7: 0400
[4.047559] Call Trace:
[4.047876]  virtscsi_add_cmd+0x1c9/0x280 [virtio_scsi]
[4.048528]  virtscsi_kick_cmd+0x38/0x90 [virtio_scsi]
[4.049161]  virtscsi_queuecommand+0x104/0x280 [virtio_scsi]
[4.049875]  virtscsi_queuecommand_single+0x38/0x40 [virtio_scsi]
[4.050628]  scsi_dispatch_cmd+0xf9/0x390
[4.051128]  scsi_queue_rq+0x5e5/0x6f0
[4.051602]  blk_mq_dispatch_rq_list+0x1ff/0x3c0
[4.052175]  blk_mq_sched_dispatch_requests+0x199/0x1f0
[4.052824]  __blk_mq_run_hw_queue+0x11d/0x1b0
[4.053382]  __blk_mq_delay_run_hw_queue+0x8d/0xa0
[4.053972]  blk_mq_run_hw_queue+0x14/0x20
[4.054485]  blk_mq_sched_insert_requests+0x96/0x120
[4.055095]  blk_mq_flush_plug_list+0x19d/0x410
[4.055661]  blk_flush_plug_list+0xf9/0x2b0
[4.056182]  blk_finish_plug+0x2c/0x40
[4.056655]  __do_page_cache_readahead+0x32e/0x400
[4.057261]  filemap_fault+0x2fb/0x890
[4.057731]  ? filemap_fault+0x2fb/0x890
[4.058220]  ? find_held_lock+0x3c/0xb0
[4.058714]  ext4_filemap_fault+0x34/0x50
[4.059212]  __do_fault+0x1e/0x110
[4.059644]  __handle_mm_fault+0x6b2/0x1080
[4.060167]  handle_mm_fault+0x178/0x350
[4.060662]  __do_page_fault+0x26e/0x510
[4.061152]  trace_do_page_fault+0x9d/0x290
[4.061677]  do_async_page_fault+0x51/0xa0
[4.062189]  async_page_fault+0x28/0x30
[4.062667] RIP: 0033:0x7fcff030a24f
[4.063113] RSP: 002b:7ffefc2ad078 EFLAGS: 00010206
[4.063760] RAX: 7fcff02e931c RBX: 7fcff050f660 RCX: 7fcff02e935c
[4.064648] RDX: 0664 RSI:  RDI: 7fcff02e931c
[4.065519] RBP: 7ffefc2ad320 R08: 7fcff02e931c R09: 00027000
[4.066392] R10: 7fcff02e9980 R11: 0206 R12: 7ffefc2ad0b0
[4.067263] R13: 7ffefc2ad408 R14: 0002 R15: 87f0
[4.068135] Code: af 01 c7 45 c8 01 00 00 00 45 31 ed e9 9b fe ff ff 31 db 
48 83 7d d0 00 0f 85 3f fe ff ff 0f 0b 48 8b 7d b8 e8 e5 fd 22 de eb 8b <0f> 0b 
0f 0b 44 89 6d a8 45 89 f5 48 8b 45 a0 44 8d 70 01 48 83 
[4.070506] RIP: virtqueue_add_sgs+0x23d/0x460 [virtio_ring] RSP: 
a731c00cf6e0
[4.071434] ---[ end trace 02532659840e2a64 ]---


-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org


Re: [PATCH 1/2] virtio: Reduce BUG if total_sg > virtqueue size to WARN.

2017-08-10 Thread Michael S. Tsirkin
On Thu, Aug 10, 2017 at 05:40:34PM +0100, Richard W.M. Jones wrote:
> If using indirect descriptors, you can make the total_sg as large as
> you want.

That would be a spec violation though, even if it happens to
work on current QEMU.

The spec says:
A driver MUST NOT create a descriptor chain longer than the Queue Size 
of the device.

What prompted this patch?  Do we ever encounter this situation?

>  If not, BUG is too serious because the function later
> returns -ENOSPC.
> 
> Thanks Paolo Bonzini, Christoph Hellwig.
> 
> Signed-off-by: Richard W.M. Jones 
> ---
>  drivers/virtio/virtio_ring.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 5e1b548828e6..27cbc1eab868 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -296,7 +296,6 @@ static inline int virtqueue_add(struct virtqueue *_vq,
>   }
>  #endif
>  
> - BUG_ON(total_sg > vq->vring.num);
>   BUG_ON(total_sg == 0);
>  
>   head = vq->free_head;
> @@ -305,8 +304,10 @@ static inline int virtqueue_add(struct virtqueue *_vq,
>* buffers, then go indirect. FIXME: tune this threshold */
>   if (vq->indirect && total_sg > 1 && vq->vq.num_free)
>   desc = alloc_indirect(_vq, total_sg, gfp);
> - else
> + else {
>   desc = NULL;
> + WARN_ON_ONCE(total_sg > vq->vring.num && !vq->indirect);
> + }
>  
>   if (desc) {
>   /* Use a single buffer which doesn't continue */
> -- 
> 2.13.1


Re: [PATCH 1/2] virtio: Reduce BUG if total_sg > virtqueue size to WARN.

2017-08-10 Thread Paolo Bonzini
On 10/08/2017 18:40, Richard W.M. Jones wrote:
> If using indirect descriptors, you can make the total_sg as large as
> you want.  If not, BUG is too serious because the function later
> returns -ENOSPC.
> 
> Thanks Paolo Bonzini, Christoph Hellwig.
> 
> Signed-off-by: Richard W.M. Jones 
> ---
>  drivers/virtio/virtio_ring.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 5e1b548828e6..27cbc1eab868 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -296,7 +296,6 @@ static inline int virtqueue_add(struct virtqueue *_vq,
>   }
>  #endif
>  
> - BUG_ON(total_sg > vq->vring.num);
>   BUG_ON(total_sg == 0);
>  
>   head = vq->free_head;
> @@ -305,8 +304,10 @@ static inline int virtqueue_add(struct virtqueue *_vq,
>* buffers, then go indirect. FIXME: tune this threshold */
>   if (vq->indirect && total_sg > 1 && vq->vq.num_free)
>   desc = alloc_indirect(_vq, total_sg, gfp);
> - else
> + else {
>   desc = NULL;
> + WARN_ON_ONCE(total_sg > vq->vring.num && !vq->indirect);

So we get here only if vq->vq.num_free is zero.  In that case,
vq->indirect makes no difference for the appropriateness of the warning
(that is, it should never be issued for indirect descriptors).

> + }
>  
>   if (desc) {
>   /* Use a single buffer which doesn't continue */
> 


Reviewed-by: Paolo Bonzini 

Paolo