Re: [Qemu-devel] Re: [PATCH 2/3] virtio network device

2007-12-17 Thread Rusty Russell
On Wednesday 12 December 2007 12:40:43 Anthony Liguori wrote:
 Rusty Russell wrote:
  On Sunday 09 December 2007 09:02:48 Anthony Liguori wrote:
  If QEMU ever got true SMP support, then virtio would not work as it
  requires 16-bit atomic writes which AFAIK are not possible on a number of
  non-x86 architectures.
 
  Hmm?  Where is this requirement coming from?
 
  I think everyone should stop using the word atomic in virtio
  discussions; it's confusing.

 The virtio ring queue indices are 16-bit and are readable by one end
 while writable by the other.  To ensure that this can be done in a
 lock-less way, it's necessary to atomically update the index.  Atomic is
 the right word here because if the 16-bit write gets converted into two
 8-bit writes, then very bad things could happen with SMP.

Of course, but that's insane.  Your assertion that it's not possible on a
number of non-x86 architectures is what I'm questioning here.  You're
confusing the inability of some architectures to atomically *modify* a 16-bit
value with our requirement, which is only an atomic *write*: even if you found
an architecture which couldn't do 16-bit writes, you could do it as a 32-bit
write.

Hope that clarifies,
Rusty.
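
For concreteness, a minimal sketch of the point Rusty is making (standalone C,
not QEMU code; the wmb() macro and the struct layout are illustrative):

#include <stdint.h>

/* Publishing a 16-bit ring index is a single aligned store preceded by a
 * write barrier -- no locked read-modify-write is needed.  An architecture
 * without 16-bit stores could still keep the index in a naturally aligned
 * 32-bit word and publish it with a 32-bit store, as noted above. */
#define wmb() __sync_synchronize()

struct avail_ring {
    uint16_t flags;
    uint16_t idx;              /* written by one side, read by the other */
};

static void publish_idx(volatile struct avail_ring *avail, uint16_t new_idx)
{
    wmb();                     /* make the new buffers visible first */
    avail->idx = new_idx;      /* single aligned 16-bit store */
}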




Re: [Qemu-devel] Re: [PATCH 2/3] virtio network device

2007-12-11 Thread Rusty Russell
On Sunday 09 December 2007 09:02:48 Anthony Liguori wrote:
 If QEMU ever got true SMP support, then virtio would not work as it
 requires 16-bit atomic writes which AFAIK are not possible on a number of
 non-x86 architectures.

Hmm?  Where is this requirement coming from?

I think everyone should stop using the word atomic in virtio discussions; 
it's confusing.

Rusty.




Re: [Qemu-devel] Re: [PATCH 2/3] virtio network device

2007-12-11 Thread Anthony Liguori

Rusty Russell wrote:

On Sunday 09 December 2007 09:02:48 Anthony Liguori wrote:
  

If QEMU ever got true SMP support, then virtio would not work as it
requires 16-bit atomic writes which AFAIK are not possible on a number of
non-x86 architectures.



Hmm?  Where is this requirement coming from?

I think everyone should stop using the word atomic in virtio discussions; 
it's confusing.
  


The virtio ring queue indices are 16-bit and are readable by one end
while writable by the other.  To ensure that this can be done in a
lock-less way, it's necessary to atomically update the index.  Atomic is 
the right word here because if the 16-bit write gets converted into two 
8-bit writes, then very bad things could happen with SMP.


Regards,

Anthony Liguori


Rusty.

  






Re: [Qemu-devel] Re: [PATCH 2/3] virtio network device

2007-12-08 Thread Paul Brook
 virtio makes things a bit trickier though.  There's a shared ring queue
 between the host and guest.  The ring queue is lock-less and depends on
 the ability to atomically increment ring queue indices to be SMP safe.
 Using a copy-API wouldn't be a problem for QEMU since the host and guest
 are always running in lock-step.  A copy API is actually needed to deal
 with differing host/guest alignment and endianness.

That seems a rather poor design choice, as many architectures don't have an 
atomic increment instruction. Oh well.

Paul




Re: [Qemu-devel] Re: [PATCH 2/3] virtio network device

2007-12-08 Thread Jamie Lokier
Paul Brook wrote:
  virtio makes things a bit trickier though.  There's a shared ring queue
  between the host and guest.  The ring queue is lock-less and depends on
  the ability to atomically increment ring queue indices to be SMP safe.
  Using a copy-API wouldn't be a problem for QEMU since the host and guest
  are always running in lock-step.  A copy API is actually needed to deal
  with differing host/guest alignment and endianness.
 
 That seems a rather poor design choice, as many architectures don't have an 
 atomic increment instruction. Oh well.

Most have compare-and-swap or load-locked/store-conditional
instructions, though, which can be used to implement atomic increment.

-- Jamie
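
For illustration, here is how atomic increment falls out of compare-and-swap,
using GCC's __sync builtins (assumed available for 2-byte operands on the
target); a sketch only, not a proposal for the virtio code:

#include <stdint.h>

/* Atomic increment built from compare-and-swap: retry until nobody
 * else modified the value between our load and our CAS. */
static inline uint16_t atomic_inc16(volatile uint16_t *p)
{
    uint16_t old, next;

    do {
        old = *p;
        next = (uint16_t)(old + 1);
    } while (__sync_val_compare_and_swap(p, old, next) != old);

    return next;
}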




Re: [Qemu-devel] Re: [PATCH 2/3] virtio network device

2007-12-08 Thread Blue Swirl
On 12/8/07, Paul Brook [EMAIL PROTECTED] wrote:
 On Saturday 08 December 2007, Jamie Lokier wrote:
  Paul Brook wrote:
virtio makes things a bit trickier though.  There's a shared ring queue
between the host and guest.  The ring queue is lock-less and depends on
the ability to atomically increment ring queue indices to be SMP safe.
Using a copy-API wouldn't be a problem for QEMU since the host and
guest are always running in lock-step.  A copy API is actually needed
to deal with differing host/guest alignment and endianness.
  
   That seems a rather poor design choice, as many architectures don't have
   an atomic increment instruction. Oh well.
 
  Most have compare-and-swap or load-locked/store-conditional
  instructions, though, which can be used to implement atomic increment.

 Yes, but your hardware implementation has to make sure it interacts with
 those properly. It's certainly possible to implement lockless lists without
 requiring atomic increment. Most high-end hardware manages it and that
 doesn't even have coherent DMA.

If we start adding locks for I/O, could we use the same locking model
more widely, or make it generic so that it would support an SMP host as
well?




Re: [Qemu-devel] Re: [PATCH 2/3] virtio network device

2007-12-08 Thread Jamie Lokier
Paul Brook wrote:
virtio makes things a bit trickier though.  There's a shared ring queue
between the host and guest.  The ring queue is lock-less and depends on
the ability to atomically increment ring queue indices to be SMP safe.
Using a copy-API wouldn't be a problem for QEMU since the host and
guest are always running in lock-step.  A copy API is actually needed
to deal with differing host/guest alignment and endianness.
  
   That seems a rather poor design choice, as many architectures don't have
   an atomic increment instruction. Oh well.
 
  Most have compare-and-swap or load-locked/store-conditional
  instructions, though, which can be used to implement atomic increment.
 
 Yes, but your hardware implementation has to make sure it interacts with 
 those properly. It's certainly possible to implement lockless lists without 
 requiring atomic increment. Most high-end hardware manages it and that 
 doesn't even have coherent DMA.

I'm inclined to agree that a lockless structure (including not using
an atomic op) for the most commonly used paths, such as appending to a
ring, would be better.  It increases the potential portability for
emulation/virtualisation across all architectures now and in the
future, and it would almost certainly perform better on architectures
other than x86.

However, occasionally you need to enter the host for synchronisation.
E.g. when a ring is empty/full.

In that case, sometimes using atomic ops in the way that futexes are
used in Linux/Glibc can optimise the details of those transitions, but
it would be best if they were optional optimisations, for
cross-platform, cross-architecture portability.
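
As a concrete illustration of that futex-like idea: the producer only kicks
the consumer when the consumer has asked to be woken.  The flag name and
notify_peer() below are invented for the sketch:

#include <stdint.h>
#include <stdio.h>

#define RING_F_NO_NOTIFY 0x1        /* illustrative: set by the consumer */

struct demo_ring {
    volatile uint16_t peer_flags;   /* written by the consumer */
    volatile uint16_t idx;          /* written by the producer */
};

static void notify_peer(struct demo_ring *r)
{
    /* stand-in for the expensive transition: hypercall, eventfd, signal... */
    printf("kick, idx=%u\n", (unsigned)r->idx);
}

static void publish(struct demo_ring *r, uint16_t new_idx)
{
    __sync_synchronize();           /* buffers visible before the index */
    r->idx = new_idx;
    __sync_synchronize();           /* index visible before the flag check */
    if (!(r->peer_flags & RING_F_NO_NOTIFY))
        notify_peer(r);             /* only kick when the peer wants it */
}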

There's a particularly awkward problem when taking an x86 atomic op in
the guest, and generating code on the non-x86 host which doesn't have
any equivalent op.  What's the right thing to do?

Since virtio is driven by virtualisation projects rather than
emulation, is it possible this hasn't been thought of at all, making
virtio unusable for cross-architecture emulation?  That would be
really unfortunate.

-- Jamie




Re: [Qemu-devel] Re: [PATCH 2/3] virtio network device

2007-12-08 Thread Anthony Liguori

Paul Brook wrote:

virtio makes things a bit trickier though.  There's a shared ring queue
between the host and guest.  The ring queue is lock-less and depends on
the ability to atomically increment ring queue indices to be SMP safe.
Using a copy-API wouldn't be a problem for QEMU since the host and guest
are always running in lock-step.  A copy API is actually needed to deal
with differing host/guest alignment and endianness.



That seems a rather poor design choice, as many architectures don't have an 
atomic increment instruction. Oh well.
  


Sorry, I should have been more clear.  An atomic increment isn't needed, 
just an atomic store.
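
A sketch of why a plain store is enough (illustrative names, standalone C):
each ring index has exactly one writer, which keeps a private shadow copy and
merely publishes it, so there is never a concurrent read-modify-write on the
shared field:

#include <stdint.h>

struct producer {
    uint16_t shadow_idx;            /* private to the producing side */
    volatile uint16_t *shared_idx;  /* the index the other side reads */
};

static void producer_publish(struct producer *p, unsigned nbufs)
{
    p->shadow_idx = (uint16_t)(p->shadow_idx + nbufs); /* private arithmetic */
    __sync_synchronize();           /* make the new buffers visible first */
    *p->shared_idx = p->shadow_idx; /* one aligned 16-bit store publishes it */
}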


Regards,

Anthony Liguori


Paul



  






Re: [Qemu-devel] Re: [PATCH 2/3] virtio network device

2007-12-08 Thread Anthony Liguori

Jamie Lokier wrote:

Paul Brook wrote:
  

virtio makes things a bit trickier though.  There's a shared ring queue
between the host and guest.  The ring queue is lock-less and depends on
the ability to atomically increment ring queue indices to be SMP safe.
Using a copy-API wouldn't be a problem for QEMU since the host and
guest are always running in lock-step.  A copy API is actually needed
to deal with differing host/guest alignment and endianness.
  

That seems a rather poor design choice, as many architectures don't have
an atomic increment instruction. Oh well.


Most have compare-and-swap or load-locked/store-conditional
instructions, though, which can be used to implement atomic increment.
  
Yes, but your hardware implementation has to make sure it interacts with 
those properly. It's certainly possible to implement lockless lists without 
requiring atomic increment. Most high-end hardware manages it and that 
doesn't even have coherent DMA.



I'm inclined to agree that a lockless structure (including not using
an atomic op) for the most commonly used paths, such as appending to a
ring, would be better.  It increases the potential portability for
emulation/virtualisation across all architectures now and in the
future, and it would almost certainly perform better on architectures
other than x86.

However, occasionally you need to enter the host for synchronisation.
E.g. when a ring is empty/full.

In that case, sometimes using atomic ops in the way that futexes are
used in Linux/Glibc can optimise the details of those transitions, but
it would be best if they were optional optimisations, for
cross-platform, cross-architecture portability.

There's a particularly awkward problem when taking an x86 atomic op in
the guest, and generating code on the non-x86 host which doesn't have
any equivalent op.  What's the right thing to do?

Since virtio is driven by virtualisation projects rather than
emulation, is it possible this hasn't been thought of at all, making
virtio unusable for cross-architecture emulation?  That would be
really unfortunate.
  


virtio has been designed for virtualization, yes.  There aren't really any
restrictions that prevent its use when doing cross-architecture
emulation (yet) with QEMU.


If QEMU ever got true SMP support, then virtio would not work as it 
requires 16-bit atomic writes which AFAIK are not possible on a number of
non-x86 architectures.


Regards,

Anthony Liguori


-- Jamie



  






Re: [Qemu-devel] Re: [PATCH 2/3] virtio network device

2007-12-06 Thread Jamie Lokier
Anthony Liguori wrote:
 virtio makes things a bit trickier though.  There's a shared ring queue 
 between the host and guest.

Does this work when the host and guest are different architectures, or have
different word sizes (x86/x86_64)?

-- Jamie




Re: [Qemu-devel] Re: [PATCH 2/3] virtio network device

2007-12-05 Thread Paul Brook
  Actually according to qemu's standard, one should use
  cpu_physical_memory_write/ cpu_physical_memory_read functions.
  This is true also for reading the ring values.

 Yes, and unfortunately, cpu_physical_memory_{read,write} are copy
 interfaces.  We really don't want that for high speed I/O.

I really don't like doing direct access to guest ram without implementing a 
proper API for zero-copy/scatter-gather access. There was a list thread about 
this not so long ago.

Paul




[Qemu-devel] Re: [PATCH 2/3] virtio network device

2007-12-05 Thread Anthony Liguori

Dor Laor wrote:

Anthony Liguori wrote:
Index: qemu/hw/virtio-net.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ qemu/hw/virtio-net.c   2007-12-04 14:17:37.0 -0600

+
+static void virtio_net_receive(void *opaque, const uint8_t *buf, int size)
+{
+    VirtIONet *n = opaque;
+    VirtQueueElement elem;
+    struct virtio_net_hdr *hdr;
+    int offset, i;
+
+    /* FIXME: the drivers really need to set their status better */
+    if (n->rx_vq->vring.avail == NULL) {
+        n->can_receive = 0;
+        return;
+    }
+
+    if (virtqueue_pop(n->rx_vq, &elem) == 0) {
+        /* wait until the guest adds some rx bufs */
+        n->can_receive = 0;
+        return;
+    }
+
+    hdr = (void *)elem.in_sg[0].iov_base;
+    hdr->flags = 0;
+    hdr->gso_type = VIRTIO_NET_HDR_GSO_NONE;
+
+    /* copy in packet.  ugh */
+    offset = 0;
+    i = 1;
+    while (offset < size && i < elem.in_num) {
+        int len = MIN(elem.in_sg[i].iov_len, size - offset);
+        memcpy(elem.in_sg[i].iov_base, buf + offset, len);



Actually according to qemu's standard, one should use cpu_physical_memory_write/
cpu_physical_memory_read functions.
This is true also for reading the ring values.
  


Yes, and unfortunately, cpu_physical_memory_{read,write} are copy 
interfaces.  We really don't want that for high speed I/O.


The only downsides of not using cpu_physical_memory_{read,write} are
that writes aren't dispatched to MMIO functions and dirty updating isn't
handled automatically.  I do need to go through and audit the dirty
updating.  It may be possible to do a one-time check for MMIO memory and
then have a fast path and a slow path.  I'm not sure how much it matters,
honestly.


Regards,

Anthony Liguori




+        offset += len;
+        i++;
+    }
+
+    /* signal other side */
+    virtqueue_push(n->rx_vq, &elem, sizeof(*hdr) + offset);
+    virtio_notify(&n->vdev, n->rx_vq);
+}
+
+/* TX */
+static void virtio_net_handle_tx(VirtIODevice *vdev, VirtQueue *vq)
+{
+    VirtIONet *n = to_virtio_net(vdev);
+    VirtQueueElement elem;
+
+    while (virtqueue_pop(vq, &elem)) {
+        int i;
+        size_t len = 0;
+
+        /* ignore the header for now */
+        for (i = 1; i < elem.out_num; i++) {
+            qemu_send_packet(n->vc, elem.out_sg[i].iov_base,
+                             elem.out_sg[i].iov_len);
+            len += elem.out_sg[i].iov_len;
+        }
+
+        virtqueue_push(vq, &elem, sizeof(struct virtio_net_hdr) + len);
+        virtio_notify(&n->vdev, vq);
+    }


The virtio_notify should be moved outside the while loop to minimize
IRQ injection.
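
A sketch of that restructuring, using the identifiers from the patch above
(untested; the did_work flag is added here just to avoid notifying when
nothing was popped):

static void virtio_net_handle_tx(VirtIODevice *vdev, VirtQueue *vq)
{
    VirtIONet *n = to_virtio_net(vdev);
    VirtQueueElement elem;
    int did_work = 0;

    while (virtqueue_pop(vq, &elem)) {
        int i;
        size_t len = 0;

        /* ignore the header for now */
        for (i = 1; i < elem.out_num; i++) {
            qemu_send_packet(n->vc, elem.out_sg[i].iov_base,
                             elem.out_sg[i].iov_len);
            len += elem.out_sg[i].iov_len;
        }

        virtqueue_push(vq, &elem, sizeof(struct virtio_net_hdr) + len);
        did_work = 1;
    }

    /* interrupt the guest once for the whole batch, not once per packet */
    if (did_work)
        virtio_notify(&n->vdev, vq);
}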



+}
+
+void *virtio_net_init(PCIBus *bus, NICInfo *nd, int devfn)
+{
+    VirtIONet *n;
+
+    n = (VirtIONet *)virtio_init_pci(bus, "virtio-net", 6900, 0x1000,
+                                     0, VIRTIO_ID_NET,
+                                     0x02, 0x00, 0x00,
+                                     6, sizeof(VirtIONet));
+
+    n->vdev.update_config = virtio_net_update_config;
+    n->vdev.get_features = virtio_net_get_features;
+    n->rx_vq = virtio_add_queue(&n->vdev, 512, virtio_net_handle_rx);
+    n->tx_vq = virtio_add_queue(&n->vdev, 128, virtio_net_handle_tx);
+    n->can_receive = 0;
+    memcpy(n->mac, nd->macaddr, 6);
+    n->vc = qemu_new_vlan_client(nd->vlan, virtio_net_receive,
+                                 virtio_net_can_receive, n);
+
+    return &n->vdev;
+}
Index: qemu/hw/pc.h
===
--- qemu.orig/hw/pc.h   2007-12-04 14:15:20.0 -0600
+++ qemu/hw/pc.h   2007-12-04 14:43:57.0 -0600
@@ -142,4 +142,9 @@

 void isa_ne2000_init(int base, qemu_irq irq, NICInfo *nd);

+/* virtio-net.c */
+
+void *virtio_net_init(PCIBus *bus, NICInfo *nd, int devfn);
+
+
 #endif
  






Re: [Qemu-devel] Re: [PATCH 2/3] virtio network device

2007-12-05 Thread Anthony Liguori

Paul Brook wrote:

Actually according to qemu's standard, one should use
cpu_physical_memory_write/ cpu_physical_memory_read functions.
This is true also for reading the ring values.
  

Yes, and unfortunately, cpu_physical_memory_{read,write} are copy
interfaces.  We really don't want that for high speed I/O.



I really don't like doing direct access to guest ram without implementing a 
proper API for zero-copy/scatter-gather access. There was a list thread about 
this not so long ago.
  


I agree that we need a proper API for scatter-gather RAM access.  I'm going
to look into that real soon, since it's necessary to optimize the network/disk
transports.
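
The copy half of such an API is essentially the loop the receive handler
open-codes; a self-contained sketch (not a proposed QEMU interface):

#include <string.h>
#include <sys/uio.h>

/* Copy a linear buffer into a scatter-gather list; returns bytes copied. */
static size_t copy_to_iov(struct iovec *iov, int iov_cnt,
                          const void *buf, size_t size)
{
    size_t offset = 0;
    int i;

    for (i = 0; i < iov_cnt && offset < size; i++) {
        size_t len = iov[i].iov_len;

        if (len > size - offset)
            len = size - offset;
        memcpy(iov[i].iov_base, (const char *)buf + offset, len);
        offset += len;
    }
    return offset;
}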


virtio makes things a bit trickier though.  There's a shared ring queue 
between the host and guest.  The ring queue is lock-less and depends on 
the ability to atomically increment ring queue indices to be SMP safe.  
Using a copy-API wouldn't be a problem for QEMU since the host and guest 
are always running in lock-step.  A copy API is actually needed to deal 
with differing host/guest alignment and endianness.
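
To illustrate the endianness half of that: reading a 16-bit ring index written
by a guest of the opposite byte order needs a swap.  A trivial standalone
sketch (the *_is_bigendian parameters stand in for whatever the build or
runtime would provide):

#include <stdint.h>

static uint16_t bswap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Return a guest-written 16-bit ring index in host byte order. */
static uint16_t read_guest_idx16(const volatile uint16_t *p,
                                 int guest_is_bigendian, int host_is_bigendian)
{
    uint16_t v = *p;
    return guest_is_bigendian != host_is_bigendian ? bswap16(v) : v;
}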


Once you introduce KVM, though, this is no longer true, since KVM supports
true SMP.  The solution may be to implement some sort of atomic_increment
function and then have it use an if (kvm) guard to choose direct access
versus a copy.


Regards,

Anthony Liguori


Paul
  






[Qemu-devel] Re: [PATCH 2/3] virtio network device

2007-12-04 Thread Dor Laor

Anthony Liguori wrote:

Index: qemu/hw/virtio-net.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ qemu/hw/virtio-net.c   2007-12-04 14:17:37.0 -0600

+
+static void virtio_net_receive(void *opaque, const uint8_t *buf, int size)
+{
+    VirtIONet *n = opaque;
+    VirtQueueElement elem;
+    struct virtio_net_hdr *hdr;
+    int offset, i;
+
+    /* FIXME: the drivers really need to set their status better */
+    if (n->rx_vq->vring.avail == NULL) {
+        n->can_receive = 0;
+        return;
+    }
+
+    if (virtqueue_pop(n->rx_vq, &elem) == 0) {
+        /* wait until the guest adds some rx bufs */
+        n->can_receive = 0;
+        return;
+    }
+
+    hdr = (void *)elem.in_sg[0].iov_base;
+    hdr->flags = 0;
+    hdr->gso_type = VIRTIO_NET_HDR_GSO_NONE;
+
+    /* copy in packet.  ugh */
+    offset = 0;
+    i = 1;
+    while (offset < size && i < elem.in_num) {
+        int len = MIN(elem.in_sg[i].iov_len, size - offset);
+        memcpy(elem.in_sg[i].iov_base, buf + offset, len);



Actually according to qemu's standard, one should use cpu_physical_memory_write/
cpu_physical_memory_read functions.
This is true also for reading the ring values.




+        offset += len;
+        i++;
+    }
+
+    /* signal other side */
+    virtqueue_push(n->rx_vq, &elem, sizeof(*hdr) + offset);
+    virtio_notify(&n->vdev, n->rx_vq);
+}
+
+/* TX */
+static void virtio_net_handle_tx(VirtIODevice *vdev, VirtQueue *vq)
+{
+    VirtIONet *n = to_virtio_net(vdev);
+    VirtQueueElement elem;
+
+    while (virtqueue_pop(vq, &elem)) {
+        int i;
+        size_t len = 0;
+
+        /* ignore the header for now */
+        for (i = 1; i < elem.out_num; i++) {
+            qemu_send_packet(n->vc, elem.out_sg[i].iov_base,
+                             elem.out_sg[i].iov_len);
+            len += elem.out_sg[i].iov_len;
+        }
+
+        virtqueue_push(vq, &elem, sizeof(struct virtio_net_hdr) + len);
+        virtio_notify(&n->vdev, vq);
+    }


The virtio_notify should be moved outside the while loop to minimize
IRQ injection.



+}
+
+void *virtio_net_init(PCIBus *bus, NICInfo *nd, int devfn)
+{
+    VirtIONet *n;
+
+    n = (VirtIONet *)virtio_init_pci(bus, "virtio-net", 6900, 0x1000,
+                                     0, VIRTIO_ID_NET,
+                                     0x02, 0x00, 0x00,
+                                     6, sizeof(VirtIONet));
+
+    n->vdev.update_config = virtio_net_update_config;
+    n->vdev.get_features = virtio_net_get_features;
+    n->rx_vq = virtio_add_queue(&n->vdev, 512, virtio_net_handle_rx);
+    n->tx_vq = virtio_add_queue(&n->vdev, 128, virtio_net_handle_tx);
+    n->can_receive = 0;
+    memcpy(n->mac, nd->macaddr, 6);
+    n->vc = qemu_new_vlan_client(nd->vlan, virtio_net_receive,
+                                 virtio_net_can_receive, n);
+
+    return &n->vdev;
+}
Index: qemu/hw/pc.h
===
--- qemu.orig/hw/pc.h   2007-12-04 14:15:20.0 -0600
+++ qemu/hw/pc.h   2007-12-04 14:43:57.0 -0600
@@ -142,4 +142,9 @@

void isa_ne2000_init(int base, qemu_irq irq, NICInfo *nd);

+/* virtio-net.c */
+
+void *virtio_net_init(PCIBus *bus, NICInfo *nd, int devfn);
+
+
#endif