Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-07-07 Thread Kevin O'Connor
I only recently saw this email.

On Thu, Jun 06, 2013 at 06:10:12PM +0300, Gleb Natapov wrote:
 On Thu, Jun 06, 2013 at 05:06:32PM +0200, Gerd Hoffmann wrote:
  For seabios itself this isn't a big issue, see pci_{readl,writel} in
  src/pci.c.  When called in 16bit mode it goes into 32bit mode
  temporarily, just for accessing the mmio register.  ahci driver uses it,
  xhci driver (wip atm) will use that too, and virtio-{blk,scsi} drivers
  in seabios can do the same.
  
 Isn't this approach broken? How can SeaBIOS be sure it restores real
 mode registers to exactly same state they were before entering 32bit
 mode?

You are correct - SeaBIOS can't fully restore the hidden segment
registers.  So, in a way it is broken.

In practice, it seems to work on modern bootloaders (eg, ntldr, grub,
lilo).  It definitely doesn't work with EMM386 (old DOS stuff), but
does seem to work okay with FreeDOS as long as one doesn't run EMM386.

The AHCI code uses this 32bit/16bit trampoline because it would not be
possible to support AHCI otherwise.  I haven't seen any complaints of
failures with the AHCI code - probably because people using AHCI are
using modern guests.

I explored this a bit some time back and the only way I could think of
to reliably restore the 16bit registers would be via SMM.
Unfortunately, using SMM introduces a whole host of complexity and
problems.

-Kevin


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-11 Thread Michael S. Tsirkin
On Thu, Jun 06, 2013 at 10:02:14AM -0500, Anthony Liguori wrote:
 Gleb Natapov g...@redhat.com writes:
 
  On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote:
  H. Peter Anvin h...@zytor.com writes:
  
   On 06/05/2013 03:08 PM, Anthony Liguori wrote:
  
   Definitely an option.  However, we want to be able to boot from native
   devices, too, so having an I/O BAR (which would not be used by the OS
   driver) should still at the very least be an option.
   
   What makes it so difficult to work with an MMIO bar for PCI-e?
   
   With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straight
   forward.  Is there something special about PCI-e here?
   
  
   It's not tracking allocation.  It is that accessing memory above 1 MiB
   is incredibly painful in the BIOS environment, which basically means
   MMIO is inaccessible.
  
  Oh, you mean in real mode.
  
  SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout.
  There are loads of ASSERT32FLAT()s in the code to make sure of this.
  
  Well, not exactly. Initialization is done in 32bit, but disk
  reads/writes are done in 16bit mode since it should work from int13
  interrupt handler. The only way I know to access MMIO bars from 16 bit
  is to use SMM which we do not have in KVM.
 
 Ah, if it's just the dataplane operations then there's another solution.
 
 We can introduce a virtqueue flag that asks the backend to poll for new
 requests.  Then SeaBIOS can add the request to the queue and not worry
 about kicking or reading the ISR.

This will pin a host CPU.
If we do something timer based it will likely
both increase host CPU utilization and slow device down.

If we didn't care about performance at all we could
do config cycles for signalling, which is much
more elegant than polling in host, but I don't think
that's the case.
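
For readers wondering why config cycles are attractive from the BIOS side: the
legacy PCI configuration mechanism only uses two fixed I/O ports, 0xCF8 and
0xCFC, so it works unchanged from 16-bit code such as the int13 handler.  A
minimal sketch of that mechanism follows; it assumes the firmware provides the
usual outl(value, port)/inl(port) helpers, and it does not show the device-side
register MST has in mind, which is not specified in this thread.

#include <stdint.h>

#define PCI_CONF_ADDR 0xcf8             /* config address register */
#define PCI_CONF_DATA 0xcfc             /* config data window */

/* outl()/inl() follow the usual (value, port) convention. */
static uint32_t pci_cfg_readl(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t reg)
{
    uint32_t addr = 0x80000000u | ((uint32_t)bus << 16) | ((uint32_t)dev << 11)
                    | ((uint32_t)fn << 8) | (reg & 0xfc);
    outl(addr, PCI_CONF_ADDR);          /* select bus/device/function/register */
    return inl(PCI_CONF_DATA);
}

static void pci_cfg_writel(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t reg,
                           uint32_t val)
{
    uint32_t addr = 0x80000000u | ((uint32_t)bus << 16) | ((uint32_t)dev << 11)
                    | ((uint32_t)fn << 8) | (reg & 0xfc);
    outl(addr, PCI_CONF_ADDR);
    outl(val, PCI_CONF_DATA);
}

Each kick would then be one such config write, which is easy from the BIOS but
costly on the host side, since every config access traps.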

 
 SeaBIOS is polling for completion anyway.

I think that's different because a disk will normally respond
quickly. So it polls a bit, but then it stops as
there are no outstanding requests.

-- 
MST



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-11 Thread Gleb Natapov
On Tue, Jun 11, 2013 at 10:10:47AM +0300, Michael S. Tsirkin wrote:
 On Thu, Jun 06, 2013 at 10:02:14AM -0500, Anthony Liguori wrote:
  Gleb Natapov g...@redhat.com writes:
  
   On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote:
   H. Peter Anvin h...@zytor.com writes:
   
On 06/05/2013 03:08 PM, Anthony Liguori wrote:
   
Definitely an option.  However, we want to be able to boot from 
native
devices, too, so having an I/O BAR (which would not be used by the OS
driver) should still at the very least be an option.

What makes it so difficult to work with an MMIO bar for PCI-e?

With legacy PCI, tracking allocation of MMIO vs. PIO is pretty 
straight
forward.  Is there something special about PCI-e here?

   
It's not tracking allocation.  It is that accessing memory above 1 MiB
is incredibly painful in the BIOS environment, which basically means
MMIO is inaccessible.
   
   Oh, you mean in real mode.
   
   SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout.
   There are loads of ASSERT32FLAT()s in the code to make sure of this.
   
   Well, not exactly. Initialization is done in 32bit, but disk
   reads/writes are done in 16bit mode since it should work from int13
   interrupt handler. The only way I know to access MMIO bars from 16 bit
   is to use SMM which we do not have in KVM.
  
  Ah, if it's just the dataplane operations then there's another solution.
  
  We can introduce a virtqueue flag that asks the backend to poll for new
  requests.  Then SeaBIOS can add the request to the queue and not worry
  about kicking or reading the ISR.
 
 This will pin a host CPU.
 If we do something timer based it will likely
 both increase host CPU utilization and slow device down.
 
 If we didn't care about performance at all we could
 do config cycles for signalling, which is much
 more elegant than polling in host, but I don't think
 that's the case.
 
I wouldn't call BIOS int13 interface performance critical.

  
  SeaBIOS is polling for completion anyway.
 
 I think that's different because a disk will normally respond
 quickly. So it polls a bit, but then it stops as
 there are no outstanding requests.
 
 -- 
 MST

--
Gleb.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-11 Thread Michael S. Tsirkin
On Tue, Jun 11, 2013 at 10:53:48AM +0300, Gleb Natapov wrote:
 On Tue, Jun 11, 2013 at 10:10:47AM +0300, Michael S. Tsirkin wrote:
  On Thu, Jun 06, 2013 at 10:02:14AM -0500, Anthony Liguori wrote:
   Gleb Natapov g...@redhat.com writes:
   
On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote:
H. Peter Anvin h...@zytor.com writes:

 On 06/05/2013 03:08 PM, Anthony Liguori wrote:

 Definitely an option.  However, we want to be able to boot from 
 native
 devices, too, so having an I/O BAR (which would not be used by the 
 OS
 driver) should still at the very least be an option.
 
 What makes it so difficult to work with an MMIO bar for PCI-e?
 
 With legacy PCI, tracking allocation of MMIO vs. PIO is pretty 
 straight
 forward.  Is there something special about PCI-e here?
 

 It's not tracking allocation.  It is that accessing memory above 1 
 MiB
 is incredibly painful in the BIOS environment, which basically means
 MMIO is inaccessible.

Oh, you mean in real mode.

SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout.
There are loads of ASSERT32FLAT()s in the code to make sure of this.

Well, not exactly. Initialization is done in 32bit, but disk
reads/writes are done in 16bit mode since it should work from int13
interrupt handler. The only way I know to access MMIO bars from 16 bit
is to use SMM which we do not have in KVM.
   
   Ah, if it's just the dataplane operations then there's another solution.
   
   We can introduce a virtqueue flag that asks the backend to poll for new
   requests.  Then SeaBIOS can add the request to the queue and not worry
   about kicking or reading the ISR.
  
  This will pin a host CPU.
  If we do something timer based it will likely
  both increase host CPU utilization and slow device down.
  
  If we didn't care about performance at all we could
  do config cycles for signalling, which is much
  more elegant than polling in host, but I don't think
  that's the case.
  
 I wouldn't call BIOS int13 interface performance critical.

So the plan always was to
- add an MMIO BAR
- add a register for pci-config based access to devices

hpa felt performance does matter there but didn't clarify why ...

   
   SeaBIOS is polling for completion anyway.
  
  I think that's different because a disk will normally respond
  quickly. So it polls a bit, but then it stops as
  there are no outstanding requests.
  
  -- 
  MST
 
 --
   Gleb.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-11 Thread Gleb Natapov
On Tue, Jun 11, 2013 at 11:02:26AM +0300, Michael S. Tsirkin wrote:
 On Tue, Jun 11, 2013 at 10:53:48AM +0300, Gleb Natapov wrote:
  On Tue, Jun 11, 2013 at 10:10:47AM +0300, Michael S. Tsirkin wrote:
   On Thu, Jun 06, 2013 at 10:02:14AM -0500, Anthony Liguori wrote:
Gleb Natapov g...@redhat.com writes:

 On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote:
 H. Peter Anvin h...@zytor.com writes:
 
  On 06/05/2013 03:08 PM, Anthony Liguori wrote:
 
  Definitely an option.  However, we want to be able to boot from 
  native
  devices, too, so having an I/O BAR (which would not be used by 
  the OS
  driver) should still at the very least be an option.
  
  What makes it so difficult to work with an MMIO bar for PCI-e?
  
  With legacy PCI, tracking allocation of MMIO vs. PIO is pretty 
  straight
  forward.  Is there something special about PCI-e here?
  
 
  It's not tracking allocation.  It is that accessing memory above 1 
  MiB
  is incredibly painful in the BIOS environment, which basically 
  means
  MMIO is inaccessible.
 
 Oh, you mean in real mode.
 
 SeaBIOS runs the virtio code in 32-bit mode with a flat memory 
 layout.
 There are loads of ASSERT32FLAT()s in the code to make sure of this.
 
 Well, not exactly. Initialization is done in 32bit, but disk
 reads/writes are done in 16bit mode since it should work from int13
 interrupt handler. The only way I know to access MMIO bars from 16 bit
 is to use SMM which we do not have in KVM.

Ah, if it's just the dataplane operations then there's another solution.

We can introduce a virtqueue flag that asks the backend to poll for new
requests.  Then SeaBIOS can add the request to the queue and not worry
about kicking or reading the ISR.
   
   This will pin a host CPU.
   If we do something timer based it will likely
   both increase host CPU utilization and slow device down.
   
   If we didn't care about performance at all we could
   do config cycles for signalling, which is much
   more elegant than polling in host, but I don't think
   that's the case.
   
  I wouldn't call BIOS int13 interface performance critical.
 
 So the plan always was to
 - add an MMIO BAR
 - add a register for pci-config based access to devices
 
 hpa felt performance does matter there but didn't clarify why ...
 
You do not want to make it too slow obviously, this is the interface that is
used to load the OS during boot.


SeaBIOS is polling for completion anyway.
   
   I think that's different because a disk will normally respond
   quickly. So it polls a bit, but then it stops as
   there are no outstanding requests.
   
   -- 
   MST
  
  --
  Gleb.

--
Gleb.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-11 Thread Michael S. Tsirkin
On Tue, Jun 11, 2013 at 11:02:26AM +0300, Michael S. Tsirkin wrote:
 On Tue, Jun 11, 2013 at 10:53:48AM +0300, Gleb Natapov wrote:
  On Tue, Jun 11, 2013 at 10:10:47AM +0300, Michael S. Tsirkin wrote:
   On Thu, Jun 06, 2013 at 10:02:14AM -0500, Anthony Liguori wrote:
Gleb Natapov g...@redhat.com writes:

 On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote:
 H. Peter Anvin h...@zytor.com writes:
 
  On 06/05/2013 03:08 PM, Anthony Liguori wrote:
 
  Definitely an option.  However, we want to be able to boot from 
  native
  devices, too, so having an I/O BAR (which would not be used by 
  the OS
  driver) should still at the very least be an option.
  
  What makes it so difficult to work with an MMIO bar for PCI-e?
  
  With legacy PCI, tracking allocation of MMIO vs. PIO is pretty 
  straight
  forward.  Is there something special about PCI-e here?
  
 
  It's not tracking allocation.  It is that accessing memory above 1 
  MiB
  is incredibly painful in the BIOS environment, which basically 
  means
  MMIO is inaccessible.
 
 Oh, you mean in real mode.
 
 SeaBIOS runs the virtio code in 32-bit mode with a flat memory 
 layout.
 There are loads of ASSERT32FLAT()s in the code to make sure of this.
 
 Well, not exactly. Initialization is done in 32bit, but disk
 reads/writes are done in 16bit mode since it should work from int13
 interrupt handler. The only way I know to access MMIO bars from 16 bit
 is to use SMM which we do not have in KVM.

Ah, if it's just the dataplane operations then there's another solution.

We can introduce a virtqueue flag that asks the backend to poll for new
requests.  Then SeaBIOS can add the request to the queue and not worry
about kicking or reading the ISR.
   
   This will pin a host CPU.
   If we do something timer based it will likely
   both increase host CPU utilization and slow device down.
   
   If we didn't care about performance at all we could
   do config cycles for signalling, which is much
   more elegant than polling in host, but I don't think
   that's the case.
   
  I wouldn't call BIOS int13 interface performance critical.
 
 So the plan always was to
 - add an MMIO BAR
 - add a register for pci-config based access to devices
 
 hpa felt performance does matter there but didn't clarify why ...

Also

gleb mst, well the question is if it is safe to call int13 in the
middle of pci bus enumeration/configuration
gleb mst, and int13 predates PCI, so who knows


SeaBIOS is polling for completion anyway.
   
   I think that's different because a disk will normally respond
   quickly. So it polls a bit, but then it stops as
   there are no outstanding requests.
   
   -- 
   MST
  
  --
  Gleb.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-11 Thread Michael S. Tsirkin
On Tue, Jun 11, 2013 at 11:03:50AM +0300, Gleb Natapov wrote:
 On Tue, Jun 11, 2013 at 11:02:26AM +0300, Michael S. Tsirkin wrote:
  On Tue, Jun 11, 2013 at 10:53:48AM +0300, Gleb Natapov wrote:
   On Tue, Jun 11, 2013 at 10:10:47AM +0300, Michael S. Tsirkin wrote:
On Thu, Jun 06, 2013 at 10:02:14AM -0500, Anthony Liguori wrote:
 Gleb Natapov g...@redhat.com writes:
 
  On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote:
  H. Peter Anvin h...@zytor.com writes:
  
   On 06/05/2013 03:08 PM, Anthony Liguori wrote:
  
   Definitely an option.  However, we want to be able to boot 
   from native
   devices, too, so having an I/O BAR (which would not be used by 
   the OS
   driver) should still at the very least be an option.
   
   What makes it so difficult to work with an MMIO bar for PCI-e?
   
   With legacy PCI, tracking allocation of MMIO vs. PIO is pretty 
   straight
   forward.  Is there something special about PCI-e here?
   
  
   It's not tracking allocation.  It is that accessing memory above 
   1 MiB
   is incredibly painful in the BIOS environment, which basically 
   means
   MMIO is inaccessible.
  
  Oh, you mean in real mode.
  
  SeaBIOS runs the virtio code in 32-bit mode with a flat memory 
  layout.
  There are loads of ASSERT32FLAT()s in the code to make sure of 
  this.
  
  Well, not exactly. Initialization is done in 32bit, but disk
  reads/writes are done in 16bit mode since it should work from int13
  interrupt handler. The only way I know to access MMIO bars from 16 
  bit
  is to use SMM which we do not have in KVM.
 
 Ah, if it's just the dataplane operations then there's another 
 solution.
 
 We can introduce a virtqueue flag that asks the backend to poll for 
 new
 requests.  Then SeaBIOS can add the request to the queue and not worry
 about kicking or reading the ISR.

This will pin a host CPU.
If we do something timer based it will likely
both increase host CPU utilization and slow device down.

If we didn't care about performance at all we could
do config cycles for signalling, which is much
more elegant than polling in host, but I don't think
that's the case.

   I wouldn't call BIOS int13 interface performance critical.
  
  So the plan always was to
  - add an MMIO BAR
  - add a register for pci-config based access to devices
  
  hpa felt performance does matter there but didn't clarify why ...
  
 You do not what to make it too slow obviously, this is interface that is
 used to load OS during boot.

And possibly installation?

 
 SeaBIOS is polling for completion anyway.

I think that's different because a disk will normally respond
quickly. So it polls a bit, but then it stops as
there are no outstanding requests.

-- 
MST
   
   --
 Gleb.
 
 --
   Gleb.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-11 Thread Gleb Natapov
On Tue, Jun 11, 2013 at 11:19:46AM +0300, Michael S. Tsirkin wrote:
 On Tue, Jun 11, 2013 at 11:03:50AM +0300, Gleb Natapov wrote:
  On Tue, Jun 11, 2013 at 11:02:26AM +0300, Michael S. Tsirkin wrote:
   On Tue, Jun 11, 2013 at 10:53:48AM +0300, Gleb Natapov wrote:
On Tue, Jun 11, 2013 at 10:10:47AM +0300, Michael S. Tsirkin wrote:
 On Thu, Jun 06, 2013 at 10:02:14AM -0500, Anthony Liguori wrote:
  Gleb Natapov g...@redhat.com writes:
  
   On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote:
   H. Peter Anvin h...@zytor.com writes:
   
On 06/05/2013 03:08 PM, Anthony Liguori wrote:
   
Definitely an option.  However, we want to be able to boot 
from native
devices, too, so having an I/O BAR (which would not be used 
by the OS
driver) should still at the very least be an option.

What makes it so difficult to work with an MMIO bar for PCI-e?

With legacy PCI, tracking allocation of MMIO vs. PIO is 
pretty straight
forward.  Is there something special about PCI-e here?

   
It's not tracking allocation.  It is that accessing memory 
above 1 MiB
is incredibly painful in the BIOS environment, which basically 
means
MMIO is inaccessible.
   
   Oh, you mean in real mode.
   
   SeaBIOS runs the virtio code in 32-bit mode with a flat memory 
   layout.
   There are loads of ASSERT32FLAT()s in the code to make sure of 
   this.
   
   Well, not exactly. Initialization is done in 32bit, but disk
   reads/writes are done in 16bit mode since it should work from 
   int13
   interrupt handler. The only way I know to access MMIO bars from 
   16 bit
   is to use SMM which we do not have in KVM.
  
  Ah, if it's just the dataplane operations then there's another 
  solution.
  
  We can introduce a virtqueue flag that asks the backend to poll for 
  new
  requests.  Then SeaBIOS can add the request to the queue and not 
  worry
  about kicking or reading the ISR.
 
 This will pin a host CPU.
 If we do something timer based it will likely
 both increase host CPU utilization and slow device down.
 
 If we didn't care about performance at all we could
 do config cycles for signalling, which is much
 more elegant than polling in host, but I don't think
 that's the case.
 
I wouldn't call BIOS int13 interface performance critical.
   
   So the plan always was to
   - add an MMIO BAR
   - add a register for pci-config based access to devices
   
   hpa felt performance does matter there but didn't clarify why ...
   
  You do not what to make it too slow obviously, this is interface that is
  used to load OS during boot.
 
 And possibly installation?
 
Only the stage that reads files from the CDROM. IIRC the actual installation
runs with native drivers. This is why Windows asks you to provide a floppy
with a driver at a very early stage of installation.

--
Gleb.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-11 Thread Michael S. Tsirkin
On Tue, Jun 11, 2013 at 11:22:37AM +0300, Gleb Natapov wrote:
 On Tue, Jun 11, 2013 at 11:19:46AM +0300, Michael S. Tsirkin wrote:
  On Tue, Jun 11, 2013 at 11:03:50AM +0300, Gleb Natapov wrote:
   On Tue, Jun 11, 2013 at 11:02:26AM +0300, Michael S. Tsirkin wrote:
On Tue, Jun 11, 2013 at 10:53:48AM +0300, Gleb Natapov wrote:
 On Tue, Jun 11, 2013 at 10:10:47AM +0300, Michael S. Tsirkin wrote:
  On Thu, Jun 06, 2013 at 10:02:14AM -0500, Anthony Liguori wrote:
   Gleb Natapov g...@redhat.com writes:
   
On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote:
H. Peter Anvin h...@zytor.com writes:

 On 06/05/2013 03:08 PM, Anthony Liguori wrote:

 Definitely an option.  However, we want to be able to boot 
 from native
 devices, too, so having an I/O BAR (which would not be 
 used by the OS
 driver) should still at the very least be an option.
 
 What makes it so difficult to work with an MMIO bar for 
 PCI-e?
 
 With legacy PCI, tracking allocation of MMIO vs. PIO is 
 pretty straight
 forward.  Is there something special about PCI-e here?
 

 It's not tracking allocation.  It is that accessing memory 
 above 1 MiB
 is incredibly painful in the BIOS environment, which 
 basically means
 MMIO is inaccessible.

Oh, you mean in real mode.

SeaBIOS runs the virtio code in 32-bit mode with a flat memory 
layout.
There are loads of ASSERT32FLAT()s in the code to make sure of 
this.

Well, not exactly. Initialization is done in 32bit, but disk
reads/writes are done in 16bit mode since it should work from 
int13
interrupt handler. The only way I know to access MMIO bars from 
16 bit
is to use SMM which we do not have in KVM.
   
   Ah, if it's just the dataplane operations then there's another 
   solution.
   
   We can introduce a virtqueue flag that asks the backend to poll 
   for new
   requests.  Then SeaBIOS can add the request to the queue and not 
   worry
   about kicking or reading the ISR.
  
  This will pin a host CPU.
  If we do something timer based it will likely
  both increase host CPU utilization and slow device down.
  
  If we didn't care about performance at all we could
  do config cycles for signalling, which is much
  more elegant than polling in host, but I don't think
  that's the case.
  
 I wouldn't call BIOS int13 interface performance critical.

So the plan always was to
- add an MMIO BAR
- add a register for pci-config based access to devices

hpa felt performance does matter there but didn't clarify why ...

   You do not what to make it too slow obviously, this is interface that is
   used to load OS during boot.
  
  And possibly installation?
  
 Only the stage that reads files from CDROM. IIRC actual installation
 runs with native drivers. This is why Windows asks you to provide floppy
 with a driver at very early stage of installation.

Have any numbers to tell us how much time is spent there?
E.g. if it's slowed down by a factor of 2, is it a problem?
How about a factor of 10?

 --
   Gleb.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-11 Thread Gleb Natapov
On Tue, Jun 11, 2013 at 11:30:11AM +0300, Michael S. Tsirkin wrote:
 On Tue, Jun 11, 2013 at 11:22:37AM +0300, Gleb Natapov wrote:
  On Tue, Jun 11, 2013 at 11:19:46AM +0300, Michael S. Tsirkin wrote:
   On Tue, Jun 11, 2013 at 11:03:50AM +0300, Gleb Natapov wrote:
On Tue, Jun 11, 2013 at 11:02:26AM +0300, Michael S. Tsirkin wrote:
 On Tue, Jun 11, 2013 at 10:53:48AM +0300, Gleb Natapov wrote:
  On Tue, Jun 11, 2013 at 10:10:47AM +0300, Michael S. Tsirkin wrote:
   On Thu, Jun 06, 2013 at 10:02:14AM -0500, Anthony Liguori wrote:
Gleb Natapov g...@redhat.com writes:

 On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori 
 wrote:
 H. Peter Anvin h...@zytor.com writes:
 
  On 06/05/2013 03:08 PM, Anthony Liguori wrote:
 
  Definitely an option.  However, we want to be able to 
  boot from native
  devices, too, so having an I/O BAR (which would not be 
  used by the OS
  driver) should still at the very least be an option.
  
  What makes it so difficult to work with an MMIO bar for 
  PCI-e?
  
  With legacy PCI, tracking allocation of MMIO vs. PIO is 
  pretty straight
  forward.  Is there something special about PCI-e here?
  
 
  It's not tracking allocation.  It is that accessing memory 
  above 1 MiB
  is incredibly painful in the BIOS environment, which 
  basically means
  MMIO is inaccessible.
 
 Oh, you mean in real mode.
 
 SeaBIOS runs the virtio code in 32-bit mode with a flat 
 memory layout.
 There are loads of ASSERT32FLAT()s in the code to make sure 
 of this.
 
 Well, not exactly. Initialization is done in 32bit, but disk
 reads/writes are done in 16bit mode since it should work from 
 int13
 interrupt handler. The only way I know to access MMIO bars 
 from 16 bit
 is to use SMM which we do not have in KVM.

Ah, if it's just the dataplane operations then there's another 
solution.

We can introduce a virtqueue flag that asks the backend to poll 
for new
requests.  Then SeaBIOS can add the request to the queue and 
not worry
about kicking or reading the ISR.
   
   This will pin a host CPU.
   If we do something timer based it will likely
   both increase host CPU utilization and slow device down.
   
   If we didn't care about performance at all we could
   do config cycles for signalling, which is much
   more elegant than polling in host, but I don't think
   that's the case.
   
  I wouldn't call BIOS int13 interface performance critical.
 
 So the plan always was to
 - add an MMIO BAR
 - add a register for pci-config based access to devices
 
 hpa felt performance does matter there but didn't clarify why ...
 
You do not what to make it too slow obviously, this is interface that is
used to load OS during boot.
   
   And possibly installation?
   
  Only the stage that reads files from CDROM. IIRC actual installation
  runs with native drivers. This is why Windows asks you to provide floppy
  with a driver at very early stage of installation.
 
 Have any numbers to tell us how much time is spent there?
 E.g. if it's slowed down by a factor of 2, is it a problem?
 How about a factor of 10?
 
No I do not.

--
Gleb.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-07 Thread Peter Maydell
On 6 June 2013 15:59, Anthony Liguori aligu...@us.ibm.com wrote:
 We would still use virtio-pci for existing devices.  Only new devices
 would use virtio-pcie.

Surely you'd want to support both for any new device, because
(a) transport is orthogonal to backend functionality
(b) not all existing machine models have pci-e ?

Or am I misunderstanding?

thanks
-- PMM


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-06 Thread Gleb Natapov
On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote:
 H. Peter Anvin h...@zytor.com writes:
 
  On 06/05/2013 03:08 PM, Anthony Liguori wrote:
 
  Definitely an option.  However, we want to be able to boot from native
  devices, too, so having an I/O BAR (which would not be used by the OS
  driver) should still at the very least be an option.
  
  What makes it so difficult to work with an MMIO bar for PCI-e?
  
  With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straight
  forward.  Is there something special about PCI-e here?
  
 
  It's not tracking allocation.  It is that accessing memory above 1 MiB
  is incredibly painful in the BIOS environment, which basically means
  MMIO is inaccessible.
 
 Oh, you mean in real mode.
 
 SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout.
 There are loads of ASSERT32FLAT()s in the code to make sure of this.
 
Well, not exactly. Initialization is done in 32bit, but disk
reads/writes are done in 16bit mode since they should work from the int13
interrupt handler. The only way I know to access MMIO BARs from 16-bit code
is to use SMM, which we do not have in KVM.

--
Gleb.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-06 Thread Michael S. Tsirkin
On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote:
 Michael S. Tsirkin m...@redhat.com writes:
  On Mon, Jun 03, 2013 at 09:56:15AM +0930, Rusty Russell wrote:
  Michael S. Tsirkin m...@redhat.com writes:
   On Thu, May 30, 2013 at 08:53:45AM -0500, Anthony Liguori wrote:
   Rusty Russell ru...@rustcorp.com.au writes:
   
Anthony Liguori aligu...@us.ibm.com writes:
Forcing a guest driver change is a really big
deal and I see no reason to do that unless there's a compelling 
reason
to.
   
So we're stuck with the 1.0 config layout for a very long time.
   
We definitely must not force a guest change.  The explicit aim of the
standard is that legacy and 1.0 be backward compatible.  One
deliverable is a document detailing how this is done (effectively a
summary of changes between what we have and 1.0).
   
   If 2.0 is fully backwards compatible, great.  It seems like such a
   difference that that would be impossible but I need to investigate
   further.
   
   Regards,
   
   Anthony Liguori
  
   If you look at my patches you'll see how it works.
   Basically old guests use BAR0 new ones don't, so
   it's easy: BAR0 access means legacy guest.
   Only started testing but things seem to work
   fine with old guests so far.
  
   I think we need a spec, not just driver code.
  
   Rusty what's the plan? Want me to write it?
  
  We need both, of course, but the spec work will happen in the OASIS WG.
  A draft is good, but let's not commit anything to upstream QEMU until we
  get the spec finalized.  And that is proposed to be late this year.
 
  Well that would be quite sad really.
  
  This means we can't make virtio a spec compliant pci express device,
  and we can't add any more feature bits, so no
  flexible buffer optimizations for virtio net.
 
  There are probably more projects that will be blocked.
 
  So how about we keep extending legacy layout for a bit longer:
  - add a way to access device with MMIO
  - use feature bit 31 to signal 64 bit features
(and shift device config accordingly)
 
 By my count, net still has 7 feature bits left, so I don't think the
 feature bits are likely to be a limitation in the next 6 months?

Actually I count 5 net specific ones:

3, 4 and then 25, 26, 27

Unless you count 31 even though it's a generic transport bit?

That's still 6 ...

 MMIO is a bigger problem.  Linux guests are happy with it: does it break
 the Windows drivers?
 
 Thanks,
 Rusty.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-06 Thread H. Peter Anvin
On 06/05/2013 11:34 PM, Gleb Natapov wrote:

 SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout.
 There are loads of ASSERT32FLAT()s in the code to make sure of this.

 Well, not exactly. Initialization is done in 32bit, but disk
 reads/writes are done in 16bit mode since it should work from int13
 interrupt handler. The only way I know to access MMIO bars from 16 bit
 is to use SMM which we do not have in KVM.
 

In some ways it is even worse for PXE, since PXE is defined to work from
16-bit protected mode... but the mechanism to request a segment mapping
for the high memory area doesn't actually work.

-hpa




Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-06 Thread Anthony Liguori

Hi Rusty,

Rusty Russell ru...@rustcorp.com.au writes:

 Anthony Liguori aligu...@us.ibm.com writes:
 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely), give
it a new device/vendor ID.   Continue to use virtio-pci for existing
devices potentially adding virtio-{net,blk,...}-pcie variants for
people that care to use them.

 Now you have a different compatibility problem; how do you know the
 guest supports the new virtio-pcie net?

We don't care.

We would still use virtio-pci for existing devices.  Only new devices
would use virtio-pcie.

 If you put a virtio-pci card behind a PCI-e bridge today, it's not
 compliant, but AFAICT it will Just Work.  (Modulo the 16-dev limit).

I believe you can put it in legacy mode and then there isn't the 16-dev
limit.  I believe the only advantage of putting it in native mode is
that then you can do native hotplug (as opposed to ACPI hotplug).

So sticking with virtio-pci seems reasonable to me.

 I've been assuming we'd avoid a flag day change; that devices would
 look like existing virtio-pci with capabilities indicating the new
 config layout.

I don't think that's feasible.  Maybe 5 or 10 years from now, we switch
the default adapter to virtio-pcie.

 I think 4 is the best path forward.  It's better for users (guests
 continue to work as they always have).  There's less confusion about
 enabling PCI-e support--you must ask for the virtio-pcie variant and you
 must have a virtio-pcie driver.  It's easy to explain.

 Removing both forward and backward compatibility is easy to explain, but
 I think it'll be harder to deploy.  This is your area though, so perhaps
 I'm wrong.

My concern is that it's not real backwards compatibility.

 It also maps to what regular hardware does.  I highly doubt that there
 are any real PCI cards that made the shift from PCI to PCI-e without
 bumping at least a revision ID.

 Noone expected the new cards to Just Work with old OSes: a new machine
 meant a new OS and new drivers.  Hardware vendors like that.

Yup.

 Since virtualization often involves legacy, our priorities might be
 different.

So realistically, I think if we introduce virtio-pcie with a different
vendor ID, it will be adopted fairly quickly.  The drivers will show up
in distros quickly and get backported.

New devices can be limited to supporting virtio-pcie and we'll certainly
provide a way to use old devices with virtio-pcie too.  But for
practical reasons, I think we have to continue using virtio-pci by
default.

Regards,

Anthony Liguori


 Cheers,
 Rusty.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-06 Thread Anthony Liguori
Gleb Natapov g...@redhat.com writes:

 On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote:
 H. Peter Anvin h...@zytor.com writes:
 
  On 06/05/2013 03:08 PM, Anthony Liguori wrote:
 
  Definitely an option.  However, we want to be able to boot from native
  devices, too, so having an I/O BAR (which would not be used by the OS
  driver) should still at the very least be an option.
  
  What makes it so difficult to work with an MMIO bar for PCI-e?
  
  With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straight
  forward.  Is there something special about PCI-e here?
  
 
  It's not tracking allocation.  It is that accessing memory above 1 MiB
  is incredibly painful in the BIOS environment, which basically means
  MMIO is inaccessible.
 
 Oh, you mean in real mode.
 
 SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout.
 There are loads of ASSERT32FLAT()s in the code to make sure of this.
 
 Well, not exactly. Initialization is done in 32bit, but disk
 reads/writes are done in 16bit mode since it should work from int13
 interrupt handler. The only way I know to access MMIO bars from 16 bit
 is to use SMM which we do not have in KVM.

Ah, if it's just the dataplane operations then there's another solution.

We can introduce a virtqueue flag that asks the backend to poll for new
requests.  Then SeaBIOS can add the request to the queue and not worry
about kicking or reading the ISR.

SeaBIOS is polling for completion anyway.
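
A rough guest-side sketch of what such a flag could look like, using the
standard split-ring layout; the feature bit and the barrier()/outw() helpers
are illustrative assumptions, since no such "host polls the avail ring" flag
exists in virtio today:

#include <stdint.h>

#define VIRTIO_PCI_QUEUE_NOTIFY   0x10  /* legacy virtio-pci notify register */
#define VIRTIO_F_HOST_POLLS_AVAIL 30    /* hypothetical bit, not assigned in virtio */

struct vring_avail {                    /* standard split-ring "available" header */
    uint16_t flags;
    uint16_t idx;
    uint16_t ring[];
};

/* Publish one descriptor chain; kick only if the host does not poll.
 * barrier() and outw(value, port) are assumed firmware helpers. */
static void submit(struct vring_avail *avail, uint16_t qsize, uint16_t head,
                   uint32_t features, uint16_t iobase, uint16_t queue_index)
{
    avail->ring[avail->idx % qsize] = head;
    barrier();                          /* descriptors visible before idx bump */
    avail->idx++;

    if (!(features & (1u << VIRTIO_F_HOST_POLLS_AVAIL)))
        outw(queue_index, iobase + VIRTIO_PCI_QUEUE_NOTIFY);
    /* With the flag negotiated, the 16-bit int13 path only writes guest
     * memory and then spins on the used ring, as SeaBIOS already does. */
}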

Regards,

Anthony Liguori


 --
   Gleb.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-06 Thread Gerd Hoffmann
On 06/06/13 08:34, Gleb Natapov wrote:
 On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote:

 Oh, you mean in real mode.

 SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout.
 There are loads of ASSERT32FLAT()s in the code to make sure of this.

 Well, not exactly. Initialization is done in 32bit, but disk
 reads/writes are done in 16bit mode since it should work from int13
 interrupt handler.

Exactly.  It's only the initialization code which has ASSERT32FLAT()
all over the place.  Which actually is the majority of the code in most
cases as all the hardware detection and initialization code is there.
But kicking I/O requests must work from 16bit mode too.

 The only way I know to access MMIO bars from 16 bit
 is to use SMM which we do not have in KVM.

For seabios itself this isn't a big issue, see pci_{readl,writel} in
src/pci.c.  When called in 16bit mode it goes into 32bit mode
temporarily, just for accessing the mmio register.  ahci driver uses it,
xhci driver (wip atm) will use that too, and virtio-{blk,scsi} drivers
in seabios can do the same.
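
The pattern Gerd describes looks roughly like the sketch below.  The names
follow SeaBIOS conventions (u32 and readl() from the tree's headers,
MODESEGMENT true when the code is built for a segmented, non-flat mode, and
call32() running one function in 32-bit flat mode and returning), but treat it
as an illustration rather than the literal code in src/pci.c:

static u32 mmio_readl_32(u32 addr)
{
    /* Runs in 32-bit flat mode, so a flat pointer dereference is fine. */
    return readl((void *)addr);
}

u32 mmio_readl(u32 addr)
{
    if (MODESEGMENT)
        /* Built for 16-bit (segmented) code: hop into 32-bit flat mode
         * just for this one access, then return to the caller. */
        return call32(mmio_readl_32, addr, ~0);
    return readl((void *)addr);
}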

But as hpa mentioned it will be more tricky for option roms (aka
virtio-net).

cheers,
  Gerd




Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-06 Thread Gleb Natapov
On Thu, Jun 06, 2013 at 05:06:32PM +0200, Gerd Hoffmann wrote:
 On 06/06/13 08:34, Gleb Natapov wrote:
  On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote:
 
  Oh, you mean in real mode.
 
  SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout.
  There are loads of ASSERT32FLAT()s in the code to make sure of this.
 
  Well, not exactly. Initialization is done in 32bit, but disk
  reads/writes are done in 16bit mode since it should work from int13
  interrupt handler.
 
 Exactly.  It's only the initialization code which has ASSERt32FLAT()
 all over the place.  Which actually is the majority of the code in most
 cases as all the hardware detection and initialization code is there.
 But kicking I/O requests must work from 16bit mode too.
 
  The only way I know to access MMIO bars from 16 bit
  is to use SMM which we do not have in KVM.
 
 For seabios itself this isn't a big issue, see pci_{readl,writel} in
 src/pci.c.  When called in 16bit mode it goes into 32bit mode
 temporarily, just for accessing the mmio register.  ahci driver uses it,
 xhci driver (wip atm) will use that too, and virtio-{blk,scsi} drivers
 in seabios can do the same.
 
Isn't this approach broken? How can SeaBIOS be sure it restores the real
mode registers to exactly the same state they were in before entering 32bit
mode?

 But as hpa mentioned it will be more tricky for option roms (aka
 virtio-net).
 
 cheers,
   Gerd
 

--
Gleb.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-06 Thread H. Peter Anvin
On 06/06/2013 08:10 AM, Gleb Natapov wrote:
 On Thu, Jun 06, 2013 at 05:06:32PM +0200, Gerd Hoffmann wrote:

 Isn't this approach broken? How can SeaBIOS be sure it restores real
 mode registers to exactly same state they were before entering 32bit
 mode?
 

It can't... so yes, it is broken.

-hpa




Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-06 Thread Gerd Hoffmann
  Hi,

 For seabios itself this isn't a big issue, see pci_{readl,writel} in
 src/pci.c.  When called in 16bit mode it goes into 32bit mode
 temporarily, just for accessing the mmio register.  ahci driver uses it,
 xhci driver (wip atm) will use that too, and virtio-{blk,scsi} drivers
 in seabios can do the same.

 Isn't this approach broken? How can SeaBIOS be sure it restores real
 mode registers to exactly same state they were before entering 32bit
 mode?

Don't know the details of that magic.  Kevin had some concerns on the
stability of this, so maybe there is a theoretical hole.  So far I haven't
seen any issues in practice, but also didn't stress it too much.
Basically only used that with all kinds of boot loaders, could very well
be it breaks if you try to use that with more esoteric stuff such as dos
extenders, then hit unhandled corner cases ...

cheers,
  Gerd




Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-06 Thread Rusty Russell
Anthony Liguori aligu...@us.ibm.com writes:
 Hi Rusty,

 Rusty Russell ru...@rustcorp.com.au writes:

 Anthony Liguori aligu...@us.ibm.com writes:
 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely), give
it a new device/vendor ID.   Continue to use virtio-pci for existing
devices potentially adding virtio-{net,blk,...}-pcie variants for
people that care to use them.

 Now you have a different compatibility problem; how do you know the
 guest supports the new virtio-pcie net?

 We don't care.

 We would still use virtio-pci for existing devices.  Only new devices
 would use virtio-pcie.

My concern is that new features make the virtio-pcie driver so desirable
that libvirt is pressured to use it ASAP.  Have we just punted the
problem upstream?

(Of course, feature bits were supposed to avoid such a transition issue,
but mistakes accumulate).

Cheers,
Rusty.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 By my count, net still has 7 feature bits left, so I don't think the
 feature bits are likely to be a limitation in the next 6 months?

 Yes but you wanted a generic transport feature bit
 for flexible SG layout.
 Are you happy with VIRTIO_NET_F_ANY_HEADER_SG then?

If we need it within 6 months, I'd rather do that: this feature would
then be assumed for 1.0, and reserved.  We may do that for other
features, too (the committee will have to review).

 MMIO is a bigger problem.  Linux guests are happy with it: does it break
 the Windows drivers?
 
 Thanks,
 Rusty.

 You mean make BAR0 an MMIO BAR?
 Yes, it would break current windows guests.
 Further, as long as we use same address to notify all queues,
 we would also need to decode the instruction on x86 and that's
 measureably slower than PIO.
 We could go back to discussing hypercall use for notifications,
 but that has its own set of issues...

We might have something ready to deploy in 3 months, but realistically,
I'd rather have it ready and tested outside the main git tree(s) and
push it once the standard is committed.

Cheers,
Rusty.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Michael S. Tsirkin
On Wed, Jun 05, 2013 at 04:49:22PM +0930, Rusty Russell wrote:
 Michael S. Tsirkin m...@redhat.com writes:
  By my count, net still has 7 feature bits left, so I don't think the
  feature bits are likely to be a limitation in the next 6 months?
 
  Yes but you wanted a generic transport feature bit
  for flexible SG layout.
  Are you happy with VIRTIO_NET_F_ANY_HEADER_SG then?
 
 If we need it within 6 months, I'd rather do that: this feature would
 then be assumed for 1.0, and reserved.  We may do that for other
 features, too (the committee will have to review).
 
  MMIO is a bigger problem.  Linux guests are happy with it: does it break
  the Windows drivers?
  
  Thanks,
  Rusty.
 
  You mean make BAR0 an MMIO BAR?
  Yes, it would break current windows guests.
  Further, as long as we use same address to notify all queues,
  we would also need to decode the instruction on x86 and that's
  measureably slower than PIO.
  We could go back to discussing hypercall use for notifications,
  but that has its own set of issues...
 
 We might have something ready to deploy in 3 months,
 but realistically,
 I'd rather have it ready and tested outside the main git tree(s) and
 push it once the standard is committed.
 
 Cheers,
 Rusty.


We have working code already. We don't need 3 months out of tree,
this gets us no progress. To make it ready to deploy
we need it upstream and people testing it.

I think the issue is the layout change, you don't want
the new config layout before we get standardization rolling, right?
Okay how about this small patch to the linux guest
(completely untested, just to give you the idea):


diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index a7ce730..0e34862 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -675,6 +675,33 @@ static void virtio_pci_release_dev(struct device *_d)
  */
 }
 
+/* Map a BAR. But carefully: make sure we don't overlap the MSI-X table */
+static void __iomem * virtio_pci_iomap(struct pci_dev *pci_dev, int bar)
+{
+        int msix_cap = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
+        if (msix_cap) {
+                u32 offset;
+                u8 bir;
+                pci_read_config_dword(pci_dev, msix_cap + PCI_MSIX_TABLE,
+                                      &offset);
+                bir = (u8)(offset & PCI_MSIX_TABLE_BIR);
+                offset &= PCI_MSIX_TABLE_OFFSET;
+                /* Spec says table offset is in a 4K page all by itself */
+                if (bir == bar && offset < 4096)
+                        return NULL;
+
+                pci_read_config_dword(pci_dev, msix_cap + PCI_MSIX_PBA,
+                                      &offset);
+                bir = (u8)(offset & PCI_MSIX_PBA_BIR);
+                offset &= PCI_MSIX_PBA_OFFSET;
+                /* Spec says table offset is in a 4K page all by itself. */
+                if (bir == bar && offset < 4096)
+                        return NULL;
+        }
+        /* 4K is enough for all devices at the moment. */
+        return pci_iomap(pci_dev, 0, 4096);
+}
+
 /* the PCI probing function */
 static int virtio_pci_probe(struct pci_dev *pci_dev,
                             const struct pci_device_id *id)
@@ -716,7 +743,10 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
        if (err)
                goto out_enable_device;
 
-       vp_dev->ioaddr = pci_iomap(pci_dev, 0, 0);
+       vp_dev->ioaddr = virtio_pci_iomap(pci_dev, 0);
+       /* Failed to map BAR0? Try with BAR1. */
+       if (vp_dev->ioaddr == NULL) {
+               vp_dev->ioaddr = virtio_pci_iomap(pci_dev, 1);
        if (vp_dev->ioaddr == NULL) {
                err = -ENOMEM;
                goto out_req_regions;


In other words: put a copy of the IO config at the start of MMIO BAR1,
making sure we don't overlap the MSI-X table that's there.

This does not break windows guests, and makes us compliant to the
PCI express spec.

Is this small enough to start going immediately, without waiting
3+ months?

I really think it's a good idea to put something like this
in the field: we might discover more issues around MMIO
and we'll address them in the new config layout.
This way we don't get two changes: new layout and switch to MMIO
all at once.

If you like this, I can make the appropriate spec and qemu changes in a
couple of days.

-- 
MST


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote:
 You mean make BAR0 an MMIO BAR?
 Yes, it would break current windows guests.
 Further, as long as we use same address to notify all queues,
 we would also need to decode the instruction on x86 and that's
 measureably slower than PIO.
 We could go back to discussing hypercall use for notifications,
 but that has its own set of issues...

So... does violating the PCI-e spec really matter?  Is it preventing
any guest from working properly?

I don't think we should rush an ABI breakage if the only benefit is
claiming spec compliance.

Regards,

Anthony Liguori


 -- 
 MST


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Michael S. Tsirkin
On Wed, Jun 05, 2013 at 07:59:33AM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote:
  You mean make BAR0 an MMIO BAR?
  Yes, it would break current windows guests.
  Further, as long as we use same address to notify all queues,
  we would also need to decode the instruction on x86 and that's
  measureably slower than PIO.
  We could go back to discussing hypercall use for notifications,
  but that has its own set of issues...
 
 So... does violating the PCI-e spec really matter?  Is it preventing
 any guest from working properly?

Yes, absolutely, this wording in the spec is not there without reason.

Existing guests allocate io space for PCI express ports in
multiples of 4K.

Since each express device is behind such a port, this means
at most 15 such devices can use IO ports in a system.
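
(Rough arithmetic behind that number: x86 has 64 KiB of I/O port space, and a
bridge's I/O window, which is what sits behind every express port, is assigned
with 4 KiB granularity, so at most 64 KiB / 4 KiB = 16 windows fit; the lowest
4 KiB is effectively reserved for the fixed legacy and chipset ports, which
leaves roughly 15 windows for devices behind express ports.)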

That's why to make a pci express virtio device,
we must allow MMIO and/or some other communication
mechanism as the spec requires.

That's on x86.

Besides x86, there are architectures where IO is unavailable or very slow.

 I don't think we should rush an ABI breakage if the only benefit is
 claiming spec compliance.
 
 Regards,
 
 Anthony Liguori

Why do you bring this up? No one advocates any ABI breakage,
I only suggest extensions.


 
  -- 
  MST


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Wed, Jun 05, 2013 at 07:59:33AM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote:
  You mean make BAR0 an MMIO BAR?
  Yes, it would break current windows guests.
  Further, as long as we use same address to notify all queues,
  we would also need to decode the instruction on x86 and that's
  measureably slower than PIO.
  We could go back to discussing hypercall use for notifications,
  but that has its own set of issues...
 
 So... does violating the PCI-e spec really matter?  Is it preventing
 any guest from working properly?

 Yes, absolutely, this wording in spec is not there without reason.

 Existing guests allocate io space for PCI express ports in
 multiples on 4K.

 Since each express device is behind such a port, this means
 at most 15 such devices can use IO ports in a system.

 That's why to make a pci express virtio device,
 we must allow MMIO and/or some other communication
 mechanism as the spec requires.

This is precisely why this is an ABI breaker.

If you disable IO bars in the BIOS, then the interface that the OS sees
will *not have an IO bar*.

This *breaks existing guests*.

Any time the programming interface changes on a PCI device, the
revision ID and/or device ID must change.  The spec is very clear about
this.

We cannot disable the IO BAR without changing revision ID/device ID.

 That's on x86.

 Besides x86, there are achitectures where IO is unavailable or very slow.

 I don't think we should rush an ABI breakage if the only benefit is
 claiming spec compliance.
 
 Regards,
 
 Anthony Liguori

 Why do you bring this up? No one advocates any ABI breakage,
 I only suggest extensions.

It's an ABI breakage.  You're claiming that the guests you tested
handle the breakage reasonably but it is unquestionably an ABI breakage.

Regards,

Anthony Liguori



 
  -- 
  MST


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Michael S. Tsirkin
On Wed, Jun 05, 2013 at 10:08:37AM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Wed, Jun 05, 2013 at 07:59:33AM -0500, Anthony Liguori wrote:
  Michael S. Tsirkin m...@redhat.com writes:
  
   On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote:
   You mean make BAR0 an MMIO BAR?
   Yes, it would break current windows guests.
   Further, as long as we use same address to notify all queues,
   we would also need to decode the instruction on x86 and that's
   measureably slower than PIO.
   We could go back to discussing hypercall use for notifications,
   but that has its own set of issues...
  
  So... does violating the PCI-e spec really matter?  Is it preventing
  any guest from working properly?
 
  Yes, absolutely, this wording in spec is not there without reason.
 
  Existing guests allocate io space for PCI express ports in
  multiples on 4K.
 
  Since each express device is behind such a port, this means
  at most 15 such devices can use IO ports in a system.
 
  That's why to make a pci express virtio device,
  we must allow MMIO and/or some other communication
  mechanism as the spec requires.
 
 This is precisely why this is an ABI breaker.
 
 If you disable IO bars in the BIOS, than the interface that the OS sees
 will *not have an IO bar*.
 
 This *breaks existing guests*.
 Any time the programming interfaces changes on a PCI device, the
 revision ID and/or device ID must change.  The spec is very clear about
 this.
 
 We cannot disable the IO BAR without changing revision ID/device ID.
 

But it's a BIOS/PC issue. It's not a device issue.

Anyway, let's put express aside.

It's easy to create non-working setups with pci, today:

- create 16 pci bridges
- put one virtio device behind each

boom

Try it.

I want to fix that.
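
If you want to try it, something along these lines should reproduce it --
written from memory, so treat the device names and properties as a sketch
for the QEMU of this era:

    extra=""
    for i in $(seq 1 16); do
        extra="$extra -device pci-bridge,chassis_nr=$i,id=bridge$i"
        extra="$extra -device virtio-net-pci,bus=bridge$i"
    done
    qemu-system-x86_64 -enable-kvm -m 1024 $extra

Each bridge wants its own 4K-aligned IO window for the virtio IO BAR
behind it, and the allocations no longer fit.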


  That's on x86.
 
  Besides x86, there are achitectures where IO is unavailable or very slow.
 
  I don't think we should rush an ABI breakage if the only benefit is
  claiming spec compliance.
  
  Regards,
  
  Anthony Liguori
 
  Why do you bring this up? No one advocates any ABI breakage,
  I only suggest extensions.
 
 It's an ABI breakage.  You're claiming that the guests you tested
 handle the breakage reasonably but it is unquestionably an ABI breakage.
 
 Regards,
 
 Anthony Liguori

Adding a BAR is not an ABI breakage; do we agree on that?

Disabling IO would be, but I am not proposing disabling IO.

Guests might disable IO.

I propose a way to make virtio still work if they do.

This is *fixing* things. Not breaking.

 
 
  
   -- 
   MST


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Wed, Jun 05, 2013 at 10:08:37AM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Wed, Jun 05, 2013 at 07:59:33AM -0500, Anthony Liguori wrote:
  Michael S. Tsirkin m...@redhat.com writes:
  
   On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote:
   You mean make BAR0 an MMIO BAR?
   Yes, it would break current windows guests.
   Further, as long as we use same address to notify all queues,
   we would also need to decode the instruction on x86 and that's
   measureably slower than PIO.
   We could go back to discussing hypercall use for notifications,
   but that has its own set of issues...
  
  So... does violating the PCI-e spec really matter?  Is it preventing
  any guest from working properly?
 
  Yes, absolutely, this wording in spec is not there without reason.
 
  Existing guests allocate io space for PCI express ports in
  multiples on 4K.
 
  Since each express device is behind such a port, this means
  at most 15 such devices can use IO ports in a system.
 
  That's why to make a pci express virtio device,
  we must allow MMIO and/or some other communication
  mechanism as the spec requires.
 
 This is precisely why this is an ABI breaker.
 
 If you disable IO bars in the BIOS, than the interface that the OS sees
 will *not have an IO bar*.
 
 This *breaks existing guests*.
 Any time the programming interfaces changes on a PCI device, the
 revision ID and/or device ID must change.  The spec is very clear about
 this.
 
 We cannot disable the IO BAR without changing revision ID/device ID.
 

 But it's a bios/PC issue. It's not a device issue.

 Anyway, let's put express aside.

 It's easy to create non-working setups with pci, today:

 - create 16 pci bridges
 - put one virtio device behind each

 boom

 Try it.

 I want to fix that.


  That's on x86.
 
  Besides x86, there are achitectures where IO is unavailable or very slow.
 
  I don't think we should rush an ABI breakage if the only benefit is
  claiming spec compliance.
  
  Regards,
  
  Anthony Liguori
 
  Why do you bring this up? No one advocates any ABI breakage,
  I only suggest extensions.
 
 It's an ABI breakage.  You're claiming that the guests you tested
 handle the breakage reasonably but it is unquestionably an ABI breakage.
 
 Regards,
 
 Anthony Liguori

 Adding BAR is not an ABI breakage, do we agree on that?

 Disabling IO would be but I am not proposing disabling IO.

 Guests might disable IO.

Look, it's very simple.

If the failure in the guest is that BAR0 mapping fails because the
device is enabled but the BAR is disabled, then you've broken the ABI.

And what's worse is that this isn't for an obscure scenario (like having
15 PCI bridges) but for something that would become the standard
scenario (using a PCI-e bus).

We need to either bump the revision ID or the device ID if we do this.

Regards,

Anthony Liguori



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Michael S. Tsirkin
On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Wed, Jun 05, 2013 at 10:08:37AM -0500, Anthony Liguori wrote:
  Michael S. Tsirkin m...@redhat.com writes:
  
   On Wed, Jun 05, 2013 at 07:59:33AM -0500, Anthony Liguori wrote:
   Michael S. Tsirkin m...@redhat.com writes:
   
On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote:
You mean make BAR0 an MMIO BAR?
Yes, it would break current windows guests.
Further, as long as we use same address to notify all queues,
we would also need to decode the instruction on x86 and that's
measureably slower than PIO.
We could go back to discussing hypercall use for notifications,
but that has its own set of issues...
   
   So... does violating the PCI-e spec really matter?  Is it preventing
   any guest from working properly?
  
   Yes, absolutely, this wording in spec is not there without reason.
  
   Existing guests allocate io space for PCI express ports in
   multiples on 4K.
  
   Since each express device is behind such a port, this means
   at most 15 such devices can use IO ports in a system.
  
   That's why to make a pci express virtio device,
   we must allow MMIO and/or some other communication
   mechanism as the spec requires.
  
  This is precisely why this is an ABI breaker.
  
  If you disable IO bars in the BIOS, than the interface that the OS sees
  will *not have an IO bar*.
  
  This *breaks existing guests*.
  Any time the programming interfaces changes on a PCI device, the
  revision ID and/or device ID must change.  The spec is very clear about
  this.
  
  We cannot disable the IO BAR without changing revision ID/device ID.
  
 
  But it's a bios/PC issue. It's not a device issue.
 
  Anyway, let's put express aside.
 
  It's easy to create non-working setups with pci, today:
 
  - create 16 pci bridges
  - put one virtio device behind each
 
  boom
 
  Try it.
 
  I want to fix that.
 
 
   That's on x86.
  
   Besides x86, there are achitectures where IO is unavailable or very slow.
  
   I don't think we should rush an ABI breakage if the only benefit is
   claiming spec compliance.
   
   Regards,
   
   Anthony Liguori
  
   Why do you bring this up? No one advocates any ABI breakage,
   I only suggest extensions.
  
  It's an ABI breakage.  You're claiming that the guests you tested
  handle the breakage reasonably but it is unquestionably an ABI breakage.
  
  Regards,
  
  Anthony Liguori
 
  Adding BAR is not an ABI breakage, do we agree on that?
 
  Disabling IO would be but I am not proposing disabling IO.
 
  Guests might disable IO.
 
 Look, it's very simple.
 
 If the failure in the guest is that BAR0 mapping fails because the
 device is enabled but the BAR is disabled,

There's no such thing as "device is enabled" or "BAR is disabled"
in the PCI spec.

Spec says IO and memory can be enabled/disabled, separately.
PCI Express spec says devices should work without IO.

So modern guests will assume it's ok to work without IO,
and it will become more common in the future.
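
Concretely, the two decode enables are independent bits in the command
register; a minimal sketch with the Linux config-space helpers, purely
illustrative:

    #include <linux/pci.h>

    /* IO and memory decode are separate bits in PCI_COMMAND (offset 0x04). */
    static void keep_memory_drop_io(struct pci_dev *pdev)
    {
        u16 cmd;

        pci_read_config_word(pdev, PCI_COMMAND, &cmd);
        cmd |=  PCI_COMMAND_MEMORY;   /* memory space enable, bit 1 */
        cmd &= ~PCI_COMMAND_IO;       /* IO space enable, bit 0 -- legal to clear */
        pci_write_config_word(pdev, PCI_COMMAND, cmd);
    }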

 then you've broken the ABI.


No.  It means that the ABI is already broken.

Guests can disable IO *today* and when they do things don't work.

 
 And what's worse is that this isn't for an obscure scenario (like having
 15 PCI bridges) but for something that would become the standard
 scenario (using a PCI-e bus).
 
 We need to either bump the revision ID or the device ID if we do this.
 
 Regards,
 
 Anthony Liguori

We only need to do it if we do a change that breaks guests.

Please find a guest that is broken by the patches. You won't find any.


-- 
MST


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
 Look, it's very simple.
 We only need to do it if we do a change that breaks guests.

 Please find a guest that is broken by the patches. You won't find any.

I think the problem in this whole discussion is that we're talking past
each other.

Here is my understanding:

1) PCI-e says that you must be able to disable IO bars and still have a
functioning device.

2) It says (1) because you must size IO bars to 4096 which means that
practically speaking, once you enable a dozen or so PIO bars, you run
out of PIO space (16 * 4k == 64k and not all that space can be used).
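
(Rough arithmetic, mine rather than anything in the spec text: 64 KiB of
IO space in 4 KiB windows is at most 16 windows, the low 4 KiB is largely
taken by legacy ISA/chipset ports, so roughly 15 usable windows remain
before any go to non-virtio devices -- hence "a dozen or so".)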

virtio-pci uses IO BARs exclusively today.  Existing guest drivers
assume that there is an IO bar that contains the virtio-pci registers.
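
That assumption is quite literal.  A sketch of the legacy access pattern,
with the well-known register offsets quoted from memory -- treat the exact
numbers as illustrative:

    #include <linux/pci.h>
    #include <linux/io.h>

    #define VIRTIO_PCI_QUEUE_NOTIFY  16   /* legacy layout: notify register at BAR0 + 16 */

    /* Everything the legacy driver touches is an ioread/iowrite against
     * BAR0, which it assumes is an IO (port) BAR. */
    static void __iomem *legacy_map(struct pci_dev *pdev)
    {
        return pci_iomap(pdev, 0, 0);   /* BAR0, whole length */
    }

    static void legacy_kick(void __iomem *base, u16 queue)
    {
        iowrite16(queue, base + VIRTIO_PCI_QUEUE_NOTIFY);  /* queue index travels as data */
    }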

So let's consider the following scenarios:

QEMU of today:

1) qemu -drive file=ubuntu-13.04.img,if=virtio

This works today.  Does adding an MMIO bar at BAR1 break this?
Certainly not if the device is behind a PCI bus...

But are we going to put devices behind a PCI-e bus by default?  Are we
going to ask the user to choose whether devices are put behind a legacy
bus or the express bus?

What happens if we put the device behind a PCI-e bus by default?  Well,
it can still work.  That is, until we do something like this:

2) qemu -drive file=ubuntu-13.04.img,if=virtio -device virtio-rng
-device virtio-balloon..

Such that we have more than a dozen or so devices.  This works
perfectly fine today.  It works fine because we've designed virtio to
make sure it works fine.  Quoting the spec:

Configuration space is generally used for rarely-changing or
 initialization-time parameters. But it is a limited resource, so it
 might be better to use a virtqueue to update configuration information
 (the network device does this for filtering, otherwise the table in the
 config space could potentially be very large).

In fact, we can have 100s of PCI devices today without running out of IO
space because we're so careful about this.

So if we switch to using PCI-e by default *and* we keep virtio-pci
without modifying the device IDs, then very frequently we are going to
break existing guests because the drivers they already have no longer
work.

A few virtio-serial channels, a few block devices, a couple of network
adapters, the balloon and RNG driver, and we hit the IO space limit
pretty damn quickly so this is not a contrived scenario at all.  I would
expect that we frequently run into this if we don't address this problem.

So we have a few options:

1) Punt all of this complexity to libvirt et al and watch people make
   the wrong decisions about when to use PCI-e.  This will become yet
   another example of KVM being too hard to configure.

2) Enable PCI-e by default and just force people to upgrade their
   drivers.

3) Don't use PCI-e by default but still add BAR1 to virtio-pci

4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely), give
   it a new device/vendor ID.   Continue to use virtio-pci for existing
   devices potentially adding virtio-{net,blk,...}-pcie variants for
   people that care to use them.

I think 1 == 2 == 3 and I view 2 as an ABI breaker.  libvirt does like
policy so they're going to make a simple decision and always use the
same bus by default.  I suspect if we made PCI the default, they might
just always set the PCI-e flag just because.

There are hundreds of thousands if not millions of guests with existing
virtio-pci drivers.  Forcing them to upgrade better have an extremely
good justification.

I think 4 is the best path forward.  It's better for users (guests
continue to work as they always have).  There's less confusion about
enabling PCI-e support--you must ask for the virtio-pcie variant and you
must have a virtio-pcie driver.  It's easy to explain.

It also maps to what regular hardware does.  I highly doubt that there
are any real PCI cards that made the shift from PCI to PCI-e without
bumping at least a revision ID.

It also means we don't need to play games about sometimes enabling IO
bars and sometimes not.

Regards,

Anthony Liguori



 -- 
 MST


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Michael S. Tsirkin
On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
  Look, it's very simple.
  We only need to do it if we do a change that breaks guests.
 
  Please find a guest that is broken by the patches. You won't find any.
 
 I think the problem in this whole discussion is that we're talking past
 each other.
 
 Here is my understanding:
 
 1) PCI-e says that you must be able to disable IO bars and still have a
 functioning device.
 
 2) It says (1) because you must size IO bars to 4096 which means that
 practically speaking, once you enable a dozen or so PIO bars, you run
 out of PIO space (16 * 4k == 64k and not all that space can be used).


Let me add 3 other issues which I mentioned and you seem to miss:

3) architectures which don't have fast access to IO ports exist;
   virtio does not work there ATM

4) setups with many PCI bridges exist and have the same issue
   as PCI express. virtio does not work there ATM

5) On x86, even with nested page tables, the hardware only decodes
   the page address on an invalid PTE, not the data. You need to
   emulate the guest instruction to get at the data. Without
   nested page tables, we have to do a page table walk and emulate
   to get both address and data. Since this is how MMIO
   is implemented in kvm on x86, MMIO is much slower than PIO
   (with nested page tables by a factor of 2, did not test without).
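
Guest-side the two notifications look almost the same; the cost difference
is in what the exit hands the host.  An x86/Linux flavoured sketch,
function names mine:

    #include <linux/types.h>
    #include <linux/io.h>

    static void notify_pio(u16 port, u16 queue)
    {
        /* IO-instruction exit: the exit info already carries port and size,
         * and the written value is in a guest register KVM can read, so no
         * instruction fetch/decode is needed. */
        outw(queue, port);
    }

    static void notify_mmio(void __iomem *doorbell, u16 queue)
    {
        /* EPT violation: hardware reports only the guest physical address,
         * so KVM must fetch and emulate the store to recover 'queue'; without
         * nested paging there is a software page walk on top of that. */
        writew(queue, doorbell);
    }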

 virtio-pci uses a IO bars exclusively today.  Existing guest drivers
 assume that there is an IO bar that contains the virtio-pci registers.
 So let's consider the following scenarios:
 
 QEMU of today:
 
 1) qemu -drive file=ubuntu-13.04.img,if=virtio
 
 This works today.  Does adding an MMIO bar at BAR1 break this?
 Certainly not if the device is behind a PCI bus...
 
 But are we going to put devices behind a PCI-e bus by default?  Are we
 going to ask the user to choose whether devices are put behind a legacy
 bus or the express bus?
 
 What happens if we put the device behind a PCI-e bus by default?  Well,
 it can still work.  That is, until we do something like this:
 
 2) qemu -drive file=ubuntu-13.04.img,if=virtio -device virtio-rng
 -device virtio-balloon..
 
 Such that we have more than a dozen or so devices.  This works
 perfectly fine today.  It works fine because we've designed virtio to
 make sure it works fine.  Quoting the spec:
 
 Configuration space is generally used for rarely-changing or
  initialization-time parameters. But it is a limited resource, so it
  might be better to use a virtqueue to update configuration information
  (the network device does this for filtering, otherwise the table in the
  config space could potentially be very large).
 
 In fact, we can have 100s of PCI devices today without running out of IO
 space because we're so careful about this.
 
 So if we switch to using PCI-e by default *and* we keep virtio-pci
 without modifying the device IDs, then very frequently we are going to
 break existing guests because the drivers they already have no longer
 work.
 
 A few virtio-serial channels, a few block devices, a couple of network
 adapters, the balloon and RNG driver, and we hit the IO space limit
 pretty damn quickly so this is not a contrived scenario at all.  I would
 expect that we frequently run into this if we don't address this problem.
 
 So we have a few options:
 1) Punt all of this complexity to libvirt et al and watch people make
the wrong decisions about when to use PCI-e.  This will become yet
another example of KVM being too hard to configure.
 
 2) Enable PCI-e by default and just force people to upgrade their
drivers.
 
 3) Don't use PCI-e by default but still add BAR1 to virtio-pci
 
 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely),

We can't do this - it will hurt performance.

give
it a new device/vendor ID.   Continue to use virtio-pci for existing
devices potentially adding virtio-{net,blk,...}-pcie variants for
people that care to use them.
 
 I think 1 == 2 == 3 and I view 2 as an ABI breaker.

Why do you think 2 == 3? 2 changes default behaviour. 3 does not.

 libvirt does like
 policy so they're going to make a simple decision and always use the
 same bus by default.  I suspect if we made PCI the default, they might
 just always set the PCI-e flag just because.

This sounds very strange. But let's assume you are right for
the sake of the argument ...

 There are hundreds of thousands if not millions of guests with existing
 virtio-pci drivers.  Forcing them to upgrade better have an extremely
 good justification.
 
 I think 4 is the best path forward.  It's better for users (guests
 continue to work as they always have).  There's less confusion about
 enabling PCI-e support--you must ask for the virtio-pcie variant and you
 must have a virtio-pcie driver.  It's easy to explain.

I don't see how this changes the situation.

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Michael S. Tsirkin
On Wed, Jun 05, 2013 at 10:43:17PM +0300, Michael S. Tsirkin wrote:
 On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote:
  Michael S. Tsirkin m...@redhat.com writes:
  
   On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
   Look, it's very simple.
   We only need to do it if we do a change that breaks guests.
  
   Please find a guest that is broken by the patches. You won't find any.
  
  I think the problem in this whole discussion is that we're talking past
  each other.
  
  Here is my understanding:
  
  1) PCI-e says that you must be able to disable IO bars and still have a
  functioning device.
  
  2) It says (1) because you must size IO bars to 4096 which means that
  practically speaking, once you enable a dozen or so PIO bars, you run
  out of PIO space (16 * 4k == 64k and not all that space can be used).
 
 
 Let me add 3 other issues which I mentioned and you seem to miss:
 
 3) architectures which don't have fast access to IO ports, exist
virtio does not work there ATM
 
 4) setups with many PCI bridges exist and have the same issue
as PCI express. virtio does not work there ATM
 
 5) On x86, even with nested page tables, firmware only decodes
the page address on an invalid PTE, not the data. You need to
emulate the guest to get at the data. Without
nested page tables, we have to do page table walk and emulate
to get both address and data. Since this is how MMIO
is implemented in kvm on x86, MMIO is much slower than PIO
(with nested page tables by a factor of 2, did not test without).

Oh I forgot:

6) access to MMIO BARs is painful in the BIOS environment
   so BIOS would typically need to enable IO for the boot device.



  virtio-pci uses a IO bars exclusively today.  Existing guest drivers
  assume that there is an IO bar that contains the virtio-pci registers.
  So let's consider the following scenarios:
  
  QEMU of today:
  
  1) qemu -drive file=ubuntu-13.04.img,if=virtio
  
  This works today.  Does adding an MMIO bar at BAR1 break this?
  Certainly not if the device is behind a PCI bus...
  
  But are we going to put devices behind a PCI-e bus by default?  Are we
  going to ask the user to choose whether devices are put behind a legacy
  bus or the express bus?
  
  What happens if we put the device behind a PCI-e bus by default?  Well,
  it can still work.  That is, until we do something like this:
  
  2) qemu -drive file=ubuntu-13.04.img,if=virtio -device virtio-rng
  -device virtio-balloon..
  
  Such that we have more than a dozen or so devices.  This works
  perfectly fine today.  It works fine because we've designed virtio to
  make sure it works fine.  Quoting the spec:
  
  Configuration space is generally used for rarely-changing or
   initialization-time parameters. But it is a limited resource, so it
   might be better to use a virtqueue to update configuration information
   (the network device does this for filtering, otherwise the table in the
   config space could potentially be very large).
  
  In fact, we can have 100s of PCI devices today without running out of IO
  space because we're so careful about this.
  
  So if we switch to using PCI-e by default *and* we keep virtio-pci
  without modifying the device IDs, then very frequently we are going to
  break existing guests because the drivers they already have no longer
  work.
  
  A few virtio-serial channels, a few block devices, a couple of network
  adapters, the balloon and RNG driver, and we hit the IO space limit
  pretty damn quickly so this is not a contrived scenario at all.  I would
  expect that we frequently run into this if we don't address this problem.
  
  So we have a few options:
  1) Punt all of this complexity to libvirt et al and watch people make
 the wrong decisions about when to use PCI-e.  This will become yet
 another example of KVM being too hard to configure.
  
  2) Enable PCI-e by default and just force people to upgrade their
 drivers.
  
  3) Don't use PCI-e by default but still add BAR1 to virtio-pci
  
  4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely),
 
 We can't do this - it will hurt performance.
 
 give
 it a new device/vendor ID.   Continue to use virtio-pci for existing
 devices potentially adding virtio-{net,blk,...}-pcie variants for
 people that care to use them.
  
  I think 1 == 2 == 3 and I view 2 as an ABI breaker.
 
 Why do you think 2 == 3? 2 changes default behaviour. 3 does not.
 
  libvirt does like
  policy so they're going to make a simple decision and always use the
  same bus by default.  I suspect if we made PCI the default, they might
  just always set the PCI-e flag just because.
 
 This sounds very strange. But let's assume you are right for
 the sake of the argument ...
 
  There are hundreds of thousands if not millions of guests with existing
  virtio-pci drivers.  Forcing them to upgrade better have an extremely
  good 

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Michael S. Tsirkin
On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
  Look, it's very simple.
  We only need to do it if we do a change that breaks guests.
 
  Please find a guest that is broken by the patches. You won't find any.
 
 I think the problem in this whole discussion is that we're talking past
 each other.
 
 Here is my understanding:
 
 1) PCI-e says that you must be able to disable IO bars and still have a
 functioning device.
 
 2) It says (1) because you must size IO bars to 4096 which means that
 practically speaking, once you enable a dozen or so PIO bars, you run
 out of PIO space (16 * 4k == 64k and not all that space can be used).
 
 virtio-pci uses a IO bars exclusively today.  Existing guest drivers
 assume that there is an IO bar that contains the virtio-pci registers.
 
 So let's consider the following scenarios:
 
 QEMU of today:
 
 1) qemu -drive file=ubuntu-13.04.img,if=virtio
 
 This works today.  Does adding an MMIO bar at BAR1 break this?
 Certainly not if the device is behind a PCI bus...
 
 But are we going to put devices behind a PCI-e bus by default?  Are we
 going to ask the user to choose whether devices are put behind a legacy
 bus or the express bus?
 
 What happens if we put the device behind a PCI-e bus by default?  Well,
 it can still work.  That is, until we do something like this:
 
 2) qemu -drive file=ubuntu-13.04.img,if=virtio -device virtio-rng
 -device virtio-balloon..
 
 Such that we have more than a dozen or so devices.  This works
 perfectly fine today.  It works fine because we've designed virtio to
 make sure it works fine.  Quoting the spec:
 
 Configuration space is generally used for rarely-changing or
  initialization-time parameters. But it is a limited resource, so it
  might be better to use a virtqueue to update configuration information
  (the network device does this for filtering, otherwise the table in the
  config space could potentially be very large).
 
 In fact, we can have 100s of PCI devices today without running out of IO
 space because we're so careful about this.
 
 So if we switch to using PCI-e by default *and* we keep virtio-pci
 without modifying the device IDs, then very frequently we are going to
 break existing guests because the drivers they already have no longer
 work.
 
 A few virtio-serial channels, a few block devices, a couple of network
 adapters, the balloon and RNG driver, and we hit the IO space limit
 pretty damn quickly so this is not a contrived scenario at all.  I would
 expect that we frequently run into this if we don't address this problem.
 
 So we have a few options:
 
 1) Punt all of this complexity to libvirt et al and watch people make
the wrong decisions about when to use PCI-e.  This will become yet
another example of KVM being too hard to configure.
 
 2) Enable PCI-e by default and just force people to upgrade their
drivers.
 
 3) Don't use PCI-e by default but still add BAR1 to virtio-pci
 
 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely), give
it a new device/vendor ID.   Continue to use virtio-pci for existing
devices potentially adding virtio-{net,blk,...}-pcie variants for
people that care to use them.

For the record, with respect to the PCI-e discussion, I have no
problem with the idea of changing the device ID or
revision ID and asking guests to upgrade if they
want to use a PCI-e device.

That's not exactly 4 however.


I see no reason to couple the PCI-e discussion with the MMIO discussion;
PCI-e is just one of the reasons to support MMIO.


 I think 1 == 2 == 3 and I view 2 as an ABI breaker.  libvirt does like
 policy so they're going to make a simple decision and always use the
 same bus by default.  I suspect if we made PCI the default, they might
 just always set the PCI-e flag just because.
 
 There are hundreds of thousands if not millions of guests with existing
 virtio-pci drivers.  Forcing them to upgrade better have an extremely
 good justification.
 
 I think 4 is the best path forward.  It's better for users (guests
 continue to work as they always have).  There's less confusion about
 enabling PCI-e support--you must ask for the virtio-pcie variant and you
 must have a virtio-pcie driver.  It's easy to explain.
 
 It also maps to what regular hardware does.  I highly doubt that there
 are any real PCI cards that made the shift from PCI to PCI-e without
 bumping at least a revision ID.
 
 It also means we don't need to play games about sometimes enabling IO
 bars and sometimes not.
 
 Regards,
 
 Anthony Liguori
 
 
 
  -- 
  MST

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
  Look, it's very simple.
  We only need to do it if we do a change that breaks guests.
 
  Please find a guest that is broken by the patches. You won't find any.
 
 I think the problem in this whole discussion is that we're talking past
 each other.
 
 Here is my understanding:
 
 1) PCI-e says that you must be able to disable IO bars and still have a
 functioning device.
 
 2) It says (1) because you must size IO bars to 4096 which means that
 practically speaking, once you enable a dozen or so PIO bars, you run
 out of PIO space (16 * 4k == 64k and not all that space can be used).


 Let me add 3 other issues which I mentioned and you seem to miss:

 3) architectures which don't have fast access to IO ports, exist
virtio does not work there ATM

Which architectures have PCI but no IO ports?

 4) setups with many PCI bridges exist and have the same issue
as PCI express. virtio does not work there ATM

This is not virtio specific.  This is true for all devices that use IO.

 5) On x86, even with nested page tables, firmware only decodes
the page address on an invalid PTE, not the data. You need to
emulate the guest to get at the data. Without
nested page tables, we have to do page table walk and emulate
to get both address and data. Since this is how MMIO
is implemented in kvm on x86, MMIO is much slower than PIO
(with nested page tables by a factor of 2, did not test without).

I am well aware of this; this is why we use PIO.

I fully agree with you that when we do MMIO, we should switch the
notification mechanism to avoid encoding anything meaningful as data.
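
One obvious shape for that -- my sketch, not something from the patch -- is
a doorbell per queue, so the faulting address alone identifies the queue
and the written value can be ignored:

    #include <linux/types.h>
    #include <linux/io.h>

    static void notify_queue(void __iomem *notify_base, unsigned long stride,
                             u16 queue)
    {
        /* The address, not the data, carries the information, so the host
         * only needs the GPA it already gets from the exit. */
        writew(0 /* value ignored */, notify_base + queue * stride);
    }
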

 virtio-pci uses a IO bars exclusively today.  Existing guest drivers
 assume that there is an IO bar that contains the virtio-pci registers.
 So let's consider the following scenarios:
 
 QEMU of today:
 
 1) qemu -drive file=ubuntu-13.04.img,if=virtio
 
 This works today.  Does adding an MMIO bar at BAR1 break this?
 Certainly not if the device is behind a PCI bus...
 
 But are we going to put devices behind a PCI-e bus by default?  Are we
 going to ask the user to choose whether devices are put behind a legacy
 bus or the express bus?
 
 What happens if we put the device behind a PCI-e bus by default?  Well,
 it can still work.  That is, until we do something like this:
 
 2) qemu -drive file=ubuntu-13.04.img,if=virtio -device virtio-rng
 -device virtio-balloon..
 
 Such that we have more than a dozen or so devices.  This works
 perfectly fine today.  It works fine because we've designed virtio to
 make sure it works fine.  Quoting the spec:
 
 Configuration space is generally used for rarely-changing or
  initialization-time parameters. But it is a limited resource, so it
  might be better to use a virtqueue to update configuration information
  (the network device does this for filtering, otherwise the table in the
  config space could potentially be very large).
 
 In fact, we can have 100s of PCI devices today without running out of IO
 space because we're so careful about this.
 
 So if we switch to using PCI-e by default *and* we keep virtio-pci
 without modifying the device IDs, then very frequently we are going to
 break existing guests because the drivers they already have no longer
 work.
 
 A few virtio-serial channels, a few block devices, a couple of network
 adapters, the balloon and RNG driver, and we hit the IO space limit
 pretty damn quickly so this is not a contrived scenario at all.  I would
 expect that we frequently run into this if we don't address this problem.
 
 So we have a few options:
 1) Punt all of this complexity to libvirt et al and watch people make
the wrong decisions about when to use PCI-e.  This will become yet
another example of KVM being too hard to configure.
 
 2) Enable PCI-e by default and just force people to upgrade their
drivers.
 
 3) Don't use PCI-e by default but still add BAR1 to virtio-pci
 
 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely),

 We can't do this - it will hurt performance.

Can you explain?  I thought the whole trick with separating out the
virtqueue notification register was to regain the performance?

give
it a new device/vendor ID.   Continue to use virtio-pci for existing
devices potentially adding virtio-{net,blk,...}-pcie variants for
people that care to use them.
 
 I think 1 == 2 == 3 and I view 2 as an ABI breaker.

 Why do you think 2 == 3? 2 changes default behaviour. 3 does not.

It doesn't change the default behavior but then we're pushing the
decision of when to use pci-e to the user.  They have to understand that
there can be subtle breakages because the virtio-pci driver may not work
if they are using an old guest.

 libvirt does like
 policy so they're 

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Wed, Jun 05, 2013 at 10:43:17PM +0300, Michael S. Tsirkin wrote:
 On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote:
  Michael S. Tsirkin m...@redhat.com writes:
  
   On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
   Look, it's very simple.
   We only need to do it if we do a change that breaks guests.
  
   Please find a guest that is broken by the patches. You won't find any.
  
  I think the problem in this whole discussion is that we're talking past
  each other.
  
  Here is my understanding:
  
  1) PCI-e says that you must be able to disable IO bars and still have a
  functioning device.
  
  2) It says (1) because you must size IO bars to 4096 which means that
  practically speaking, once you enable a dozen or so PIO bars, you run
  out of PIO space (16 * 4k == 64k and not all that space can be used).
 
 
 Let me add 3 other issues which I mentioned and you seem to miss:
 
 3) architectures which don't have fast access to IO ports, exist
virtio does not work there ATM
 
 4) setups with many PCI bridges exist and have the same issue
as PCI express. virtio does not work there ATM
 
 5) On x86, even with nested page tables, firmware only decodes
the page address on an invalid PTE, not the data. You need to
emulate the guest to get at the data. Without
nested page tables, we have to do page table walk and emulate
to get both address and data. Since this is how MMIO
is implemented in kvm on x86, MMIO is much slower than PIO
(with nested page tables by a factor of 2, did not test without).

 Oh I forgot:

 6) access to MMIO BARs is painful in the BIOS environment
so BIOS would typically need to enable IO for the boot device.

But if you want to boot from the 16th device, the BIOS needs to solve
this problem anyway.

Regards,

Anthony Liguori



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread H. Peter Anvin
On 06/05/2013 09:20 AM, Michael S. Tsirkin wrote:
 
 Spec says IO and memory can be enabled/disabled, separately.
 PCI Express spec says devices should work without IO.
 

For native endpoints.  Currently virtio would be a legacy endpoint
which is quite correct -- it is compatible with a legacy interface.

-hpa



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Michael S. Tsirkin
On Wed, Jun 05, 2013 at 03:42:57PM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote:
  Michael S. Tsirkin m...@redhat.com writes:
  
   On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
   Look, it's very simple.
   We only need to do it if we do a change that breaks guests.
  
   Please find a guest that is broken by the patches. You won't find any.
  
  I think the problem in this whole discussion is that we're talking past
  each other.
  
  Here is my understanding:
  
  1) PCI-e says that you must be able to disable IO bars and still have a
  functioning device.
  
  2) It says (1) because you must size IO bars to 4096 which means that
  practically speaking, once you enable a dozen or so PIO bars, you run
  out of PIO space (16 * 4k == 64k and not all that space can be used).
 
 
  Let me add 3 other issues which I mentioned and you seem to miss:
 
  3) architectures which don't have fast access to IO ports, exist
 virtio does not work there ATM
 
 Which architectures have PCI but no IO ports?
 
  4) setups with many PCI bridges exist and have the same issue
 as PCI express. virtio does not work there ATM
 
 This is not virtio specific.  This is true for all devices that use IO.

Absolutely. And you will find that modern devices make the use of IO
ports optional.

  5) On x86, even with nested page tables, firmware only decodes
 the page address on an invalid PTE, not the data. You need to
 emulate the guest to get at the data. Without
 nested page tables, we have to do page table walk and emulate
 to get both address and data. Since this is how MMIO
 is implemented in kvm on x86, MMIO is much slower than PIO
 (with nested page tables by a factor of 2, did not test without).
 
 Am well aware of this, this is why we use PIO.
 
 I fully agree with you that when we do MMIO, we should switch the
 notification mechanism to avoid encoding anything meaningful as data.
 
  virtio-pci uses a IO bars exclusively today.  Existing guest drivers
  assume that there is an IO bar that contains the virtio-pci registers.
  So let's consider the following scenarios:
  
  QEMU of today:
  
  1) qemu -drive file=ubuntu-13.04.img,if=virtio
  
  This works today.  Does adding an MMIO bar at BAR1 break this?
  Certainly not if the device is behind a PCI bus...
  
  But are we going to put devices behind a PCI-e bus by default?  Are we
  going to ask the user to choose whether devices are put behind a legacy
  bus or the express bus?
  
  What happens if we put the device behind a PCI-e bus by default?  Well,
  it can still work.  That is, until we do something like this:
  
  2) qemu -drive file=ubuntu-13.04.img,if=virtio -device virtio-rng
  -device virtio-balloon..
  
  Such that we have more than a dozen or so devices.  This works
  perfectly fine today.  It works fine because we've designed virtio to
  make sure it works fine.  Quoting the spec:
  
  Configuration space is generally used for rarely-changing or
   initialization-time parameters. But it is a limited resource, so it
   might be better to use a virtqueue to update configuration information
   (the network device does this for filtering, otherwise the table in the
   config space could potentially be very large).
  
  In fact, we can have 100s of PCI devices today without running out of IO
  space because we're so careful about this.
  
  So if we switch to using PCI-e by default *and* we keep virtio-pci
  without modifying the device IDs, then very frequently we are going to
  break existing guests because the drivers they already have no longer
  work.
  
  A few virtio-serial channels, a few block devices, a couple of network
  adapters, the balloon and RNG driver, and we hit the IO space limit
  pretty damn quickly so this is not a contrived scenario at all.  I would
  expect that we frequently run into this if we don't address this problem.
  
  So we have a few options:
  1) Punt all of this complexity to libvirt et al and watch people make
 the wrong decisions about when to use PCI-e.  This will become yet
 another example of KVM being too hard to configure.
  
  2) Enable PCI-e by default and just force people to upgrade their
 drivers.
  
  3) Don't use PCI-e by default but still add BAR1 to virtio-pci
  
  4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely),
 
  We can't do this - it will hurt performance.
 
 Can you explain?  I thought the whole trick with separating out the
 virtqueue notification register was to regain the performance?

Yes but this trick only works well with NPT (it's still a bit
slower than PIO but not so drastically).
Without NPT you still need a page walk so it will be slow.

 give
 it a new device/vendor ID.   Continue to use virtio-pci for existing
 devices potentially adding virtio-{net,blk,...}-pcie variants for
 people that care 

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread H. Peter Anvin
On 06/05/2013 01:45 PM, Anthony Liguori wrote:
 
 But if you want to boot from the 16th device, the BIOS needs to solve
 this problem anyway.
 

No.  Just have the BIOS plumb the path to the device it wants to boot
from.  Problem solved.

-hpa




Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Michael S. Tsirkin
On Wed, Jun 05, 2013 at 03:45:09PM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Wed, Jun 05, 2013 at 10:43:17PM +0300, Michael S. Tsirkin wrote:
  On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote:
   Michael S. Tsirkin m...@redhat.com writes:
   
On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
Look, it's very simple.
We only need to do it if we do a change that breaks guests.
   
Please find a guest that is broken by the patches. You won't find any.
   
   I think the problem in this whole discussion is that we're talking past
   each other.
   
   Here is my understanding:
   
   1) PCI-e says that you must be able to disable IO bars and still have a
   functioning device.
   
   2) It says (1) because you must size IO bars to 4096 which means that
   practically speaking, once you enable a dozen or so PIO bars, you run
   out of PIO space (16 * 4k == 64k and not all that space can be used).
  
  
  Let me add 3 other issues which I mentioned and you seem to miss:
  
  3) architectures which don't have fast access to IO ports, exist
 virtio does not work there ATM
  
  4) setups with many PCI bridges exist and have the same issue
 as PCI express. virtio does not work there ATM
  
  5) On x86, even with nested page tables, firmware only decodes
 the page address on an invalid PTE, not the data. You need to
 emulate the guest to get at the data. Without
 nested page tables, we have to do page table walk and emulate
 to get both address and data. Since this is how MMIO
 is implemented in kvm on x86, MMIO is much slower than PIO
 (with nested page tables by a factor of 2, did not test without).
 
  Oh I forgot:
 
  6) access to MMIO BARs is painful in the BIOS environment
 so BIOS would typically need to enable IO for the boot device.
 
 But if you want to boot from the 16th device, the BIOS needs to solve
 this problem anyway.
 
 Regards,
 
 Anthony Liguori

But it's solvable: just enable devices one by one to boot from them.

-- 
MST


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Michael S. Tsirkin
On Wed, Jun 05, 2013 at 02:10:03PM -0700, H. Peter Anvin wrote:
 On 06/05/2013 09:20 AM, Michael S. Tsirkin wrote:
  
  Spec says IO and memory can be enabled/disabled, separately.
  PCI Express spec says devices should work without IO.
  
 
 For native endpoints.  Currently virtio would be a legacy endpoint
 which is quite correct -- it is compatible with a legacy interface.
 
   -hpa

Yes. But it's not the spec wording that we care about most.
It's supporting setups that have trouble using IO.

-- 
MST


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
H. Peter Anvin h...@zytor.com writes:

 On 06/05/2013 09:20 AM, Michael S. Tsirkin wrote:
 
 Spec says IO and memory can be enabled/disabled, separately.
 PCI Express spec says devices should work without IO.
 

 For native endpoints.  Currently virtio would be a legacy endpoint
 which is quite correct -- it is compatible with a legacy interface.

Do legacy endpoints also use 4k for BARs?

If not, can't we use a new device id for native endpoints and call it a
day?  Legacy endpoints would continue using the existing BAR layout.

Regards,

Anthony Liguori


   -hpa



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Wed, Jun 05, 2013 at 03:42:57PM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
 Can you explain?  I thought the whole trick with separating out the
 virtqueue notification register was to regain the performance?

 Yes but this trick only works well with NPT (it's still a bit
 slower than PIO but not so drastically).
 Without NPT you still need a page walk so it will be slow.

Do you mean NPT/EPT?

If your concern is shadow paging, then I think you're concerned about
hardware that is so slow to start with that it's not worth considering.

  It also maps to what regular hardware does.  I highly doubt that there
  are any real PCI cards that made the shift from PCI to PCI-e without
  bumping at least a revision ID.
 
  Only because the chance it's 100% compatible on the software level is 0.
  It always has some hardware specific quirks.
  No such excuse here.
 
  It also means we don't need to play games about sometimes enabling IO
  bars and sometimes not.
 
  This last paragraph is wrong, it ignores the issues 3) to 5) 
  I added above.
 
  If you do take them into account:
 - there are reasons to add MMIO BAR to PCI,
   even without PCI express
 
 So far, the only reason you've provided is it doesn't work on some
 architectures.  Which architectures?

 PowerPC wants this.

Existing PowerPC remaps PIO to MMIO so it works fine today.

Future platforms may not do this but future platforms can use a
different device.  They certainly won't be able to use the existing
drivers anyway.

Ben, am I wrong here?

 - we won't be able to drop IO BAR from virtio
 
 An IO BAR is useless if it means we can't have more than 12 devices.


 It's not useless. A smart BIOS can enable devices one by one as
 it tries to boot from them.

A smart BIOS can also use MMIO to program virtio.

Regards,

Anthony Liguori



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread H. Peter Anvin
On 06/05/2013 02:50 PM, Anthony Liguori wrote:
 H. Peter Anvin h...@zytor.com writes:
 
 On 06/05/2013 09:20 AM, Michael S. Tsirkin wrote:

 Spec says IO and memory can be enabled/disabled, separately.
 PCI Express spec says devices should work without IO.


 For native endpoints.  Currently virtio would be a legacy endpoint
 which is quite correct -- it is compatible with a legacy interface.
 
 Do legacy endpoints also use 4k for BARs?

There are no 4K BARs.  In fact, I/O BARs are restricted by spec (there
is no technical enforcement, however) to 256 bytes.

The 4K comes from the upstream bridge windows, which are only 4K granular
(historic stuff from when bridges were assumed rare.)  However, there
can be multiple devices, functions, and BARs inside that window.

The issue with PCIe is that each PCIe port is a bridge, so in reality
there is only one real device per bus number.

 If not, can't we use a new device id for native endpoints and call it a
 day?  Legacy endpoints would continue using the existing BAR layout.

Definitely an option.  However, we want to be able to boot from native
devices, too, so having an I/O BAR (which would not be used by the OS
driver) should still at the very least be an option.

-hpa



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
H. Peter Anvin h...@zytor.com writes:

 On 06/05/2013 02:50 PM, Anthony Liguori wrote:
 H. Peter Anvin h...@zytor.com writes:
 
 On 06/05/2013 09:20 AM, Michael S. Tsirkin wrote:

 Spec says IO and memory can be enabled/disabled, separately.
 PCI Express spec says devices should work without IO.


 For native endpoints.  Currently virtio would be a legacy endpoint
 which is quite correct -- it is compatible with a legacy interface.
 
 Do legacy endpoints also use 4k for BARs?

 There are no 4K BARs.  In fact, I/O BARs are restricted by spec (there
 is no technical enforcement, however) to 256 bytes.

 The 4K come from the upstream bridge windows, which are only 4K granular
 (historic stuff from when bridges were assumed rare.)  However, there
 can be multiple devices, functions, and BARs inside that window.

Got it.


 The issue with PCIe is that each PCIe port is a bridge, so in reality
 there is only one real device per bus number.

 If not, can't we use a new device id for native endpoints and call it a
 day?  Legacy endpoints would continue using the existing BAR layout.

 Definitely an option.  However, we want to be able to boot from native
 devices, too, so having an I/O BAR (which would not be used by the OS
 driver) should still at the very least be an option.

What makes it so difficult to work with an MMIO bar for PCI-e?

With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straight
forward.  Is there something special about PCI-e here?

Regards,

Anthony Liguori


   -hpa



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Benjamin Herrenschmidt
On Wed, 2013-06-05 at 16:53 -0500, Anthony Liguori wrote:

 Existing PowerPC remaps PIO to MMAP so it works fine today.
 
 Future platforms may not do this but future platforms can use a
 different device.  They certainly won't be able to use the existing
 drivers anyway.
 
 Ben, am I wrong here?

Well, we do remap PIO, though it's ugly and annoying so we'd rather
avoid it altogether :-)

Our latest HW PHBs don't support PIO at all, so while we can still
support it in the virtual ones in the guest for ever, it's becoming
pretty clear that PIO is on the way out ...

- we won't be able to drop IO BAR from virtio
  
  An IO BAR is useless if it means we can't have more than 12 devices.
 
 
  It's not useless. A smart BIOS can enable devices one by one as
  it tries to boot from them.
 
 A smart BIOS can also use MMIO to program virtio.

Indeed :-)

I see no reason not to provide both access paths though. Have the PIO
BAR there for compatibility/legacy/BIOS/x86 purposes and *also* have the
MMIO window which I'd be happy to favor on power.

We could even put somewhere in there a feature bit set by qemu to
indicate whether it thinks PIO or MMIO is faster on a given platform if
you really think that's worth it (I don't).

Cheers,
Ben.




Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
Benjamin Herrenschmidt b...@kernel.crashing.org writes:

 On Wed, 2013-06-05 at 16:53 -0500, Anthony Liguori wrote:

 A smart BIOS can also use MMIO to program virtio.

 Indeed :-)

 I see no reason why not providing both access path though. Have the PIO
 BAR there for compatibility/legacy/BIOS/x86 purposes and *also* have the
 MMIO window which I'd be happy to favor on power.

 We could even put somewhere in there a feature bit set by qemu to
 indicate whether it thinks PIO or MMIO is faster on a given platform if
 you really think that's worth it (I don't).

That's okay, but what I'm most concerned about is compatibility.

A virtio PCI device that's a native endpoint needs to have a different
device ID than one that is a legacy endpoint.  The current drivers
have no hope of working (well) with virtio PCI devices exposed as native
endpoints.

I don't care if the native PCI endpoint also has a PIO bar.  But it
seems silly (and confusing) to me to make that layout be the legacy
layout versus a straight mirror of the new layout if we're already
changing the device ID.

In addition, it doesn't seem at all necessary to have an MMIO bar to the
legacy device.  If the reason you want MMIO is to avoid using PIO, then
you break existing drivers because they assume PIO.  If you are breaking
existing drivers then you should change the device ID.

If strictly speaking it's just that MMIO is a bit faster, I'm not sure
that complexity is worth it without seeing performance numbers first.

Regards,

Anthony Liguori


 Cheers,
 Ben.




Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread H. Peter Anvin
On 06/05/2013 03:08 PM, Anthony Liguori wrote:

 Definitely an option.  However, we want to be able to boot from native
 devices, too, so having an I/O BAR (which would not be used by the OS
 driver) should still at the very least be an option.
 
 What makes it so difficult to work with an MMIO bar for PCI-e?
 
 With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straight
 forward.  Is there something special about PCI-e here?
 

It's not tracking allocation.  It is that accessing memory above 1 MiB
is incredibly painful in the BIOS environment, which basically means
MMIO is inaccessible.

-hpa




Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Benjamin Herrenschmidt
On Wed, 2013-06-05 at 17:53 -0500, Anthony Liguori wrote:
 If strictly speaking it's just that MMIO is a bit faster, I'm not sure
 that complexity is worth it without seeing performance numbers first.

You mean PIO. I agree with all your points here. The only thing I
saw as an option would be to add a PIO BAR that is an exact mirror
of the MMIO BAR if and only if:

 - PIO is indeed significantly faster on platforms we care about

and/or

 - There is significant simplification in things like BIOSes in
   using PIO over MMIO

I personally don't care and would choose to always use MMIO, including in
firmware on ppc.

Cheers,
Ben.




Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
H. Peter Anvin h...@zytor.com writes:

 On 06/05/2013 03:08 PM, Anthony Liguori wrote:

 Definitely an option.  However, we want to be able to boot from native
 devices, too, so having an I/O BAR (which would not be used by the OS
 driver) should still at the very least be an option.
 
 What makes it so difficult to work with an MMIO bar for PCI-e?
 
 With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straight
 forward.  Is there something special about PCI-e here?
 

 It's not tracking allocation.  It is that accessing memory above 1 MiB
 is incredibly painful in the BIOS environment, which basically means
 MMIO is inaccessible.

Oh, you mean in real mode.

SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout.
There are loads of ASSERT32FLAT()s in the code to make sure of this.

Regards,

Anthony Liguori


   -hpa





Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Rusty Russell
Anthony Liguori aligu...@us.ibm.com writes:
 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely), give
it a new device/vendor ID.   Continue to use virtio-pci for existing
devices potentially adding virtio-{net,blk,...}-pcie variants for
people that care to use them.

Now you have a different compatibility problem; how do you know the
guest supports the new virtio-pcie net?

If you put a virtio-pci card behind a PCI-e bridge today, it's not
compliant, but AFAICT it will Just Work.  (Modulo the 16-dev limit).

I've been assuming we'd avoid a flag day change; that devices would
look like existing virtio-pci with capabilities indicating the new
config layout.

 I think 4 is the best path forward.  It's better for users (guests
 continue to work as they always have).  There's less confusion about
 enabling PCI-e support--you must ask for the virtio-pcie variant and you
 must have a virtio-pcie driver.  It's easy to explain.

Removing both forward and backward compatibility is easy to explain, but
I think it'll be harder to deploy.  This is your area though, so perhaps
I'm wrong.

 It also maps to what regular hardware does.  I highly doubt that there
 are any real PCI cards that made the shift from PCI to PCI-e without
 bumping at least a revision ID.

No one expected the new cards to Just Work with old OSes: a new machine
meant a new OS and new drivers.  Hardware vendors like that.

Since virtualization often involves legacy, our priorities might be
different.

Cheers,
Rusty.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-04 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 On Mon, Jun 03, 2013 at 09:56:15AM +0930, Rusty Russell wrote:
 Michael S. Tsirkin m...@redhat.com writes:
  On Thu, May 30, 2013 at 08:53:45AM -0500, Anthony Liguori wrote:
  Rusty Russell ru...@rustcorp.com.au writes:
  
   Anthony Liguori aligu...@us.ibm.com writes:
   Forcing a guest driver change is a really big
   deal and I see no reason to do that unless there's a compelling reason
   to.
  
   So we're stuck with the 1.0 config layout for a very long time.
  
   We definitely must not force a guest change.  The explicit aim of the
   standard is that legacy and 1.0 be backward compatible.  One
   deliverable is a document detailing how this is done (effectively a
   summary of changes between what we have and 1.0).
  
  If 2.0 is fully backwards compatible, great.  It seems like such a
  difference that that would be impossible but I need to investigate
  further.
  
  Regards,
  
  Anthony Liguori
 
  If you look at my patches you'll see how it works.
  Basically old guests use BAR0 new ones don't, so
  it's easy: BAR0 access means legacy guest.
  Only started testing but things seem to work
  fine with old guests so far.
 
  I think we need a spec, not just driver code.
 
  Rusty what's the plan? Want me to write it?
 
 We need both, of course, but the spec work will happen in the OASIS WG.
 A draft is good, but let's not commit anything to upstream QEMU until we
 get the spec finalized.  And that is proposed to be late this year.

 Well that would be quite sad really.
 
 This means we can't make virtio a spec compliant pci express device,
 and we can't add any more feature bits, so no
 flexible buffer optimizations for virtio net.

 There are probably more projects that will be blocked.

 So how about we keep extending legacy layout for a bit longer:
 - add a way to access device with MMIO
 - use feature bit 31 to signal 64 bit features
   (and shift device config accordingly)

By my count, net still has 7 feature bits left, so I don't think the
feature bits are likely to be a limitation in the next 6 months?

MMIO is a bigger problem.  Linux guests are happy with it: does it break
the Windows drivers?

Thanks,
Rusty.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-04 Thread Michael S. Tsirkin
On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote:
 Michael S. Tsirkin m...@redhat.com writes:
  On Mon, Jun 03, 2013 at 09:56:15AM +0930, Rusty Russell wrote:
  Michael S. Tsirkin m...@redhat.com writes:
   On Thu, May 30, 2013 at 08:53:45AM -0500, Anthony Liguori wrote:
   Rusty Russell ru...@rustcorp.com.au writes:
   
Anthony Liguori aligu...@us.ibm.com writes:
Forcing a guest driver change is a really big
deal and I see no reason to do that unless there's a compelling 
reason
to.
   
So we're stuck with the 1.0 config layout for a very long time.
   
We definitely must not force a guest change.  The explicit aim of the
standard is that legacy and 1.0 be backward compatible.  One
deliverable is a document detailing how this is done (effectively a
summary of changes between what we have and 1.0).
   
   If 2.0 is fully backwards compatible, great.  It seems like such a
   difference that that would be impossible but I need to investigate
   further.
   
   Regards,
   
   Anthony Liguori
  
   If you look at my patches you'll see how it works.
   Basically old guests use BAR0 new ones don't, so
   it's easy: BAR0 access means legacy guest.
   Only started testing but things seem to work
   fine with old guests so far.
  
   I think we need a spec, not just driver code.
  
   Rusty what's the plan? Want me to write it?
  
  We need both, of course, but the spec work will happen in the OASIS WG.
  A draft is good, but let's not commit anything to upstream QEMU until we
  get the spec finalized.  And that is proposed to be late this year.
 
  Well that would be quite sad really.
  
  This means we can't make virtio a spec compliant pci express device,
  and we can't add any more feature bits, so no
  flexible buffer optimizations for virtio net.
 
  There are probably more projects that will be blocked.
 
  So how about we keep extending legacy layout for a bit longer:
  - add a way to access device with MMIO
  - use feature bit 31 to signal 64 bit features
(and shift device config accordingly)
 
 By my count, net still has 7 feature bits left, so I don't think the
 feature bits are likely to be a limitation in the next 6 months?

Yes but you wanted a generic transport feature bit
for flexible SG layout.
Are you happy with VIRTIO_NET_F_ANY_HEADER_SG then?

 MMIO is a bigger problem.  Linux guests are happy with it: does it break
 the Windows drivers?
 
 Thanks,
 Rusty.

You mean make BAR0 an MMIO BAR?
Yes, it would break current Windows guests.
Further, as long as we use the same address to notify all queues,
we would also need to decode the instruction on x86, and that's
measurably slower than PIO.
We could go back to discussing hypercall use for notifications,
but that has its own set of issues...

-- 
MST


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-03 Thread Michael S. Tsirkin
On Mon, Jun 03, 2013 at 09:56:15AM +0930, Rusty Russell wrote:
 Michael S. Tsirkin m...@redhat.com writes:
  On Thu, May 30, 2013 at 08:53:45AM -0500, Anthony Liguori wrote:
  Rusty Russell ru...@rustcorp.com.au writes:
  
   Anthony Liguori aligu...@us.ibm.com writes:
   Forcing a guest driver change is a really big
   deal and I see no reason to do that unless there's a compelling reason
   to.
  
   So we're stuck with the 1.0 config layout for a very long time.
  
   We definitely must not force a guest change.  The explicit aim of the
   standard is that legacy and 1.0 be backward compatible.  One
   deliverable is a document detailing how this is done (effectively a
   summary of changes between what we have and 1.0).
  
  If 2.0 is fully backwards compatible, great.  It seems like such a
  difference that that would be impossible but I need to investigate
  further.
  
  Regards,
  
  Anthony Liguori
 
  If you look at my patches you'll see how it works.
  Basically old guests use BAR0 new ones don't, so
  it's easy: BAR0 access means legacy guest.
  Only started testing but things seem to work
  fine with old guests so far.
 
  I think we need a spec, not just driver code.
 
  Rusty what's the plan? Want me to write it?
 
 We need both, of course, but the spec work will happen in the OASIS WG.
 A draft is good, but let's not commit anything to upstream QEMU until we
 get the spec finalized.  And that is proposed to be late this year.

Well that would be quite sad really.

This means we can't make virtio a spec compliant pci express device,
and we can't add any more feature bits, so no
flexible buffer optimizations for virtio net.

There are probably more projects that will be blocked.

So how about we keep extending legacy layout for a bit longer:
- add a way to access device with MMIO
- use feature bit 31 to signal 64 bit features
  (and shift device config accordingly)

No endianness rework, no per-queue enable, etc.
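
For illustration only, the driver side of that interim scheme might look
roughly like this.  read32(), both offsets and the macro names are made-up
placeholders, not anything from the patch:

#include <stdint.h>

#define FEAT_LO_OFF   0x00          /* legacy 32-bit feature word (placeholder) */
#define FEAT_HI_OFF   0x40          /* made-up offset for the high feature word */
#define F_FEATURES_HI (1u << 31)    /* the proposed signalling bit              */

uint64_t read_features(void *dev, uint32_t (*read32)(void *dev, unsigned off))
{
    uint64_t f = read32(dev, FEAT_LO_OFF);

    if (f & F_FEATURES_HI) {
        /* 64-bit features present: a high word follows, and the
         * device-specific config shifts accordingly. */
        f |= (uint64_t)read32(dev, FEAT_HI_OFF) << 32;
    }
    return f;
}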


Then when we start discussions we will have working
express and working 64-bit feature support,
and it will be that much easier to make it pretty.


 Since I'm going to have to reformat the spec and adapt it into OASIS
 style anyway, perhaps you should prepare a description as a standalone
 text document.  Easier to email and work with...
 
 Now, the idea is that if you want to support 0.9 and 1.0 (or whatever we
 call them; I used the term legacy for existing implementations in the
 OASIS WG proposal), you add capabilities and don't point them into (the
 start of?) BAR0.  Old drivers use BAR0 as now.
 
 One trick to note: while drivers shouldn't use both old and new style on
 the same device, you need to allow it for kexec, particularly reset via
 BAR0.
 
 Cheers,
 Rusty.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-02 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 On Thu, May 30, 2013 at 08:53:45AM -0500, Anthony Liguori wrote:
 Rusty Russell ru...@rustcorp.com.au writes:
 
  Anthony Liguori aligu...@us.ibm.com writes:
  Forcing a guest driver change is a really big
  deal and I see no reason to do that unless there's a compelling reason
  to.
 
  So we're stuck with the 1.0 config layout for a very long time.
 
  We definitely must not force a guest change.  The explicit aim of the
  standard is that legacy and 1.0 be backward compatible.  One
  deliverable is a document detailing how this is done (effectively a
  summary of changes between what we have and 1.0).
 
 If 2.0 is fully backwards compatible, great.  It seems like such a
 difference that that would be impossible but I need to investigate
 further.
 
 Regards,
 
 Anthony Liguori

 If you look at my patches you'll see how it works.
 Basically old guests use BAR0 new ones don't, so
 it's easy: BAR0 access means legacy guest.
 Only started testing but things seem to work
 fine with old guests so far.

 I think we need a spec, not just driver code.

 Rusty what's the plan? Want me to write it?

We need both, of course, but the spec work will happen in the OASIS WG.
A draft is good, but let's not commit anything to upstream QEMU until we
get the spec finalized.  And that is proposed to be late this year.

Since I'm going to have to reformat the spec and adapt it into OASIS
style anyway, perhaps you should prepare a description as a standalone
text document.  Easier to email and work with...

Now, the idea is that if you want to support 0.9 and 1.0 (or whatever we
call them; I used the term legacy for existing implementations in the
OASIS WG proposal), you add capabilities and don't point them into (the
start of?) BAR0.  Old drivers use BAR0 as now.
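
For reference, the capability being discussed looks roughly like this.
The field names match the check_offsets() patch quoted elsewhere in this
thread; the exact widths and the type/BAR encoding are whatever the RFC
and the spec finally settle on, so treat this as a sketch only:

#include <stdint.h>

struct virtio_pci_cap {
        uint8_t  cap_vndr;      /* generic PCI field: PCI_CAP_ID_VNDR     */
        uint8_t  cap_next;      /* generic PCI field: next capability     */
        uint8_t  cap_len;       /* length of this capability              */
        uint8_t  type_and_bar;  /* which structure, and which BAR it's in */
        uint32_t offset;        /* offset of the structure within the BAR */
        uint32_t length;        /* length of the structure                */
};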

One trick to note: while drivers shouldn't use both old and new style on
the same device, you need to allow it for kexec, particularly reset via
BAR0.

Cheers,
Rusty.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-02 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 On Thu, May 30, 2013 at 01:28:26PM +0930, Rusty Russell wrote:
  Yet the structure definitions are descriptive, capturing layout, size
  and endianness in natural a format readable by any C programmer.
 
 From an API design point of view, here are the problems I see:
 
  1) C makes no guarantees about structure layout beyond the first
 member.  Yes, if it's naturally aligned or has a packed attribute,
 GCC does the right thing.  But this isn't kernel land anymore,
 portability matters and there are more compilers than GCC.
 
 [ I argued in detail here, then deleted.  Damn bikeshedding... ]
 
 I think the best thing is to add the decoded integer versions as macros,
 and have a heap of BUILD_BUG_ON() in the kernel source to make sure they
 match.

 Hmm you want to have both in the header?

 That would confuse users for sure :)

 I guess we could put in a big comment explaning
 why do we have both, and which version should
 userspace use. I tried to write it:

 /*
  * On all known C compilers, you can use the
  * offsetof macro to find the correct offset of each field -
  * if your compiler doesn't implement this correctly, use the
  * integer versions below, instead.
  * In that case please don't use the structures above, to avoid confusion.
  */

My version was a little less verbose :)

Subject: virtio_pci: macros for PCI layout offsets.

QEMU wants it, so why not?  Trust, but verify.

Signed-off-by: Rusty Russell ru...@rustcorp.com.au

diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index 24bcd9b..6510009 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -453,6 +453,64 @@ static void __iomem *map_capability(struct pci_dev *dev, int off, size_t minlen,
return p;
 }
 
+/* This is part of the ABI.  Don't screw with it. */
+static inline void check_offsets(void)
+{
+   /* Note: disk space was harmed in compilation of this function. */
+   BUILD_BUG_ON(VIRTIO_PCI_CAP_VNDR !=
+offsetof(struct virtio_pci_cap, cap_vndr));
+   BUILD_BUG_ON(VIRTIO_PCI_CAP_NEXT !=
+offsetof(struct virtio_pci_cap, cap_next));
+   BUILD_BUG_ON(VIRTIO_PCI_CAP_LEN !=
+offsetof(struct virtio_pci_cap, cap_len));
+   BUILD_BUG_ON(VIRTIO_PCI_CAP_TYPE_AND_BAR !=
+offsetof(struct virtio_pci_cap, type_and_bar));
+   BUILD_BUG_ON(VIRTIO_PCI_CAP_OFFSET !=
+offsetof(struct virtio_pci_cap, offset));
+   BUILD_BUG_ON(VIRTIO_PCI_CAP_LENGTH !=
+offsetof(struct virtio_pci_cap, length));
+   BUILD_BUG_ON(VIRTIO_PCI_NOTIFY_CAP_MULT !=
+offsetof(struct virtio_pci_notify_cap,
+ notify_off_multiplier));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_DFSELECT != 
+offsetof(struct virtio_pci_common_cfg,
+ device_feature_select));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_DF != 
+offsetof(struct virtio_pci_common_cfg, device_feature));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_GFSELECT != 
+offsetof(struct virtio_pci_common_cfg,
+ guest_feature_select));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_GF != 
+offsetof(struct virtio_pci_common_cfg, guest_feature));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_MSIX != 
+offsetof(struct virtio_pci_common_cfg, msix_config));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_NUMQ != 
+offsetof(struct virtio_pci_common_cfg, num_queues));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_STATUS != 
+offsetof(struct virtio_pci_common_cfg, device_status));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_SELECT != 
+offsetof(struct virtio_pci_common_cfg, queue_select));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_SIZE != 
+offsetof(struct virtio_pci_common_cfg, queue_size));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_MSIX != 
+offsetof(struct virtio_pci_common_cfg, queue_msix_vector));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_ENABLE != 
+offsetof(struct virtio_pci_common_cfg, queue_enable));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_NOFF != 
+offsetof(struct virtio_pci_common_cfg, queue_notify_off));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_DESCLO != 
+offsetof(struct virtio_pci_common_cfg, queue_desc_lo));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_DESCHI != 
+offsetof(struct virtio_pci_common_cfg, queue_desc_hi));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_AVAILLO != 
+offsetof(struct virtio_pci_common_cfg, queue_avail_lo));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_AVAILHI != 
+offsetof(struct virtio_pci_common_cfg, queue_avail_hi));
+   

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-30 Thread Michael S. Tsirkin
On Wed, May 29, 2013 at 03:41:09PM +0200, Paolo Bonzini wrote:
 Il 29/05/2013 15:24, Michael S. Tsirkin ha scritto:
  You expect a compiler to pad this structure:
  
  struct foo {
  uint8_t a;
  uint8_t b;
  uint16_t c;
  uint32_t d;
  };
  
  I'm guessing any compiler that decides to waste memory in this way
  will quickly get dropped by users and then we won't worry
  about building QEMU with it.
 
 You know the virtio-pci config structures are padded, but not all of
 them are.  For example, virtio_balloon_stat is not padded and indeed has
 an __attribute__((__packed__)) in the spec.
 
 For this reason I prefer to have the attribute everywhere.  So people
 don't have to wonder why it's here and not there.
 
 Paolo

BTW we don't even do this consistently everywhere in QEMU.

It would have been better to have a rule to avoid packed
as much as possible; then we'd have found the misaligned field bug
in balloon before it was too late.

That would be a good rule to adopt I think:
only pack if something is misaligned, and document
the reason.

-- 
MST


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-30 Thread Michael S. Tsirkin
On Thu, May 30, 2013 at 01:28:26PM +0930, Rusty Russell wrote:
  Yet the structure definitions are descriptive, capturing layout, size
  and endianness in natural a format readable by any C programmer.
 
 From an API design point of view, here are the problems I see:
 
  1) C makes no guarantees about structure layout beyond the first
 member.  Yes, if it's naturally aligned or has a packed attribute,
 GCC does the right thing.  But this isn't kernel land anymore,
 portability matters and there are more compilers than GCC.
 
 [ I argued in detail here, then deleted.  Damn bikeshedding... ]
 
 I think the best thing is to add the decoded integer versions as macros,
 and have a heap of BUILD_BUG_ON() in the kernel source to make sure they
 match.

Hmm you want to have both in the header?

That would confuse users for sure :)

I guess we could put in a big comment explaining
why we have both, and which version
userspace should use. I tried to write it:

/*
 * On all known C compilers, you can use the
 * offsetof macro to find the correct offset of each field -
 * if your compiler doesn't implement this correctly, use the
 * integer versions below, instead.
 * In that case please don't use the structures above, to avoid confusion.
 */


-- 
MST


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-30 Thread Anthony Liguori
Rusty Russell ru...@rustcorp.com.au writes:

 Anthony Liguori aligu...@us.ibm.com writes:
 Forcing a guest driver change is a really big
 deal and I see no reason to do that unless there's a compelling reason
 to.

 So we're stuck with the 1.0 config layout for a very long time.

 We definitely must not force a guest change.  The explicit aim of the
 standard is that legacy and 1.0 be backward compatible.  One
 deliverable is a document detailing how this is done (effectively a
 summary of changes between what we have and 1.0).

If 2.0 is fully backwards compatible, great.  It seems like such a
difference that that would be impossible but I need to investigate
further.

Regards,

Anthony Liguori


 It's a delicate balancing act.  My plan is to accompany any changes in
 the standard with a qemu implementation, so we can see how painful those
 changes are.  And if there are performance implications, measure them.

 Cheers,
 Rusty.



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-30 Thread Michael S. Tsirkin
On Thu, May 30, 2013 at 08:53:45AM -0500, Anthony Liguori wrote:
 Rusty Russell ru...@rustcorp.com.au writes:
 
  Anthony Liguori aligu...@us.ibm.com writes:
  Forcing a guest driver change is a really big
  deal and I see no reason to do that unless there's a compelling reason
  to.
 
  So we're stuck with the 1.0 config layout for a very long time.
 
  We definitely must not force a guest change.  The explicit aim of the
  standard is that legacy and 1.0 be backward compatible.  One
  deliverable is a document detailing how this is done (effectively a
  summary of changes between what we have and 1.0).
 
 If 2.0 is fully backwards compatible, great.  It seems like such a
 difference that that would be impossible but I need to investigate
 further.
 
 Regards,
 
 Anthony Liguori

If you look at my patches you'll see how it works.
Basically old guests use BAR0 new ones don't, so
it's easy: BAR0 access means legacy guest.
Only started testing but things seem to work
fine with old guests so far.

I think we need a spec, not just driver code.

Rusty what's the plan? Want me to write it?

 
  It's a delicate balancing act.  My plan is to accompany any changes in
  the standard with a qemu implementation, so we can see how painful those
  changes are.  And if there are performance implications, measure them.
 
  Cheers,
  Rusty.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Paolo Bonzini
Il 29/05/2013 06:33, Rusty Russell ha scritto:
 Paolo Bonzini pbonz...@redhat.com writes:
 Il 28/05/2013 19:32, Michael S. Tsirkin ha scritto:
 +
 +switch (addr) {
 +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
 +return proxy-device_feature_select;

 Oh dear no...  Please use defines like the rest of QEMU.
 Any good reason not to use offsetof?

 I'm not sure it's portable to use it in case labels.  IIRC, the
 definition ((int)(((T *)0)-field)) is not a valid C integer constant
 expression.  Laszlo?
 
 It's defined to yield an integer constant expression in the ISO standard
 (and I think ANSI too, though that's not at hand):

It's not in C89.  The oldest compiler QEMU cares about is GCC 4.2.  I
don't know if it has a builtin offsetof, probably it does.  But I'm not
sure about other users of virtio headers.

Paolo

 
 7.19, para 3:
 
 ...offsetof(type, member-designator)
 which expands to an integer constant expression that has type
 size_t, ...
 
 The real question is whether compilers qemu cares about meet the
 standard (there's some evidence that older compilers fail this).  If
 not, we'll have to define them as raw offsets...
 
 Cheers,
 Rusty.
 



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Michael S. Tsirkin
On Wed, May 29, 2013 at 09:27:02AM +0200, Paolo Bonzini wrote:
 Il 29/05/2013 06:33, Rusty Russell ha scritto:
  Paolo Bonzini pbonz...@redhat.com writes:
  Il 28/05/2013 19:32, Michael S. Tsirkin ha scritto:
  +
  +switch (addr) {
  +case offsetof(struct virtio_pci_common_cfg, 
  device_feature_select):
  +return proxy-device_feature_select;
 
  Oh dear no...  Please use defines like the rest of QEMU.
  Any good reason not to use offsetof?
 
  I'm not sure it's portable to use it in case labels.  IIRC, the
  definition ((int)(((T *)0)-field)) is not a valid C integer constant
  expression.  Laszlo?
  
  It's defined to yield an integer constant expression in the ISO standard
  (and I think ANSI too, though that's not at hand):
 
 It's not in C89.  The oldest compiler QEMU cares about is GCC 4.2.  I
 don't know if it has a builtin offsetof, probably it does.

Yes, I think __builtin_offsetof was added in 4.0:
http://gcc.gnu.org/onlinedocs/gcc-4.0.0/gcc/Offsetof.html
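
For concreteness, the construct whose portability is being debated is a
switch on the config offset with offsetof() in the case labels, which
only works if offsetof() expands to an integer constant expression.  A
minimal standalone sketch (the struct and names are illustrative, not
taken from the patch):

#include <stddef.h>
#include <stdint.h>

struct cfg {
    uint32_t feature_select;
    uint32_t feature;
};

uint32_t cfg_read(size_t addr, const struct cfg *c)
{
    switch (addr) {
    case offsetof(struct cfg, feature_select):  /* must be a constant expression */
        return c->feature_select;
    case offsetof(struct cfg, feature):
        return c->feature;
    default:
        return 0;
    }
}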


 But I'm not sure about other users of virtio headers.
 
 Paolo
  
  7.19, para 3:
  
  ...offsetof(type, member-designator)
  which expands to an integer constant expression that has type
  size_t, ...
  
  The real question is whether compilers qemu cares about meet the
  standard (there's some evidence that older compilers fail this).  If
  not, we'll have to define them as raw offsets...

So I think the question is whether we care about GCC 3.  We used to when
mingw in debian stable was GCC 3, but not anymore ...

  
  Cheers,
  Rusty.
  


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Michael S. Tsirkin
On Wed, May 29, 2013 at 02:01:18PM +0930, Rusty Russell wrote:
 Anthony Liguori aligu...@us.ibm.com writes:
  Michael S. Tsirkin m...@redhat.com writes:
  +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
  +return proxy-device_feature_select;
 
  Oh dear no...  Please use defines like the rest of QEMU.
 
 It is pretty ugly.
 
 Yet the structure definitions are descriptive, capturing layout, size
 and endianness in natural a format readable by any C programmer.
 
 So AFAICT the question is, do we put the required
 
 #define VIRTIO_PCI_CFG_FEATURE_SEL \
  (offsetof(struct virtio_pci_common_cfg, device_feature_select))
 
 etc. in the kernel headers or qemu?


I forgot that we need to validate the size (different fields
have different sizes, so the memory core does not validate it
for us).

And that is much cleaner if we use offsetof directly:
you can see at a glance that you are checking the correct
size for each offset.

---

virtio: new layout: add offset validation

Signed-off-by: Michael S. Tsirkin m...@redhat.com

---

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index f4db224..fd09ea7 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -467,51 +467,70 @@ static uint64_t virtio_pci_config_common_read(void *opaque, hwaddr addr,
 {
     VirtIOPCIProxy *proxy = opaque;
     VirtIODevice *vdev = proxy->vdev;
+    struct virtio_pci_common_cfg cfg;
 
     uint64_t low = 0xffffffffull;
 
     switch (addr) {
     case offsetof(struct virtio_pci_common_cfg, device_feature_select):
+        assert(size == sizeof cfg.device_feature_select);
         return proxy->device_feature_select;
     case offsetof(struct virtio_pci_common_cfg, device_feature):
+        assert(size == sizeof cfg.device_feature);
         /* TODO: 64-bit features */
         return proxy->device_feature_select ? 0 : proxy->host_features;
     case offsetof(struct virtio_pci_common_cfg, guest_feature_select):
+        assert(size == sizeof cfg.guest_feature_select);
         return proxy->guest_feature_select;
     case offsetof(struct virtio_pci_common_cfg, guest_feature):
+        assert(size == sizeof cfg.guest_feature);
         /* TODO: 64-bit features */
         return proxy->guest_feature_select ? 0 : vdev->guest_features;
     case offsetof(struct virtio_pci_common_cfg, msix_config):
+        assert(size == sizeof cfg.msix_config);
         return vdev->config_vector;
     case offsetof(struct virtio_pci_common_cfg, num_queues):
+        assert(size == sizeof cfg.num_queues);
         /* TODO: more exact limit? */
         return VIRTIO_PCI_QUEUE_MAX;
     case offsetof(struct virtio_pci_common_cfg, device_status):
+        assert(size == sizeof cfg.device_status);
         return vdev->status;
 
     /* About a specific virtqueue. */
     case offsetof(struct virtio_pci_common_cfg, queue_select):
+        assert(size == sizeof cfg.queue_select);
         return vdev->queue_sel;
     case offsetof(struct virtio_pci_common_cfg, queue_size):
+        assert(size == sizeof cfg.queue_size);
         return virtio_queue_get_num(vdev, vdev->queue_sel);
     case offsetof(struct virtio_pci_common_cfg, queue_msix_vector):
+        assert(size == sizeof cfg.queue_msix_vector);
         return virtio_queue_vector(vdev, vdev->queue_sel);
     case offsetof(struct virtio_pci_common_cfg, queue_enable):
+        assert(size == sizeof cfg.queue_enable);
         /* TODO */
         return 0;
     case offsetof(struct virtio_pci_common_cfg, queue_notify_off):
+        assert(size == sizeof cfg.queue_notify_off);
         return vdev->queue_sel;
     case offsetof(struct virtio_pci_common_cfg, queue_desc):
+        assert(size == 4);
         return virtio_queue_get_desc_addr(vdev, vdev->queue_sel) & low;
     case offsetof(struct virtio_pci_common_cfg, queue_desc) + 4:
+        assert(size == 4);
         return virtio_queue_get_desc_addr(vdev, vdev->queue_sel) >> 32;
     case offsetof(struct virtio_pci_common_cfg, queue_avail):
+        assert(size == 4);
         return virtio_queue_get_avail_addr(vdev, vdev->queue_sel) & low;
     case offsetof(struct virtio_pci_common_cfg, queue_avail) + 4:
+        assert(size == 4);
         return virtio_queue_get_avail_addr(vdev, vdev->queue_sel) >> 32;
     case offsetof(struct virtio_pci_common_cfg, queue_used):
+        assert(size == 4);
         return virtio_queue_get_used_addr(vdev, vdev->queue_sel) & low;
     case offsetof(struct virtio_pci_common_cfg, queue_used) + 4:
+        assert(size == 4);
         return virtio_queue_get_used_addr(vdev, vdev->queue_sel) >> 32;
     default:
         return 0;
@@ -523,76 +542,95 @@ static void virtio_pci_config_common_write(void *opaque, hwaddr addr,
 {
     VirtIOPCIProxy *proxy = opaque;
     VirtIODevice *vdev = proxy->vdev;
+    struct virtio_pci_common_cfg cfg;
 
     uint64_t low = 0xffffffffull;
     uint64_t high = ~low;
 
     switch (addr) {
     case offsetof(struct virtio_pci_common_cfg, device_feature_select):

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Paolo Bonzini
Il 29/05/2013 10:24, Michael S. Tsirkin ha scritto:
 On Wed, May 29, 2013 at 02:01:18PM +0930, Rusty Russell wrote:
 Anthony Liguori aligu...@us.ibm.com writes:
 Michael S. Tsirkin m...@redhat.com writes:
 +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
 +return proxy-device_feature_select;

 Oh dear no...  Please use defines like the rest of QEMU.

 It is pretty ugly.

 Yet the structure definitions are descriptive, capturing layout, size
 and endianness in natural a format readable by any C programmer.

 So AFAICT the question is, do we put the required

 #define VIRTIO_PCI_CFG_FEATURE_SEL \
  (offsetof(struct virtio_pci_common_cfg, device_feature_select))

 etc. in the kernel headers or qemu?
 
 
 I forgot that we need to validate size (different fields
 have different sizes so memory core does not validate
 for us).
 
 And that is much cleaner if we use offsetof directly:
 You can see that you are checking the correct
 size matching the offset, at a glance.

I wonder if we can use and possibly extend Peter Crosthwaite's register
API to support this better.

Paolo

 ---
 
 virtio: new layout: add offset validation
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 
 ---
 
 diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
 index f4db224..fd09ea7 100644
 --- a/hw/virtio/virtio-pci.c
 +++ b/hw/virtio/virtio-pci.c
 @@ -467,51 +467,70 @@ static uint64_t virtio_pci_config_common_read(void 
 *opaque, hwaddr addr,
  {
  VirtIOPCIProxy *proxy = opaque;
  VirtIODevice *vdev = proxy-vdev;
 +struct virtio_pci_common_cfg cfg;
  
  uint64_t low = 0xull;
  
  switch (addr) {
  case offsetof(struct virtio_pci_common_cfg, device_feature_select):
 +assert(size == sizeof cfg.device_feature_select);
  return proxy-device_feature_select;
  case offsetof(struct virtio_pci_common_cfg, device_feature):
 +assert(size == sizeof cfg.device_feature);
  /* TODO: 64-bit features */
   return proxy-device_feature_select ? 0 : proxy-host_features;
  case offsetof(struct virtio_pci_common_cfg, guest_feature_select):
 +assert(size == sizeof cfg.guest_feature_select);
  return proxy-guest_feature_select;
  case offsetof(struct virtio_pci_common_cfg, guest_feature):
 +assert(size == sizeof cfg.guest_feature);
  /* TODO: 64-bit features */
   return proxy-guest_feature_select ? 0 : vdev-guest_features;
  case offsetof(struct virtio_pci_common_cfg, msix_config):
 +assert(size == sizeof cfg.msix_config);
   return vdev-config_vector;
  case offsetof(struct virtio_pci_common_cfg, num_queues):
 +assert(size == sizeof cfg.num_queues);
  /* TODO: more exact limit? */
   return VIRTIO_PCI_QUEUE_MAX;
  case offsetof(struct virtio_pci_common_cfg, device_status):
 +assert(size == sizeof cfg.device_status);
  return vdev-status;
  
   /* About a specific virtqueue. */
  case offsetof(struct virtio_pci_common_cfg, queue_select):
 +assert(size == sizeof cfg.queue_select);
  return  vdev-queue_sel;
  case offsetof(struct virtio_pci_common_cfg, queue_size):
 +assert(size == sizeof cfg.queue_size);
  return virtio_queue_get_num(vdev, vdev-queue_sel);
  case offsetof(struct virtio_pci_common_cfg, queue_msix_vector):
 +assert(size == sizeof cfg.queue_msix_vector);
   return virtio_queue_vector(vdev, vdev-queue_sel);
  case offsetof(struct virtio_pci_common_cfg, queue_enable):
 +assert(size == sizeof cfg.queue_enable);
  /* TODO */
   return 0;
  case offsetof(struct virtio_pci_common_cfg, queue_notify_off):
 +assert(size == sizeof cfg.queue_notify_off);
  return vdev-queue_sel;
  case offsetof(struct virtio_pci_common_cfg, queue_desc):
 +assert(size == 4);
  return virtio_queue_get_desc_addr(vdev, vdev-queue_sel)  low;
  case offsetof(struct virtio_pci_common_cfg, queue_desc) + 4:
 +assert(size == 4);
  return virtio_queue_get_desc_addr(vdev, vdev-queue_sel)  32;
  case offsetof(struct virtio_pci_common_cfg, queue_avail):
 +assert(size == 4);
  return virtio_queue_get_avail_addr(vdev, vdev-queue_sel)  low;
  case offsetof(struct virtio_pci_common_cfg, queue_avail) + 4:
 +assert(size == 4);
  return virtio_queue_get_avail_addr(vdev, vdev-queue_sel)  32;
  case offsetof(struct virtio_pci_common_cfg, queue_used):
 +assert(size == 4);
  return virtio_queue_get_used_addr(vdev, vdev-queue_sel)  low;
  case offsetof(struct virtio_pci_common_cfg, queue_used) + 4:
 +assert(size == 4);
  return virtio_queue_get_used_addr(vdev, vdev-queue_sel)  32;
  default:
  return 0;
 @@ -523,76 +542,95 @@ static void virtio_pci_config_common_write(void 
 *opaque, hwaddr addr,
  {
  VirtIOPCIProxy *proxy = opaque;
   

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Peter Maydell
On 29 May 2013 09:24, Michael S. Tsirkin m...@redhat.com wrote:
 diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
 index f4db224..fd09ea7 100644
 --- a/hw/virtio/virtio-pci.c
 +++ b/hw/virtio/virtio-pci.c
 @@ -467,51 +467,70 @@ static uint64_t virtio_pci_config_common_read(void 
 *opaque, hwaddr addr,
  {
  VirtIOPCIProxy *proxy = opaque;
  VirtIODevice *vdev = proxy-vdev;
 +struct virtio_pci_common_cfg cfg;

  uint64_t low = 0xull;

  switch (addr) {
  case offsetof(struct virtio_pci_common_cfg, device_feature_select):
 +assert(size == sizeof cfg.device_feature_select);
  return proxy-device_feature_select;

Asserting is definitely the wrong thing here, since the
guest can trigger it.

If you really want to use offsetof like this you're
going to need to decorate the structs with QEMU_PACKED.

thanks
-- PMM
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Laszlo Ersek
On 05/29/13 09:27, Paolo Bonzini wrote:
 Il 29/05/2013 06:33, Rusty Russell ha scritto:
 Paolo Bonzini pbonz...@redhat.com writes:
 Il 28/05/2013 19:32, Michael S. Tsirkin ha scritto:
 +
 +switch (addr) {
 +case offsetof(struct virtio_pci_common_cfg, 
 device_feature_select):
 +return proxy-device_feature_select;

 Oh dear no...  Please use defines like the rest of QEMU.
 Any good reason not to use offsetof?

 I'm not sure it's portable to use it in case labels.  IIRC, the
 definition ((int)(((T *)0)-field)) is not a valid C integer constant
 expression.  Laszlo?

 It's defined to yield an integer constant expression in the ISO standard
 (and I think ANSI too, though that's not at hand):
 
 It's not in C89.

It is, see 7.1.6 Common definitions <stddef.h>.


 The oldest compiler QEMU cares about is GCC 4.2.  I
 don't know if it has a builtin offsetof, probably it does.

Talking about nothing else but the ISO C standard(s), it doesn't matter
how the C implementation deals with offsetof() internally. C
implementation in this sense includes both compiler and standard library.

If the standard library header (stddef.h or something internal
included by it) expands the offsetof() macro to replacement text that
does pointer black magic, magic that would be otherwise nonportable or
undefined, it is alright as long as the accompanying compiler guarantees
that the replacement text will work.

That is, offsetof() gives the C implementor leeway to implement it in
whatever distribution between compiler and standard library; the main
thing is that the C program uses the one offsetof() provided by the C
implementation.

Of course in the FLOSS world OS distros might mix and match gcc and
glibc statistically randomly; normally some documentation should enable
the user to put his/her system into ISO C or even SUSv[1234] mode.

(

An interesting example for, say, non-unified implementation is
getconf vs. c99. I'll pick SUSv3 for this example.

http://pubs.opengroup.org/onlinepubs/95399/utilities/getconf.html
http://pubs.opengroup.org/onlinepubs/95399/utilities/c99.html

In a nutshell one can interrogate getconf for CFLAGS, LDFLAGS and LIBS
so that c99 builds a program in ILP32_OFF32 / ILP32_OFFBIG / LP64_OFF64
/ LPBIG_OFFBIG mode (programming environment).

On my x86_64 RHEL-6.4 laptop, getconf is part of glibc
(glibc-common-2.12-1.107.el6.x86_64), while c99 is part of gcc
(gcc-4.4.7-3.el6.x86_64).

Supposing I'd like to build a 32-bit program with 64-bit off_t:

  getconf _POSIX_V6_ILP32_OFFBIG
  1

  getconf -v POSIX_V6_ILP32_OFFBIG POSIX_V6_ILP32_OFFBIG_CFLAGS
  -m32 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64

  getconf -v POSIX_V6_ILP32_OFFBIG POSIX_V6_ILP32_OFFBIG_LDFLAGS
  -m32

  getconf -v POSIX_V6_ILP32_OFFBIG POSIX_V6_ILP32_OFFBIG_LIBS
  [empty string]

However if I try to actually build a program using these flags:

  #define _XOPEN_SOURCE 600
  int main(void) { return 0; }

  c99 \
  -m32 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 \
  -o mytest \
  -m32 \
  mytest.c

I get

  /usr/bin/ld: crt1.o: No such file: No such file or directory
  collect2: ld returned 1 exit status

unless I install glibc-devel.i686.

This problem is rooted in getconf (a glibc utility) being unaware of
RHEL packaging choices: if the system can't guarantee that the final c99
invocation will succeed, then the very first getconf invocation should
write -1\n or undefined\n to stdout.

(I'm aware that RHEL-6 is not certified UNIX 03; this is just an example
for problems caused by various parts of a standard's
(quasi-)implementation coming from separate projects. In that sense,
caring about the offsetof() internals may have merit.)

)

Laszlo


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Michael S. Tsirkin
On Wed, May 29, 2013 at 10:00:33AM +0100, Peter Maydell wrote:
 On 29 May 2013 09:24, Michael S. Tsirkin m...@redhat.com wrote:
  diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
  index f4db224..fd09ea7 100644
  --- a/hw/virtio/virtio-pci.c
  +++ b/hw/virtio/virtio-pci.c
  @@ -467,51 +467,70 @@ static uint64_t virtio_pci_config_common_read(void 
  *opaque, hwaddr addr,
   {
   VirtIOPCIProxy *proxy = opaque;
   VirtIODevice *vdev = proxy-vdev;
  +struct virtio_pci_common_cfg cfg;
 
   uint64_t low = 0xull;
 
   switch (addr) {
   case offsetof(struct virtio_pci_common_cfg, device_feature_select):
  +assert(size == sizeof cfg.device_feature_select);
   return proxy-device_feature_select;
 
 Asserting is definitely the wrong thing here, since the
 guest can trigger it.

So?

It's a driver bug. It can reset or crash the guest with the same effect,
and it likely will if we let it continue.
assert makes sure we don't let it escalate into some
hard-to-debug security problem.


 If you really want to use offsetof like this you're
 going to need to decorate the structs with QEMU_PACKED.
 
 thanks
 -- PMM

Nope.
These structs are carefully designed not to have any padding.

And if there was a bug and there was some padding, we still
can't fix it with PACKED because this structure
is used to interact with the guest code which does not
have the packed attribute.

-- 
MST


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Peter Maydell
On 29 May 2013 11:08, Michael S. Tsirkin m...@redhat.com wrote:
 On Wed, May 29, 2013 at 10:00:33AM +0100, Peter Maydell wrote:
 Asserting is definitely the wrong thing here, since the
 guest can trigger it.

 So?

So "a guest should not be able to crash QEMU" is a standard
rule: assert() is for QEMU bugs, not guest bugs.
Virtio isn't any different from any other device model.

 It's a driver bug. It can reset or crash guest with the same effect,
 and it likely will if we let it continue.

Letting a misbehaving guest crash itself is fine. Killing
QEMU isn't.

 assert makes sure we don't let it escalate into some
 hard to debug security problem.

If you want to make guest bugs easier to spot and debug
this is what qemu_log_mask(LOG_GUEST_ERROR,...) is for.
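
A minimal sketch of that pattern applied to a handler like the one under
discussion (the function shape and the size check are illustrative only,
not the actual virtio-pci code):

#include "qemu/osdep.h"
#include "qemu/log.h"
#include "exec/hwaddr.h"

static uint64_t cfg_read(void *opaque, hwaddr addr, unsigned size)
{
    if (size != 4) {                    /* instead of assert(size == 4) */
        qemu_log_mask(LOG_GUEST_ERROR,
                      "virtio-pci: bad %u-byte config read at 0x%"
                      HWADDR_PRIx "\n", size, addr);
        return 0;                       /* drop the access, keep QEMU alive */
    }
    /* ... normal handling ... */
    return 0;
}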

 If you really want to use offsetof like this you're
 going to need to decorate the structs with QEMU_PACKED.

 Nope.
 These structs are carefully designed not to have any padding.

...on every compiler and OS combination that QEMU builds for?

 And if there was a bug and there was some padding, we still
 can't fix it with PACKED because this structure
 is used to interact with the guest code which does not
 have the packed attribute.

The guest code has to use a set of structure offsets and
sizes which is fixed by the virtio ABI -- how it implements
this is up to the guest (and if it misimplements it that is
a guest bug and not our problem). Note also that the ABI
is not defined by a set of C source struct definitions
(except in as far as they are accompanied by a set of
restrictions on alignment, padding, etc that completely
determine the numerical alignments and offsets).
How QEMU implements the set of offsets and sizes specified
by the ABI is definitely our problem, and is exactly what
this discussion is about. The simplest and safest way to
get the offsets right on all platforms is just to use a set
of #defines, which is why that's what almost all of our
device models do. Where we do define a struct matching
guest memory layout, we tag it with QEMU_PACKED as our
best means of ensuring consistency on all hosts.
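
For readers unfamiliar with the macro: QEMU_PACKED comes from
qemu/compiler.h and expands to the packed attribute, so a guest-layout
struct is typically written as in the sketch below (the struct itself is
made up for illustration):

#include "qemu/osdep.h"   /* pulls in qemu/compiler.h and friends */

typedef struct QEMU_PACKED GuestRegs {
    uint8_t  status;
    uint32_t addr_lo;     /* without packed this would be padded to offset 4 */
    uint32_t addr_hi;
} GuestRegs;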

thanks
-- PMM


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Michael S. Tsirkin
On Wed, May 29, 2013 at 11:53:17AM +0100, Peter Maydell wrote:
 On 29 May 2013 11:08, Michael S. Tsirkin m...@redhat.com wrote:
  On Wed, May 29, 2013 at 10:00:33AM +0100, Peter Maydell wrote:
  Asserting is definitely the wrong thing here, since the
  guest can trigger it.
 
  So?
 
 So guest should not be able to crash QEMU is a standard
 rule: assert() is for QEMU bugs, not guest bugs.

It's quite likely a QEMU bug -
guests normally do fixed-size accesses so it's hard
for them to access a field with a wrong size.

It is guest triggerable but this does not
contradict the fact that it never happens in practice.

 Virtio isn't any different to any other device model.
 
  It's a driver bug. It can reset or crash guest with the same effect,
  and it likely will if we let it continue.
 
 Letting a misbehaving guest crash itself is fine. Killing
 QEMU isn't.

*Why* isn't it?
It has the advantage of making sure the misbehaving system
is stopped before it does any damage.
Why keep QEMU running even though we know
there's a memory corruption somewhere?

  assert makes sure we don't let it escalate into some
  hard to debug security problem.
 
 If you want to make guest bugs easier to spot and debug
 this is what qemu_log_mask(LOG_GUEST_ERROR,...) is for.

We really want something that would stop the guest
and the device model as well - we don't know where the bug is.

  If you really want to use offsetof like this you're
  going to need to decorate the structs with QEMU_PACKED.
 
  Nope.
  These structs are carefully designed not to have any padding.
 
 ...on every compiler and OS combination that QEMU builds for?

Yes. All the way back to EGCS and before.
GCC has been like this for many many years.

  And if there was a bug and there was some padding, we still
  can't fix it with PACKED because this structure
  is used to interact with the guest code which does not
  have the packed attribute.
 
 The guest code has to use a set of structure offsets and
 sizes which is fixed by the virtio ABI -- how it implements
 this is up to the guest (and if it misimplements it that is
 a guest bug and not our problem). Note also that the ABI
 is not defined by a set of C source struct definitions
 (except in as far as they are accompanied by a set of
 restrictions on alignment, padding, etc that completely
 determine the numerical alignments and offsets).

In practical terms, we should all talk and agree on what's
best for the driver *and* QEMU, not have QEMU just
ignore the driver and do its own thing.

In practical terms, virtio in QEMU should use
exactly the same code to define the layout as virtio in the guest.
Preferably the same header file; we'll get there too,
once we complete the discussion over the bikeshed color.

Deviating from driver in random ways is an endless source
of hard to debug issues, and it's a very practical
consideration because virtio spec is constantly
being extended (unlike hardware which is mostly fixed).


 How QEMU implements the set of offsets and sizes specified
 by the ABI is definitely our problem, and is exactly what
 this discussion is about. The simplest and safest way to
 get the offsets right on all platforms is just to use a set
 of #defines, which is why that's what almost all of our
 device models do.

Maybe you don't feel safe when you see offsetof.
I review a struct and see fields are aligned
properly at a glance and I feel safe.

But I don't feel safe when I see headers in guest and qemu have
different code.

 Where we do define a struct matching
 guest memory layout, we tag it with QEMU_PACKED as our
 best means of ensuring consistency on all hosts.
 
 thanks
 -- PMM

Further, packed structures produce terrible
code in GCC, in all versions that QEMU cares about :).

-- 
MST


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Paolo Bonzini
Il 29/05/2013 14:16, Michael S. Tsirkin ha scritto:
   If you really want to use offsetof like this you're
   going to need to decorate the structs with QEMU_PACKED.
  
   Nope.
   These structs are carefully designed not to have any padding.
  
  ...on every compiler and OS combination that QEMU builds for?
 Yes. All the way back to EGCS and before.
 GCC has been like this for many many years.

I would still prefer to have QEMU_PACKED (or __attribute__((__packed__))
in the kernel).

   And if there was a bug and there was some padding, we still
   can't fix it with PACKED because this structure
   is used to interact with the guest code which does not
   have the packed attribute.
  
  The guest code has to use a set of structure offsets and
  sizes which is fixed by the virtio ABI -- how it implements
  this is up to the guest (and if it misimplements it that is
  a guest bug and not our problem).

On the other hand, encouraging both userspace and guest to reuse a
single set of headers means that the bad offset becomes a spec bug more
than a guest bug.

 Deviating from driver in random ways is an endless source
 of hard to debug issues, and it's a very practical
 consideration because virtio spec is constantly
 being extended (unlike hardware which is mostly fixed).

Agreed---but the driver should use __attribute__((__packed__)) too.

Paolo



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Michael S. Tsirkin
On Wed, May 29, 2013 at 02:28:32PM +0200, Paolo Bonzini wrote:
 Il 29/05/2013 14:16, Michael S. Tsirkin ha scritto:
If you really want to use offsetof like this you're
going to need to decorate the structs with QEMU_PACKED.
   
Nope.
These structs are carefully designed not to have any padding.
   
   ...on every compiler and OS combination that QEMU builds for?
  Yes. All the way back to EGCS and before.
  GCC has been like this for many many years.
 
 I would still prefer to have QEMU_PACKED (or __attribute((__packed__))
 in the kernel).
 
And if there was a bug and there was some padding, we still
can't fix it with PACKED because this structure
is used to interact with the guest code which does not
have the packed attribute.
   
   The guest code has to use a set of structure offsets and
   sizes which is fixed by the virtio ABI -- how it implements
   this is up to the guest (and if it misimplements it that is
   a guest bug and not our problem).
 
 On the other hand, encouraging both userspace and guest to reuse a
 single set of headers means that the bad offset becomes a spec bug more
 than a guest bug.

Heh

  Deviating from driver in random ways is an endless source
  of hard to debug issues, and it's a very practical
  consideration because virtio spec is constantly
  being extended (unlike hardware which is mostly fixed).
 
 Agreed---but the driver should use __attribute__((__packed__)) too.
 
 Paolo

For these specific structures I don't mind, since we never dereference them
anyway. If you do dereference a structure, using packed is generally
a mistake, in particular because GCC generates code that is
much slower.

You can also get nasty surprises e.g.

struct foo {
char a;
int b;
} __attribute__((packed));

struct foo foo;
int *a = &foo.a;

Last line compiles fine but isn't legal C.

So I think there are two cases with packed:
1. it does not change logical behaviour but results in bad code
   generated
2. it does change logical behaviour and leads to unsafe code

There aren't many good reasons to use packed: if you
must treat unaligned data, best to use constants.

-- 
MST


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Anthony Liguori
Rusty Russell ru...@rustcorp.com.au writes:

 Anthony Liguori aligu...@us.ibm.com writes:
 Michael S. Tsirkin m...@redhat.com writes:
 +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
 +return proxy-device_feature_select;

 Oh dear no...  Please use defines like the rest of QEMU.

 It is pretty ugly.

I think beauty is in the eye of the beholder here...

Pretty much every device we have has a switch statement like this.
Consistency wins when it comes to qualitative arguments like this.

 Yet the structure definitions are descriptive, capturing layout, size
 and endianness in natural a format readable by any C programmer.

From an API design point of view, here are the problems I see:

1) C makes no guarantees about structure layout beyond the first
   member.  Yes, if it's naturally aligned or has a packed attribute,
   GCC does the right thing.  But this isn't kernel land anymore,
   portability matters and there are more compilers than GCC.

2) If we ever introduce anything like latching, this doesn't work out
   so well anymore because it's hard to express in a single C structure
   the register layout at that point.  Perhaps a union could be used but
   padding may make it a bit challenging.

3) I suspect it's harder to review because a subtle change could more
   easily have broad impact.  If someone changed the type of a field
   from u32 to u16, it changes the offset of every other field.  That's
   not terribly obvious in the patch itself unless you understand how
   the structure is used elsewhere.

   This may not be a problem for virtio because we all understand that
   the structures are part of an ABI, but if we used this pattern more
   in QEMU, it would be a lot less obvious.

 So AFAICT the question is, do we put the required

 #define VIRTIO_PCI_CFG_FEATURE_SEL \
  (offsetof(struct virtio_pci_common_cfg, device_feature_select))

 etc. in the kernel headers or qemu?

I'm pretty sure we would end up just having our own integer defines.  We
carry our own virtio headers today because we can't easily import the
kernel headers.

 Haven't looked at the proposed new ring layout yet.

 No change, but there's an open question on whether we should nail it to
 little endian (or define the endian by the transport).

 Of course, I can't rule out that the 1.0 standard *may* decide to frob
 the ring layout somehow,

Well, given that virtio is widely deployed today, I would think the 1.0
standard should strictly reflect what's deployed today, no?

Any new config layout would be 2.0 material, right?

Re: the new config layout, I don't think we would want to use it for
anything but new devices.  Forcing a guest driver change is a really big
deal and I see no reason to do that unless there's a compelling reason
to.

So we're stuck with the 1.0 config layout for a very long time.

Regards,

Anthony Liguori

 but I'd think it would require a compelling
 reason.  I suggest that's 2.0 material...

 Cheers,
 Rusty.



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Michael S. Tsirkin
On Wed, May 29, 2013 at 07:52:37AM -0500, Anthony Liguori wrote:
 Rusty Russell ru...@rustcorp.com.au writes:
 
  Anthony Liguori aligu...@us.ibm.com writes:
  Michael S. Tsirkin m...@redhat.com writes:
  +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
  +return proxy->device_feature_select;
 
  Oh dear no...  Please use defines like the rest of QEMU.
 
  It is pretty ugly.
 
 I think beauty is in the eye of the beholder here...
 
 Pretty much every device we have has a switch statement like this.
 Consistency wins when it comes to qualitative arguments like this.
 
  Yet the structure definitions are descriptive, capturing layout, size
  and endianness in a natural format readable by any C programmer.
 
 From an API design point of view, here are the problems I see:
 
 1) C makes no guarantees about structure layout beyond the first
member.  Yes, if it's naturally aligned or has a packed attribute,
GCC does the right thing.  But this isn't kernel land anymore,
portability matters and there are more compilers than GCC.

You expect a compiler to pad this structure:

struct foo {
uint8_t a;
uint8_t b;
uint16_t c;
uint32_t d;
};

I'm guessing any compiler that decides to waste memory in this way
will quickly get dropped by users and then we won't worry
about building QEMU with it.

 2) If we ever introduce anything like latching, this doesn't work out
so well anymore because it's hard to express in a single C structure
the register layout at that point.  Perhaps a union could be used but
padding may make it a bit challenging.

Then linux won't use it either.

 3) I suspect it's harder to review because a subtle change could more
easily have broad impact.  If someone changed the type of a field
from u32 to u16, it changes the offset of every other field.  That's
not terribly obvious in the patch itself unless you understand how
the structure is used elsewhere.
 
This may not be a problem for virtio because we all understand that
the structures are part of an ABI, but if we used this pattern more
in QEMU, it would be a lot less obvious.

So let's not use it more in QEMU.

  So AFAICT the question is, do we put the required
 
  #define VIRTIO_PCI_CFG_FEATURE_SEL \
   (offsetof(struct virtio_pci_common_cfg, device_feature_select))
 
  etc. in the kernel headers or qemu?
 
 I'm pretty sure we would end up just having our own integer defines.  We
 carry our own virtio headers today because we can't easily import the
 kernel headers.

Yes we can easily import them.
And at least we copy headers verbatim.


  Haven't looked at the proposed new ring layout yet.
 
  No change, but there's an open question on whether we should nail it to
  little endian (or define the endian by the transport).
 
  Of course, I can't rule out that the 1.0 standard *may* decide to frob
  the ring layout somehow,
 
 Well, given that virtio is widely deployed today, I would think the 1.0
 standard should strictly reflect what's deployed today, no?
 Any new config layout would be 2.0 material, right?

Not as it's currently planned. Devices can choose
to support a legacy layout in addition to the new one,
and if you look at the patch you will see that that
is exactly what it does.

 Re: the new config layout, I don't think we would want to use it for
 anything but new devices.  Forcing a guest driver change

There's no forcing.
If you look at the patches closely, you will see that
we still support the old layout on BAR0.


 is a really big
 deal and I see no reason to do that unless there's a compelling reason
 to.

There are many compelling reasons, and they are well-known
limitations of virtio PCI:

- PCI spec compliance (mandates device operation with IO memory disabled).
- support 64 bit addressing
- add more than 32 feature bits.
- individually disable queues.
- sanely support cross-endian systems.
- support very small (1 page) virtio rings.
- support a separate page for each vq kick.
- make each device place config at a flexible offset.

Addressing any one of these would cause us to add a substantially new
way to operate virtio devices.

And since it's a guest change anyway, it seemed like a
good time to do the new layout and fix everything in one go.

And they are needed like yesterday.


 So we're stuck with the 1.0 config layout for a very long time.
 
 Regards,
 
 Anthony Liguori

Absolutely. This patch lets us support both, which will allow for
a gradual transition over the next 10 years or so.

  reason.  I suggest that's 2.0 material...
 
  Cheers,
  Rusty.
 

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Peter Maydell
On 29 May 2013 14:24, Michael S. Tsirkin m...@redhat.com wrote:
 On Wed, May 29, 2013 at 07:52:37AM -0500, Anthony Liguori wrote:
 1) C makes no guarantees about structure layout beyond the first
member.  Yes, if it's naturally aligned or has a packed attribute,
GCC does the right thing.  But this isn't kernel land anymore,
portability matters and there are more compilers than GCC.

 You expect a compiler to pad this structure:

 struct foo {
 uint8_t a;
 uint8_t b;
 uint16_t c;
 uint32_t d;
 };

 I'm guessing any compiler that decides to waste memory in this way
 will quickly get dropped by users and then we won't worry
 about building QEMU with it.

Structure alignment is a platform ABI choice. Oddly enough people
choose operating systems on more criteria than the amount of
padding their ABI mandates in structures.

In any case it's really tedious to read a structure and wade
through working out whether it's going to have all its members
naturally aligned, especially once it's more than a handful
of members in size. And getting it wrong is a silent portability
problem.
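
One way to make it loud instead of silent (just a sketch; QEMU would use
QEMU_BUILD_BUG_ON rather than C11 _Static_assert, and the struct below is
only the example from this thread) is to pin the intended offsets down at
compile time:

#include <stddef.h>
#include <stdint.h>

struct foo {
    uint8_t  a;
    uint8_t  b;
    uint16_t c;
    uint32_t d;
};

_Static_assert(offsetof(struct foo, b) == 1, "unexpected padding");
_Static_assert(offsetof(struct foo, c) == 2, "unexpected padding");
_Static_assert(offsetof(struct foo, d) == 4, "unexpected padding");
_Static_assert(sizeof(struct foo) == 8, "unexpected padding");

Then a compiler that does insert padding fails the build instead of
silently shifting every later field.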

thanks
-- PMM


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Paolo Bonzini
Il 29/05/2013 15:24, Michael S. Tsirkin ha scritto:
 You expect a compiler to pad this structure:
 
 struct foo {
   uint8_t a;
   uint8_t b;
   uint16_t c;
   uint32_t d;
 };
 
 I'm guessing any compiler that decides to waste memory in this way
 will quickly get dropped by users and then we won't worry
 about building QEMU with it.

You know the virtio-pci config structures are padded, but not all of
them are.  For example, virtio_balloon_stat is not padded and indeed has
an __attribute__((__packed__)) in the spec.

For this reason I prefer to have the attribute everywhere.  So people
don't have to wonder why it's here and not there.

Paolo


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Michael S. Tsirkin
On Wed, May 29, 2013 at 03:41:09PM +0200, Paolo Bonzini wrote:
 Il 29/05/2013 15:24, Michael S. Tsirkin ha scritto:
  You expect a compiler to pad this structure:
  
  struct foo {
  uint8_t a;
  uint8_t b;
  uint16_t c;
  uint32_t d;
  };
  
  I'm guessing any compiler that decides to waste memory in this way
  will quickly get dropped by users and then we won't worry
  about building QEMU with it.
 
 You know the virtio-pci config structures are padded, but not all of
 them are.  For example, virtio_balloon_stat is not padded and indeed has
 an __attribute__((__packed__)) in the spec.
 
 For this reason I prefer to have the attribute everywhere.  So people
 don't have to wonder why it's here and not there.
 
 Paolo

FWIW I think it was a mistake to lay the balloon structure out like that.
We should have padded it manually.
__attribute__((__packed__)) is really easy to misuse.
If you get into such a situation, just use offset enums ...
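
Something like this (a sketch only; the 0/2/10 values assume the usual
unpadded tag/value layout of virtio_balloon_stat, and endian conversion is
omitted for brevity):

#include <stdint.h>
#include <string.h>

enum {
    BALLOON_STAT_TAG  = 0,   /* 16-bit tag */
    BALLOON_STAT_VAL  = 2,   /* 64-bit value */
    BALLOON_STAT_SIZE = 10,  /* total size on the wire */
};

static void balloon_stat_put(uint8_t *buf, uint16_t tag, uint64_t val)
{
    /* explicit marshalling: the wire layout lives in the constants,
     * not in a packed struct that invites unaligned dereferences */
    memcpy(buf + BALLOON_STAT_TAG, &tag, sizeof(tag));
    memcpy(buf + BALLOON_STAT_VAL, &val, sizeof(val));
}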

-- 
MST


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Wed, May 29, 2013 at 07:52:37AM -0500, Anthony Liguori wrote:
 1) C makes no guarantees about structure layout beyond the first
member.  Yes, if it's naturally aligned or has a packed attribute,
GCC does the right thing.  But this isn't kernel land anymore,
portability matters and there are more compilers than GCC.

 You expect a compiler to pad this structure:

 struct foo {
   uint8_t a;
   uint8_t b;
   uint16_t c;
   uint32_t d;
 };

 I'm guessing any compiler that decides to waste memory in this way
 will quickly get dropped by users and then we won't worry
 about building QEMU with it.

There are other responses in the thread here and I don't really care to
bikeshed on this issue.

 Well, given that virtio is widely deployed today, I would think the 1.0
 standard should strictly reflect what's deployed today, no?
 Any new config layout would be 2.0 material, right?

 Not as it's currently planned. Devices can choose
 to support a legacy layout in addition to the new one,
 and if you look at the patch you will see that that
 is exactly what it does.

Adding a new BAR most certainly requires bumping the revision ID or
changing the device ID, no?

Didn't we run into this problem with the virtio-win drivers with just
the BAR size changing? 

 Re: the new config layout, I don't think we would want to use it for
 anything but new devices.  Forcing a guest driver change

 There's no forcing.
 If you look at the patches closely, you will see that
 we still support the old layout on BAR0.


 is a really big
 deal and I see no reason to do that unless there's a compelling reason
 to.

 There are many compelling reasons, and they are well-known
 limitations of virtio PCI:

 - PCI spec compliance (mandates device operation with IO memory
 disabled).

PCI express spec.  We are fully compliant with the PCI spec.  And what's
the user-visible advantage of putting an emulated virtio device behind
a PCI-e bus versus a legacy PCI bus?

This is a very good example because if we have to disable BAR0, then
it's an ABI breaker plain and simple.

 - support 64 bit addressing

We currently support 44-bit addressing for the ring.  While I agree we
need to bump it, there's no immediate problem with 44-bit addressing.

 - add more than 32 feature bits.
 - individually disable queues.
 - sanely support cross-endian systems.
 - support very small (1 page) virtio rings.
 - support a separate page for each vq kick.
 - make each device place config at a flexible offset.

None of these things are holding us back today.

I'm not saying we shouldn't introduce a new device.  But adoption of
that device will be slow and realistically will be limited to new
devices only.

We'll be supporting both devices for a very, very long time.

Compatibility is the fundamental value that we provide.  We need to go
out of our way to make sure that existing guests work and work as well
as possible.

Sticking virtio devices behind a PCI-e bus just for the hell of it isn't
a compelling reason to break existing guests.

Regards,

Anthony Liguori


 Addressing any one of these would cause us to add a substantially new
 way to operate virtio devices.

 And since it's a guest change anyway, it seemed like a
 good time to do the new layout and fix everything in one go.

 And they are needed like yesterday.


 So we're stuck with the 1.0 config layout for a very long time.
 
 Regards,
 
 Anthony Liguori

 Absolutely. This patch lets us support both, which will allow for
 a gradual transition over the next 10 years or so.

  reason.  I suggest that's 2.0 material...
 
  Cheers,
  Rusty.
 


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Anthony Liguori
Paolo Bonzini pbonz...@redhat.com writes:

 Il 29/05/2013 15:24, Michael S. Tsirkin ha scritto:
 You expect a compiler to pad this structure:
 
 struct foo {
  uint8_t a;
  uint8_t b;
  uint16_t c;
  uint32_t d;
 };
 
 I'm guessing any compiler that decides to waste memory in this way
 will quickly get dropped by users and then we won't worry
 about building QEMU with it.

 You know the virtio-pci config structures are padded, but not all of
 them are.  For example, virtio_balloon_stat is not padded and indeed has
 an __attribute__((__packed__)) in the spec.

Not that these structures are actually used for something.

We store the config in these structures so they are actually used for
something.

The proposed structures only serve as a way to express offsets.  You
would never actually have a variable of this type.

Regards,

Anthony Liguori


 For this reason I prefer to have the attribute everywhere.  So people
 don't have to wonder why it's here and not there.

 Paolo


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Michael S. Tsirkin
On Wed, May 29, 2013 at 09:16:39AM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Wed, May 29, 2013 at 07:52:37AM -0500, Anthony Liguori wrote:
  1) C makes no guarantees about structure layout beyond the first
 member.  Yes, if it's naturally aligned or has a packed attribute,
 GCC does the right thing.  But this isn't kernel land anymore,
 portability matters and there are more compilers than GCC.
 
  You expect a compiler to pad this structure:
 
  struct foo {
  uint8_t a;
  uint8_t b;
  uint16_t c;
  uint32_t d;
  };
 
  I'm guessing any compiler that decides to waste memory in this way
  will quickly get dropped by users and then we won't worry
  about building QEMU with it.
 
 There are other responses in the thread here and I don't really care to
 bikeshed on this issue.

Great. Let's make the bikeshed blue then?

  Well, given that virtio is widely deployed today, I would think the 1.0
  standard should strictly reflect what's deployed today, no?
  Any new config layout would be 2.0 material, right?
 
  Not as it's currently planned. Devices can choose
  to support a legacy layout in addition to the new one,
  and if you look at the patch you will see that that
  is exactly what it does.
 
 Adding a new BAR most certainly requires bumping the revision ID or
 changing the device ID, no?

No, why would it?

If a device dropped BAR0, that would be a good reason
to bump revision ID.
We don't do this yet.

 Didn't we run into this problem with the virtio-win drivers with just
 the BAR size changing? 

Because they had a bug: they validated BAR0 size. AFAIK they don't care
what happens with other bars.

  Re: the new config layout, I don't think we would want to use it for
  anything but new devices.  Forcing a guest driver change
 
  There's no forcing.
  If you look at the patches closely, you will see that
  we still support the old layout on BAR0.
 
 
  is a really big
  deal and I see no reason to do that unless there's a compelling reason
  to.
 
  There are many compelling reasons, and they are well-known
  limitations of virtio PCI:

  - PCI spec compliance (mandates device operation with IO memory
  disabled).
 
 PCI express spec.  We are fully compliant with the PCI spec.  And what's
 the user-visible advantage of putting an emulated virtio device behind
 a PCI-e bus versus a legacy PCI bus?

Native hotplug support.

 This is a very good example because if we have to disable BAR0, then
 it's an ABI breaker plain and simple.

Not we. The BIOS can disable IO BAR: it can do this already
but the device won't be functional.

  - support 64 bit addressing
 
 We currently support 44-bit addressing for the ring.  While I agree we
 need to bump it, there's no immediate problem with 44-bit addressing.

I heard developers (though not users) complaining.

  - add more than 32 feature bits.
  - individually disable queues.
  - sanely support cross-endian systems.
  - support very small (1 page) virtio rings.
  - support a separate page for each vq kick.
  - make each device place config at a flexible offset.
 
 None of these things are holding us back today.

All of them do, to a greater or lesser degree.


 I'm not saying we shouldn't introduce a new device.  But adoption of
 that device will be slow and realistically will be limited to new
 devices only.
 
 We'll be supporting both devices for a very, very long time.

This is true for any new feature. What are you trying to say here?
We won't add new features to the old config: for one, we have
run out of feature bits.

 Compatibility is the fundamental value that we provide.  We need to go
 out of our way to make sure that existing guests work and work as well
 as possible.

What are you trying to say?
There's nothing here that breaks compatibility.
Have you looked at the patch?

I'm wasting my time arguing on the mailing list, but
once I tear myself away from this occupation,
I intend to verify that I can run an old guest on
qemu with this patch without issues.

 Sticking virtio devices behind a PCI-e bus just for the hell of it isn't
 a compelling reason to break existing guests.
 
 Regards,
 
 Anthony Liguori

That's why my patch does not break existing guests.


 
  Addressing any one of these would cause us to add a substantially new
  way to operate virtio devices.
 
  And since it's a guest change anyway, it seemed like a
  good time to do the new layout and fix everything in one go.
 
  And they are needed like yesterday.
 
 
  So we're stuck with the 1.0 config layout for a very long time.
  
  Regards,
  
  Anthony Liguori
 
  Absolutely. This patch lets us support both, which will allow for
  a gradual transition over the next 10 years or so.
 
   reason.  I suggest that's 2.0 material...
  
   Cheers,
   Rusty.
  

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Paolo Bonzini
Il 29/05/2013 16:30, Michael S. Tsirkin ha scritto:
  There are other responses in the thread here and I don't really care to
  bikeshed on this issue.
 Great. Let's make the bikeshed blue then?

Yes, let's make it blue, but please promise to check into Peter's
register API so we can remove the case offsetof.

I checked that it works on RHEL5, which is 4.1 and probably the oldest
compiler we care about (any other 4.1 lacks the __sync builtins;
upstream added them in 4.2).

Paolo



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Michael S. Tsirkin
On Wed, May 29, 2013 at 04:32:31PM +0200, Paolo Bonzini wrote:
 Il 29/05/2013 16:30, Michael S. Tsirkin ha scritto:
   There are other responses in the thread here and I don't really care to
   bikeshed on this issue.
  Great. Let's make the bikeshed blue then?
 
 Yes, let's make it blue, but please promise to check into Peter's
 register API so we can remove the case offsetof.

I think it's overkill here.  virtio was designed to be easy to implement
with a simple switch statement.

 I checked that it works on RHEL5, which is 4.1 and probably the oldest
 compiler we care about (any other 4.1 lacks the __sync builtins;
 upstream added them in 4.2).
 
 Paolo

Are you sure? Documentation says 4.0 ...

-- 
MST


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Wed, May 29, 2013 at 09:16:39AM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
  I'm guessing any compiler that decides to waste memory in this way
  will quickly get dropped by users and then we won't worry
  about building QEMU with it.
 
 There are other responses in the thread here and I don't really care to
 bikeshed on this issue.

 Great. Let's make the bikeshed blue then?

It's fun to argue about stuff like this and I certainly have an opinion,
but I honestly don't care all that much about the offsetof thing.
However...



  Well, given that virtio is widely deployed today, I would think the 1.0
  standard should strictly reflect what's deployed today, no?
  Any new config layout would be 2.0 material, right?
 
  Not as it's currently planned. Devices can choose
  to support a legacy layout in addition to the new one,
  and if you look at the patch you will see that that
  is exactly what it does.
 
 Adding a new BAR most certainly requires bumping the revision ID or
 changing the device ID, no?

 No, why would it?

If we change the programming interface for a device in a way that is
incompatible, we are required to change the revision ID and/or device
ID.

 If a device dropped BAR0, that would be a good reason
 to bump revision ID.
 We don't do this yet.

But we have to drop BAR0 to put it behind a PCI express bus, right?

If that's the case, then device that's exposed on the PCI express bus
must use a different device ID and/or revision ID.

That means a new driver is needed in the guest.

 Didn't we run into this problem with the virtio-win drivers with just
 the BAR size changing? 

 Because they had a bug: they validated BAR0 size. AFAIK they don't care
 what happens with other bars.

I think there's a grey area with respect to the assumptions a device can
make about the programming interface.

But very concretely, we cannot expose virtio-pci-net via PCI express
with BAR0 disabled because that will result in existing virtio-pci Linux
drivers breaking.

 Not we. The BIOS can disable IO BAR: it can do this already
 but the device won't be functional.

But the only way to expose the device over PCI express is to disable the
IO BAR, right?

Regards,

Anthony Liguori



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Michael S. Tsirkin
On Wed, May 29, 2013 at 09:55:55AM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Wed, May 29, 2013 at 09:16:39AM -0500, Anthony Liguori wrote:
  Michael S. Tsirkin m...@redhat.com writes:
   I'm guessing any compiler that decides to waste memory in this way
   will quickly get dropped by users and then we won't worry
   about building QEMU with it.
  
  There are other responses in the thread here and I don't really care to
  bikeshed on this issue.
 
  Great. Let's make the bikeshed blue then?
 
 It's fun to argue about stuff like this and I certainly have an opinion,
 but I honestly don't care all that much about the offsetof thing.
 However...
 
 
 
   Well, given that virtio is widely deployed today, I would think the 1.0
   standard should strictly reflect what's deployed today, no?
   Any new config layout would be 2.0 material, right?
  
   Not as it's currently planned. Devices can choose
   to support a legacy layout in addition to the new one,
   and if you look at the patch you will see that that
   is exactly what it does.
  
  Adding a new BAR most certainly requires bumping the revision ID or
  changing the device ID, no?
 
  No, why would it?
 
 If we change the programming interface for a device in a way that is
 incompatible, we are required to change the revision ID and/or device
 ID.

It's compatible.


  If a device dropped BAR0, that would be a good reason
  to bump revision ID.
  We don't do this yet.
 
 But we have to drop BAR0 to put it behind a PCI express bus, right?

No, no.

The PCIe spec states that failure to allocate an I/O BAR should still
allow the device to function.

That's the *guest* allocating an I/O BAR.


 If that's the case, then device that's exposed on the PCI express bus
 must use a different device ID and/or revision ID.
 
 That means a new driver is needed in the guest.
 
  Didn't we run into this problem with the virtio-win drivers with just
  the BAR size changing? 
 
  Because they had a bug: they validated BAR0 size. AFAIK they don't care
  what happens with other bars.
 
 I think there's a grey area with respect to the assumptions a device can
 make about the programming interface.
 
 But very concretely, we cannot expose virtio-pci-net via PCI express
 with BAR0 disabled because that will result in existing virtio-pci Linux
 drivers breaking.
 
  Not we. The BIOS can disable IO BAR: it can do this already
  but the device won't be functional.
 
 But the only way to expose the device over PCI express is to disable the
 IO BAR, right?
 
 Regards,
 
 Anthony Liguori

No :)


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Michael S. Tsirkin
On Wed, May 29, 2013 at 09:55:55AM -0500, Anthony Liguori wrote:
  Not we. The BIOS can disable IO BAR: it can do this already
  but the device won't be functional.
 
 But the only way to expose the device over PCI express is to disable the
 IO BAR, right?

I think this is the source of the misunderstanding:

1.3.2.2.  PCI Express Endpoint Rules

...

PCI Express Endpoint must not depend on operating system
allocation of I/O resources claimed through BAR(s).

The real meaning here is *not* that an Endpoint should not
have an I/O BAR. An Endpoint can have an I/O BAR,
and PCI Express spec supports I/O transactions.
The meaning is that an Endpoint should work
even if *the OS* decides to disable the I/O BAR.
Note: it's up to the guest. We support I/O
as a device with full compatibility for old guests,
but must be prepared to handle a guest which does not
enable I/O.

This likely means that as time goes on,
future OSes will start disabling I/O BARs,
relying on this not hurting functionality.

-- 
MST


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Rusty Russell
Anthony Liguori aligu...@us.ibm.com writes:
 Rusty Russell ru...@rustcorp.com.au writes:

 Anthony Liguori aligu...@us.ibm.com writes:
 Michael S. Tsirkin m...@redhat.com writes:
 +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
 +return proxy->device_feature_select;

 Oh dear no...  Please use defines like the rest of QEMU.

 It is pretty ugly.

 I think beauty is in the eye of the beholder here...

 Pretty much every device we have has a switch statement like this.
 Consistency wins when it comes to qualitative arguments like this.

I was agreeing with you here, actually.

 Yet the structure definitions are descriptive, capturing layout, size
 and endianness in a natural format readable by any C programmer.

From an API design point of view, here are the problems I see:

 1) C makes no guarantees about structure layout beyond the first
member.  Yes, if it's naturally aligned or has a packed attribute,
GCC does the right thing.  But this isn't kernel land anymore,
portability matters and there are more compilers than GCC.

[ I argued in detail here, then deleted.  Damn bikeshedding... ]

I think the best thing is to add the decoded integer versions as macros,
and have a heap of BUILD_BUG_ON() in the kernel source to make sure they
match.
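
Roughly along these lines (a sketch only, inside the kernel's virtio-pci
driver where BUILD_BUG_ON() and the struct are already available; the macro
name is made up here, and the 0 assumes device_feature_select is the first
field of struct virtio_pci_common_cfg):

#define VIRTIO_PCI_COMMON_CFG_DFSELECT	0

static inline void virtio_pci_check_offsets(void)
{
	BUILD_BUG_ON(VIRTIO_PCI_COMMON_CFG_DFSELECT !=
		     offsetof(struct virtio_pci_common_cfg, device_feature_select));
}

QEMU then only needs the plain integer #defines, and the kernel build
breaks if they ever drift from the structs.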

 3) I suspect it's harder to review because a subtle change could more
easily have broad impact.  If someone changed the type of a field
from u32 to u16, it changes the offset of every other field.  That's
not terribly obvious in the patch itself unless you understand how
the structure is used elsewhere.

MST's patch just did this, so point taken.  (MST: I'm going to combine
the cfg_type and bar bytes to fix this, patch coming).

 No change, but there's an open question on whether we should nail it to
 little endian (or define the endian by the transport).

 Of course, I can't rule out that the 1.0 standard *may* decide to frob
 the ring layout somehow,

 Well, given that virtio is widely deployed today, I would think the 1.0
 standard should strictly reflect what's deployed today, no?

That will be up to the committee.  I think we want to fix some obvious
pain points, though qemu will not benefit from them in the next 5 years.

 Any new config layout would be 2.0 material, right?

 Re: the new config layout, I don't think we would want to use it for
 anything but new devices.

There are enough concrete reasons that I think we want it for existing
devices:

1) Negotiated ring size/alignment.  Coreboot wants smaller, others want
   larger.
2) Remove assertion that it has to be an I/O bar.  PowerPC wants this.
3) Notification location flexibility.  MST wanted this for performance.
4) More feature bits.

 Forcing a guest driver change is a really big
 deal and I see no reason to do that unless there's a compelling reason
 to.

 So we're stuck with the 1.0 config layout for a very long time.

We definitely must not force a guest change.  The explicit aim of the
standard is that legacy and 1.0 be backward compatible.  One
deliverable is a document detailing how this is done (effectively a
summary of changes between what we have and 1.0).

It's a delicate balancing act.  My plan is to accompany any changes in
the standard with a qemu implementation, so we can see how painful those
changes are.  And if there are performance implications, measure them.

Cheers,
Rusty.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Michael S. Tsirkin
On Thu, May 30, 2013 at 01:28:26PM +0930, Rusty Russell wrote:
 Anthony Liguori aligu...@us.ibm.com writes:
  Rusty Russell ru...@rustcorp.com.au writes:
 
  Anthony Liguori aligu...@us.ibm.com writes:
  Michael S. Tsirkin m...@redhat.com writes:
  +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
  +return proxy->device_feature_select;
 
  Oh dear no...  Please use defines like the rest of QEMU.
 
  It is pretty ugly.
 
  I think beauty is in the eye of the beholder here...
 
  Pretty much every device we have has a switch statement like this.
  Consistency wins when it comes to qualitative arguments like this.
 
 I was agreeing with you here, actually.
 
  Yet the structure definitions are descriptive, capturing layout, size
  and endianness in a natural format readable by any C programmer.
 
 From an API design point of view, here are the problems I see:
 
  1) C makes no guarantees about structure layout beyond the first
 member.  Yes, if it's naturally aligned or has a packed attribute,
 GCC does the right thing.  But this isn't kernel land anymore,
 portability matters and there are more compilers than GCC.
 
 [ I argued in detail here, then deleted.  Damn bikeshedding... ]
 
 I think the best thing is to add the decoded integer versions as macros,
 and have a heap of BUILD_BUG_ON() in the kernel source to make sure they
 match.

On the qemu side, fine with me; all I want is the ability to stay
close to the kernel headers.
Let's have XXX_SIZE macros as well, so access size can be
easily validated?
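
For example (only a sketch of the idea on the QEMU side, with made-up
names, using QEMU's hwaddr type):

#define VIRTIO_PCI_COMMON_CFG_DFSELECT       0
#define VIRTIO_PCI_COMMON_CFG_DFSELECT_SIZE  4

static bool common_cfg_access_ok(hwaddr addr, unsigned size)
{
    switch (addr) {
    case VIRTIO_PCI_COMMON_CFG_DFSELECT:
        /* reject e.g. byte or word accesses to a 32-bit register */
        return size == VIRTIO_PCI_COMMON_CFG_DFSELECT_SIZE;
    /* ... one case per field ... */
    default:
        return false;
    }
}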

On the kernel side, for 'struct virtio_pci_cap',
since we only ever do offsetof on this struct,
it might be easier to just use the numeric constants
directly.


-- 
MST


[PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-28 Thread Michael S. Tsirkin
This adds support for new config, and is designed to work with
the new layout code in Rusty's new layout branch.

At the moment all fields are in the same memory BAR (bar 2).
This will be used to test performance and compare
memory, io and hypercall latency.

Compiles but does not work yet.
Migration isn't handled yet.

It's not clear what the queue_enable/queue_disable
fields do; not yet implemented.

Gateway for config access with config cycles
not yet implemented.

Sending out for early review/flames.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/virtio/virtio-pci.c | 393 +++--
 hw/virtio/virtio-pci.h |  55 +++
 hw/virtio/virtio.c |  20 +++
 include/hw/virtio/virtio.h |   4 +
 4 files changed, 458 insertions(+), 14 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 752991a..f4db224 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -259,6 +259,26 @@ static void virtio_pci_stop_ioeventfd(VirtIOPCIProxy *proxy)
     proxy->ioeventfd_started = false;
 }
 
+static void virtio_pci_set_status(VirtIOPCIProxy *proxy, uint8_t val)
+{
+    VirtIODevice *vdev = proxy->vdev;
+
+    if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
+        virtio_pci_stop_ioeventfd(proxy);
+    }
+
+    virtio_set_status(vdev, val & 0xFF);
+
+    if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
+        virtio_pci_start_ioeventfd(proxy);
+    }
+
+    if (vdev->status == 0) {
+        virtio_reset(proxy->vdev);
+        msix_unuse_all_vectors(&proxy->pci_dev);
+    }
+}
+
 static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 {
     VirtIOPCIProxy *proxy = opaque;
@@ -293,20 +313,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
         }
         break;
     case VIRTIO_PCI_STATUS:
-        if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
-            virtio_pci_stop_ioeventfd(proxy);
-        }
-
-        virtio_set_status(vdev, val & 0xFF);
-
-        if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
-            virtio_pci_start_ioeventfd(proxy);
-        }
-
-        if (vdev->status == 0) {
-            virtio_reset(proxy->vdev);
-            msix_unuse_all_vectors(&proxy->pci_dev);
-        }
+        virtio_pci_set_status(proxy, val);
 
         /* Linux before 2.6.34 sets the device as OK without enabling
            the PCI device bus master bit. In this case we need to disable
@@ -455,6 +462,226 @@ static void virtio_pci_config_write(void *opaque, hwaddr addr,
     }
 }
 
+static uint64_t virtio_pci_config_common_read(void *opaque, hwaddr addr,
+                                              unsigned size)
+{
+    VirtIOPCIProxy *proxy = opaque;
+    VirtIODevice *vdev = proxy->vdev;
+
+    uint64_t low = 0xffffffffull;
+
+    switch (addr) {
+    case offsetof(struct virtio_pci_common_cfg, device_feature_select):
+        return proxy->device_feature_select;
+    case offsetof(struct virtio_pci_common_cfg, device_feature):
+        /* TODO: 64-bit features */
+        return proxy->device_feature_select ? 0 : proxy->host_features;
+    case offsetof(struct virtio_pci_common_cfg, guest_feature_select):
+        return proxy->guest_feature_select;
+    case offsetof(struct virtio_pci_common_cfg, guest_feature):
+        /* TODO: 64-bit features */
+        return proxy->guest_feature_select ? 0 : vdev->guest_features;
+    case offsetof(struct virtio_pci_common_cfg, msix_config):
+        return vdev->config_vector;
+    case offsetof(struct virtio_pci_common_cfg, num_queues):
+        /* TODO: more exact limit? */
+        return VIRTIO_PCI_QUEUE_MAX;
+    case offsetof(struct virtio_pci_common_cfg, device_status):
+        return vdev->status;
+
+        /* About a specific virtqueue. */
+    case offsetof(struct virtio_pci_common_cfg, queue_select):
+        return vdev->queue_sel;
+    case offsetof(struct virtio_pci_common_cfg, queue_size):
+        return virtio_queue_get_num(vdev, vdev->queue_sel);
+    case offsetof(struct virtio_pci_common_cfg, queue_msix_vector):
+        return virtio_queue_vector(vdev, vdev->queue_sel);
+    case offsetof(struct virtio_pci_common_cfg, queue_enable):
+        /* TODO */
+        return 0;
+    case offsetof(struct virtio_pci_common_cfg, queue_notify_off):
+        return vdev->queue_sel;
+    case offsetof(struct virtio_pci_common_cfg, queue_desc):
+        return virtio_queue_get_desc_addr(vdev, vdev->queue_sel) & low;
+    case offsetof(struct virtio_pci_common_cfg, queue_desc) + 4:
+        return virtio_queue_get_desc_addr(vdev, vdev->queue_sel) >> 32;
+    case offsetof(struct virtio_pci_common_cfg, queue_avail):
+        return virtio_queue_get_avail_addr(vdev, vdev->queue_sel) & low;
+    case offsetof(struct virtio_pci_common_cfg, queue_avail) + 4:
+        return virtio_queue_get_avail_addr(vdev, vdev->queue_sel) >> 32;
+    case offsetof(struct virtio_pci_common_cfg, queue_used):
+        return virtio_queue_get_used_addr(vdev, vdev->queue_sel) & low;
+    case offsetof(struct 

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-28 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 This adds support for new config, and is designed to work with
 the new layout code in Rusty's new layout branch.

 At the moment all fields are in the same memory BAR (bar 2).
 This will be used to test performance and compare
 memory, io and hypercall latency.

 Compiles but does not work yet.
 Migration isn't handled yet.

 It's not clear what the queue_enable/queue_disable
 fields do; not yet implemented.

 Gateway for config access with config cycles
 not yet implemented.

 Sending out for early review/flames.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  hw/virtio/virtio-pci.c | 393 
 +++--
  hw/virtio/virtio-pci.h |  55 +++
  hw/virtio/virtio.c |  20 +++
  include/hw/virtio/virtio.h |   4 +
  4 files changed, 458 insertions(+), 14 deletions(-)

 diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
 index 752991a..f4db224 100644
 --- a/hw/virtio/virtio-pci.c
 +++ b/hw/virtio/virtio-pci.c
 @@ -259,6 +259,26 @@ static void virtio_pci_stop_ioeventfd(VirtIOPCIProxy 
 *proxy)
  proxy->ioeventfd_started = false;
  }
  
 +static void virtio_pci_set_status(VirtIOPCIProxy *proxy, uint8_t val)
 +{
 +VirtIODevice *vdev = proxy->vdev;
 +
 +if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
 +virtio_pci_stop_ioeventfd(proxy);
 +}
 +
 +virtio_set_status(vdev, val & 0xFF);
 +
 +if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
 +virtio_pci_start_ioeventfd(proxy);
 +}
 +
 +if (vdev->status == 0) {
 +virtio_reset(proxy->vdev);
 +msix_unuse_all_vectors(&proxy->pci_dev);
 +}
 +}
 +
  static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
  {
  VirtIOPCIProxy *proxy = opaque;
 @@ -293,20 +313,7 @@ static void virtio_ioport_write(void *opaque, uint32_t 
 addr, uint32_t val)
  }
  break;
  case VIRTIO_PCI_STATUS:
 -if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
 -virtio_pci_stop_ioeventfd(proxy);
 -}
 -
 -virtio_set_status(vdev, val & 0xFF);
 -
 -if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
 -virtio_pci_start_ioeventfd(proxy);
 -}
 -
 -if (vdev->status == 0) {
 -virtio_reset(proxy->vdev);
 -msix_unuse_all_vectors(&proxy->pci_dev);
 -}
 +virtio_pci_set_status(proxy, val);
  
  /* Linux before 2.6.34 sets the device as OK without enabling
 the PCI device bus master bit. In this case we need to disable
 @@ -455,6 +462,226 @@ static void virtio_pci_config_write(void *opaque, 
 hwaddr addr,
  }
  }
  
 +static uint64_t virtio_pci_config_common_read(void *opaque, hwaddr addr,
 +  unsigned size)
 +{
 +VirtIOPCIProxy *proxy = opaque;
 +VirtIODevice *vdev = proxy->vdev;
 +
 +uint64_t low = 0xffffffffull;
 +
 +switch (addr) {
 +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
 +return proxy->device_feature_select;

Oh dear no...  Please use defines like the rest of QEMU.

From a QEMU pov, take a look at:

https://github.com/aliguori/qemu/commit/587c35c1a3fe90f6af0f97927047ef4d3182a659

And:

https://github.com/aliguori/qemu/commit/01ba80a23cf2eb1e15056f82b44b94ec381565cb

Which lets virtio-pci be subclassable and then remaps the config space to
BAR2.

Haven't looked at the proposed new ring layout yet.

Regards,

Anthony Liguori

 +case offsetof(struct virtio_pci_common_cfg, device_feature):
 +/* TODO: 64-bit features */
 + return proxy->device_feature_select ? 0 : proxy->host_features;
 +case offsetof(struct virtio_pci_common_cfg, guest_feature_select):
 +return proxy->guest_feature_select;
 +case offsetof(struct virtio_pci_common_cfg, guest_feature):
 +/* TODO: 64-bit features */
 + return proxy->guest_feature_select ? 0 : vdev->guest_features;
 +case offsetof(struct virtio_pci_common_cfg, msix_config):
 + return vdev->config_vector;
 +case offsetof(struct virtio_pci_common_cfg, num_queues):
 +/* TODO: more exact limit? */
 + return VIRTIO_PCI_QUEUE_MAX;
 +case offsetof(struct virtio_pci_common_cfg, device_status):
 +return vdev->status;
 +
 + /* About a specific virtqueue. */
 +case offsetof(struct virtio_pci_common_cfg, queue_select):
 +return vdev->queue_sel;
 +case offsetof(struct virtio_pci_common_cfg, queue_size):
 +return virtio_queue_get_num(vdev, vdev->queue_sel);
 +case offsetof(struct virtio_pci_common_cfg, queue_msix_vector):
 + return virtio_queue_vector(vdev, vdev->queue_sel);
 +case offsetof(struct virtio_pci_common_cfg, queue_enable):
 +/* TODO */
 + return 0;
 +case offsetof(struct virtio_pci_common_cfg, queue_notify_off):
 +return vdev->queue_sel;
 +case offsetof(struct virtio_pci_common_cfg, queue_desc):
 +return virtio_queue_get_desc_addr(vdev, 

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-28 Thread Michael S. Tsirkin
On Tue, May 28, 2013 at 12:15:16PM -0500, Anthony Liguori wrote:
  @@ -455,6 +462,226 @@ static void virtio_pci_config_write(void *opaque, 
  hwaddr addr,
   }
   }
   
  +static uint64_t virtio_pci_config_common_read(void *opaque, hwaddr addr,
  +  unsigned size)
  +{
  +VirtIOPCIProxy *proxy = opaque;
  +VirtIODevice *vdev = proxy->vdev;
  +
  +uint64_t low = 0xffffffffull;
  +
  +switch (addr) {
  +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
  +return proxy->device_feature_select;
 
 Oh dear no...  Please use defines like the rest of QEMU.

Any good reason not to use offsetof?
I see about 138 examples in qemu.

Anyway, that's the way Rusty wrote it in the kernel header -
I started with defines.
If you convince Rusty to switch I can switch too,
but I prefer reusing the same structures as Linux even if
for now I've given up on reusing the same headers.


 From a QEMU pov, take a look at:
 
 https://github.com/aliguori/qemu/commit/587c35c1a3fe90f6af0f97927047ef4d3182a659
 
 And:
 
 https://github.com/aliguori/qemu/commit/01ba80a23cf2eb1e15056f82b44b94ec381565cb
 
 Which lets virtio-pci be subclassable and then remaps the config space to
 BAR2.


Interesting. Have the spec anywhere?
You are saying this is going to conflict because
of BAR2 usage, yes?

So let's do this virtio-fb only for the new layout, so we don't need
to maintain compatibility. In particular, we are working
on making memory BAR access fast for virtio devices
in a generic way. At the moment they are measurably slower
than PIO on x86.


 Haven't looked at the proposed new ring layout yet.
 
 Regards,

No new ring layout. It's new config layout.


-- 
MST


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-28 Thread Paolo Bonzini
Il 28/05/2013 19:32, Michael S. Tsirkin ha scritto:
   +
   +switch (addr) {
   +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
   +return proxy->device_feature_select;
  
  Oh dear no...  Please use defines like the rest of QEMU.
 Any good reason not to use offsetof?

I'm not sure it's portable to use it in case labels.  IIRC, the
definition ((int)&(((T *)0)->field)) is not a valid C integer constant
expression.  Laszlo?

 I see about 138 examples in qemu.

Almost all of them are about fields in the host, not the guest, and none
of them are in case statements.

Paolo

 
 Anyway, that's the way Rusty wrote it in the kernel header -
 I started with defines.
 If you convince Rusty to switch I can switch too,
 but I prefer reusing same structures as linux even if
 for now I've given up on reusing same headers.
 
 



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-28 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Tue, May 28, 2013 at 12:15:16PM -0500, Anthony Liguori wrote:
  @@ -455,6 +462,226 @@ static void virtio_pci_config_write(void *opaque, 
  hwaddr addr,
   }
   }
   
  +static uint64_t virtio_pci_config_common_read(void *opaque, hwaddr addr,
  +  unsigned size)
  +{
  +VirtIOPCIProxy *proxy = opaque;
  +VirtIODevice *vdev = proxy->vdev;
  +
  +uint64_t low = 0xffffffffull;
  +
  +switch (addr) {
  +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
  +return proxy->device_feature_select;
 
 Oh dear no...  Please use defines like the rest of QEMU.

 Any good reason not to use offsetof?
 I see about 138 examples in qemu.

There are exactly zero:

$ find . -name "*.c" -exec grep -l "case offset" {} \;
$

 Anyway, that's the way Rusty wrote it in the kernel header -
 I started with defines.
 If you convince Rusty to switch I can switch too,

We have 300+ devices in QEMU that use #defines.  We're not using this
kind of thing just because you want to copy code from the kernel.

 https://github.com/aliguori/qemu/commit/587c35c1a3fe90f6af0f97927047ef4d3182a659
 
 And:
 
 https://github.com/aliguori/qemu/commit/01ba80a23cf2eb1e15056f82b44b94ec381565cb
 
 Which lets virtio-pci be subclassable and then remaps the config space to
 BAR2.


 Interesting. Have the spec anywhere?

Not yet, but working on that.

 You are saying this is going to conflict because
 of BAR2 usage, yes?

No, this whole thing is flexible.  I had to use BAR2 because BAR0 has to
be the vram mapping.  It also had to be an MMIO bar.

The new layout might make it easier to implement a device like this.  I
shared it mainly because I wanted to show the subclassing idea vs. just
tacking an option onto the existing virtio-pci code in QEMU.

Regards,

Anthony Liguori

 So let's do this virtio-fb only for the new layout, so we don't need
 to maintain compatibility. In particular, we are working
 on making memory BAR access fast for virtio devices
 in a generic way. At the moment they are measurably slower
 than PIO on x86.


 Haven't looked at the proposed new ring layout yet.
 
 Regards,

 No new ring layout. It's new config layout.


 -- 
 MST


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-28 Thread Michael S. Tsirkin
On Tue, May 28, 2013 at 01:53:35PM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Tue, May 28, 2013 at 12:15:16PM -0500, Anthony Liguori wrote:
   @@ -455,6 +462,226 @@ static void virtio_pci_config_write(void *opaque, 
   hwaddr addr,
}
}

   +static uint64_t virtio_pci_config_common_read(void *opaque, hwaddr addr,
   +  unsigned size)
   +{
   +VirtIOPCIProxy *proxy = opaque;
   +VirtIODevice *vdev = proxy->vdev;
   +
   +uint64_t low = 0xffffffffull;
   +
   +switch (addr) {
   +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
   +return proxy->device_feature_select;
  
  Oh dear no...  Please use defines like the rest of QEMU.
 
  Any good reason not to use offsetof?
  I see about 138 examples in qemu.
 
 There are exactly zero:
 
 $ find . -name "*.c" -exec grep -l "case offset" {} \;
 $
 
  Anyway, that's the way Rusty wrote it in the kernel header -
  I started with defines.
  If you convince Rusty to switch I can switch too,
 
 We have 300+ devices in QEMU that use #defines.  We're not using this
 kind of thing just because you want to copy code from the kernel.

Rusty, let's switch to defines?

  https://github.com/aliguori/qemu/commit/587c35c1a3fe90f6af0f97927047ef4d3182a659
  
  And:
  
  https://github.com/aliguori/qemu/commit/01ba80a23cf2eb1e15056f82b44b94ec381565cb
  
  Which lets virtio-pci be subclassable and then remaps the config space to
  BAR2.
 
 
  Interesting. Have the spec anywhere?
 
 Not yet, but working on that.
 
  You are saying this is going to conflict because
  of BAR2 usage, yes?
 
 No, this whole thing is flexible.  I had to use BAR2 because BAR0 has to
 be the vram mapping.  It also had to be an MMIO bar.
 
 The new layout might make it easier to implement a device like this.

Absolutely, you can put thing anywhere you like.

  I
 shared it mainly because I wanted to show the subclassing idea vs. just
 tacking an option onto the existing virtio-pci code in QEMU.
 
 Regards,
 
 Anthony Liguori

I don't think a subclass will work for the new layout. This should be
completely transparent to users, just have more devices
work with more flexibility.

  So let's do this virtio-fb only for the new layout, so we don't need
  to maintain compatibility. In particular, we are working
  on making memory BAR access fast for virtio devices
  in a generic way. At the moment they are measurably slower
  than PIO on x86.
 
 
  Haven't looked at the proposed new ring layout yet.
  
  Regards,
 
  No new ring layout. It's new config layout.
 
 
  -- 
  MST


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-28 Thread Laszlo Ersek
On 05/28/13 19:43, Paolo Bonzini wrote:
 Il 28/05/2013 19:32, Michael S. Tsirkin ha scritto:
 +
 +switch (addr) {
 +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
 +return proxy->device_feature_select;

 Oh dear no...  Please use defines like the rest of QEMU.
 Any good reason not to use offsetof?
 
 I'm not sure it's portable to use it in case labels.  IIRC, the
 definition ((int)&(((T *)0)->field)) is not a valid C integer constant
 expression.  Laszlo?

As long as we use the offsetof() that comes with the C implementation
(ie. we don't hack it up ourselves), we should be safe; ISO C99
elegantly defines offsetof() in 7.17 "Common definitions <stddef.h>" p3
as

  The macros are [...]

offsetof(type, member-designator)

  which expands to an integer constant expression that has type
  *size_t*, the value of which is the offset in bytes, to the structure
  member (designated by /member-designator/), from the beginning of its
  structure (designated by /type/). The type and member designator shall
  be such that given

static type t;

  then the expression &(t.member-designator) evaluates to an address
  constant. (If the specified member is a bit-field, the behavior is
  undefined.)

(Address constant is described in 6.6p9 but quoting even that here
doesn't seem necesary.)

From 6.8.4.2p3, The expression of each case label shall be an integer
constant expression [...].

So,
(a) if we use the standard offsetof(),
(b) and you can also write, for example,

  static struct virtio_pci_common_cfg t;
  static void *p = &t.device_feature_select;

then the construct seems valid.

(Consistently with the above mention of UB if the specified member is a
bit-field: the address-of operator's constraints also forbid a bit-field
object as operand.)
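
A tiny self-contained illustration of exactly this construct (nothing
virtio-specific, just the pattern under discussion):

#include <stddef.h>
#include <stdint.h>

struct cfg {
    uint32_t feature_select;
    uint32_t feature;
};

static uint32_t dispatch(size_t addr)
{
    switch (addr) {
    case offsetof(struct cfg, feature_select):  /* integer constant expression */
        return 1;
    case offsetof(struct cfg, feature):
        return 2;
    default:
        return 0;
    }
}

Any conforming C99 compiler has to accept those case labels; the open
question raised elsewhere in the thread is only about pre-standard or
buggy offsetof() implementations.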

Laszlo


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-28 Thread Rusty Russell
Anthony Liguori aligu...@us.ibm.com writes:
 Michael S. Tsirkin m...@redhat.com writes:
 +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
 +return proxy->device_feature_select;

 Oh dear no...  Please use defines like the rest of QEMU.

It is pretty ugly.

Yet the structure definitions are descriptive, capturing layout, size
and endianness in a natural format readable by any C programmer.

So AFAICT the question is, do we put the required

#define VIRTIO_PCI_CFG_FEATURE_SEL \
 (offsetof(struct virtio_pci_common_cfg, device_feature_select))

etc. in the kernel headers or qemu?

 Haven't looked at the proposed new ring layout yet.

No change, but there's an open question on whether we should nail it to
little endian (or define the endian by the transport).

Of course, I can't rule out that the 1.0 standard *may* decide to frob
the ring layout somehow, but I'd think it would require a compelling
reason.  I suggest that's 2.0 material...

Cheers,
Rusty.



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-28 Thread Rusty Russell
Paolo Bonzini pbonz...@redhat.com writes:
 Il 28/05/2013 19:32, Michael S. Tsirkin ha scritto:
   +
   +switch (addr) {
   +case offsetof(struct virtio_pci_common_cfg, 
   device_feature_select):
   +return proxy->device_feature_select;
  
  Oh dear no...  Please use defines like the rest of QEMU.
 Any good reason not to use offsetof?

 I'm not sure it's portable to use it in case labels.  IIRC, the
 definition ((int)&(((T *)0)->field)) is not a valid C integer constant
 expression.  Laszlo?

It's defined to yield an integer constant expression in the ISO standard
(and I think ANSI too, though that's not at hand):

7.19, para 3:

...offsetof(type, member-designator)
which expands to an integer constant expression that has type
size_t, ...

The real question is whether compilers qemu cares about meet the
standard (there's some evidence that older compilers fail this).  If
not, we'll have to define them as raw offsets...

Cheers,
Rusty.