Rusty Russell wrote:
> On Wed, 2007-04-11 at 07:26 +0300, Avi Kivity wrote:
>> Nope.  Being async is critical for copyless networking:
>>
>> - in the transmit path, we need to stop the sender (guest) from
>> touching the memory until it's on the wire.  This means 100% of
>> packets sent will be blocked.

> Hi Avi,
>
>         You keep saying stuff like this, and I keep ignoring it.  OK, I'll
> bite:
>
>         Why would we try to prevent the sender from altering the packets?


To avoid data corruption.

The guest wants to send a packet. It calls write(), which causes an skb to be allocated, data to be copied into it, the entire networking stack gets into gear, and the guest-side driver instructs the "device" to send the packet.

With async operations, the saga continues like this: the host-side driver allocates an skb, get_page()s and attaches the data to the new skb, this skb crosses the bridge, trickles into the real ethernet device, gets queued there, sent, interrupts fire, triggering async completion. On this completion, we send a virtual interrupt to the guest, which tells it to destroy the skb and reclaim the pages attached to it.

Without async operations, we don't have a hook to notify the guest when to reclaim the skb. If we do it too soon, the skb can be reclaimed and the memory reused before the real device gets to see it, so we end up sending data that we did not intend. The only way to avoid it is to copy the data somewhere safe, but that is exactly what we don't want to do.

>> - multiple packets per operation (for interrupt mitigation) (like
>> lio_listio)

> The benefits for interrupt mitigation are less clear to me in a virtual
> environment (scheduling tends to make it happen anyway); I'd want to
> benchmark it.


Yes, the guest will probably submit multiple packets in one hypercall. It would be nice for the userspace driver to be able to submit them to the host kernel in one syscall.

> Some kind of batching to reduce syscall overhead, perhaps, but TSO would
> go a fair way towards that anyway (probably not enough).


For some workloads, sure.


>> - scatter/gather packets (iovecs)

> Yes, and this is already present in the tap device.  Anthony suggested a
> slightly nasty hack for multiple sg packets in one writev()/readv(),
> which could also give us batching.


No need for hacks if we get list aio support one day.

>> - configurable wakeup (by packet count/timeout) for queue management

> I'm not convinced that this is a showstopper, though.

It probably isn't.  It's free with aio though.

>> - hacks (tso)

> I'd usually go for a batch interface over TSO, but if the card we're
> sending to actually does TSO then TSO will probably win.

Sure, if TSO helps a regular host, then it should help one that happens to be running a virtual machine.

>> Most of these can be provided by a combination of the pending aio work,
>> the pending aio/fd integration, and the not-so-pending tap aio work.  As
>> the first two are available as patches and the third is limited to the
>> tap device, it is not unreasonable to try it out.  Maybe it will turn
>> out not to be as difficult as I predicted just a few lines above.

> Indeed, I don't think we're asking for a revolution à la VJ-style
> channels.  But I'm still itching to get back to that, and this might yet
> provide an excuse 8)

I'll be happy if this can be made to work. It will make the paravirt guest-side driver work in kvm-less setups, which are useful for testing, and of course the reduction in kernel code is beneficial. It will be slower than in-kernel, but if we get the batching right, perhaps not significantly so. I'm mostly concerned that this depends on code that has eluded merging for such a long time.


--
error compiling committee.c: too many arguments to function

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
