On Wed, Mar 14, 2012 at 10:05 AM, Avi Kivity <a...@redhat.com> wrote:
> On 03/14/2012 11:59 AM, Stefan Hajnoczi wrote:
>> On Wed, Mar 14, 2012 at 9:22 AM, Avi Kivity <a...@redhat.com> wrote:
>> > On 03/13/2012 12:42 PM, Amos Kong wrote:
>> >> When a guest boots with 232 virtio-blk disks, qemu aborts because it
>> >> fails to allocate an ioeventfd. This patchset changes
>> >> kvm_has_many_ioeventfds() to check whether an ioeventfd is available.
>> >> If not, virtio-pci falls back to userspace and does not use ioeventfd
>> >> for I/O notification.
>> >
>> > How about an alternative way of solving this, within the memory core:
>> > trap those writes in qemu and write to the ioeventfd yourself.  This way
>> > ioeventfds work even without kvm:
>> >
>> >
>> >  core: create eventfd
>> >  core: install handler for memory address that writes to ioeventfd
>> >  kvm (optional): install kernel handler for ioeventfd
>> >
>> > even if the third step fails, the ioeventfd still works, it's just slower.
>>
>> That approach will penalize guests with large numbers of disks: each
>> notification takes an extra switch to the vcpu thread instead of going
>> directly from kvm.ko to the iothread.
>
> It's only a failure path.  The normal path is expected to have a kvm
> ioeventfd installed.

It's the normal path when you attach >232 virtio-blk devices to a
guest (or 300 in the future).
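
To make the trade-off concrete, here is a minimal sketch of the fallback
scheme proposed above, assuming Linux eventfd(2). The names notifier_init(),
notifier_guest_write(), and the try_kernel callback are hypothetical and are
not QEMU's actual API; in a real implementation the callback would be where
the kernel ioeventfd registration is attempted.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical notifier: one eventfd per doorbell address. */
struct notifier {
    int fd;          /* eventfd that the iothread polls */
    bool in_kernel;  /* true if the kernel signals fd directly */
};

/* Hypothetical hook standing in for the kernel registration step.
 * Returns 0 on success, -1 if no in-kernel slot is available. */
typedef int (*assign_fn)(int fd);

/* Steps 1 and 3: create the eventfd, then try the kernel fast path.
 * A failed registration is not an error; it only clears in_kernel. */
static int notifier_init(struct notifier *n, assign_fn try_kernel)
{
    n->fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    if (n->fd < 0) {
        return -1;
    }
    n->in_kernel = (try_kernel != NULL && try_kernel(n->fd) == 0);
    return 0;
}

/* Step 2: userspace handler for a trapped guest write. It signals the
 * same eventfd, so the iothread side is identical on both paths. */
static int notifier_guest_write(struct notifier *n)
{
    uint64_t one = 1;
    ssize_t ret = write(n->fd, &one, sizeof(one));
    return ret == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Stub that simulates a kernel with no free ioeventfd slots. */
static int no_slots(int fd)
{
    (void)fd;
    return -1;
}
```

With no_slots() as the callback, notifier_init() still succeeds and the
eventfd still fires, but only after the write has been handled in the vcpu
thread. That extra hop is the cost I am concerned about.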

>>   It
>> seems okay provided we can solve the limit in the kernel once and for
>> all by introducing a more dynamic data structure for in-kernel
>> devices.  That way future kernels will never hit an arbitrary limit
>> below their file descriptor rlimit.
>>
>> Is there some reason why kvm.ko must use a fixed size array?  Would it
>> be possible to use a tree (maybe with a cache for recent lookups)?
>
> It does use bsearch today IIRC.  We'll expand the limit, but there must
> be a limit, and qemu must be prepared to deal with it.

Shouldn't the limit be the file descriptor rlimit?  If userspace
cannot create more eventfds then it cannot set up more ioeventfds.

I agree there always needs to be an error path because there is a
finite resource (either file descriptors or in-kernel device slots).
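
As a rough illustration of that error path, the sketch below creates
eventfds until the kernel refuses, assuming Linux eventfd(2) and
setrlimit(2). The helper names limit_open_files(), create_notifiers(), and
close_notifiers() are made up for this example and do not exist in qemu.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/eventfd.h>
#include <sys/resource.h>
#include <unistd.h>

/* Lower the soft RLIMIT_NOFILE to `soft` (capped at the hard limit). */
static int limit_open_files(rlim_t soft)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) < 0) {
        return -1;
    }
    rl.rlim_cur = soft < rl.rlim_max ? soft : rl.rlim_max;
    return setrlimit(RLIMIT_NOFILE, &rl);
}

/* Create up to `want` eventfds in fds[], stopping at the first failure.
 * Returns how many were created; *err holds the failing errno, or 0. */
static size_t create_notifiers(int *fds, size_t want, int *err)
{
    size_t n;

    *err = 0;
    for (n = 0; n < want; n++) {
        fds[n] = eventfd(0, EFD_CLOEXEC);
        if (fds[n] < 0) {
            *err = errno;  /* EMFILE once the fd rlimit is reached */
            break;
        }
    }
    return n;
}

/* Release the eventfds created by create_notifiers(). */
static void close_notifiers(const int *fds, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        close(fds[i]);
    }
}
```

A caller that gets fewer descriptors than it asked for has to degrade
gracefully (for example, by not using ioeventfd for the remaining devices)
rather than abort.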

Stefan
