Re: [PATCH 2/3] [RFC] Changes for MQ virtio-net

Michael S. Tsirkin Wed, 02 Mar 2011 02:06:41 -0800

On Tue, Mar 01, 2011 at 09:32:56PM +0530, Krishna Kumar2 wrote:
> "Michael S. Tsirkin" <[email protected]> wrote on 02/28/2011 03:13:20 PM:
> 
> Thank you once again for your feedback on both these patches.
> I will send the qemu patch tomorrow. I will also send the next
> version incorporating these suggestions once we finalize some
> minor points.
> 
> > Overall looks good.
> > The numtxqs meaning the number of rx queues needs some cleanup.
> > init/cleanup routines need more symmetry.
> > Error handling on setup also seems slightly buggy or at least
> asymmetrical.
> > Finally, this will use up a large number of MSI vectors,
> > while TX interrupts mostly stay unused.
> >
> > Some comments below.
> >
> > > +/* Maximum number of individual RX/TX queues supported */
> > > +#define VIRTIO_MAX_TXQS                           16
> > > +
> >
> > This also does not seem to belong in the header.
> 
> Both virtio-net and vhost need some check to make sure very
> high values are not passed by userspace. Is this not required?


Whatever we stick in the header is effectively part of
host/gues interface. Are you sure we'll never want
more than 16 VQs? This value does not seem that high.

> > > +#define VIRTIO_NET_F_NUMTXQS              21              /* Device 
> > > supports multiple
> TX queue */
> >
> > VIRTIO_NET_F_MULTIQUEUE ?
> 
> Yes, that's a better name.
> 
> > > @@ -34,6 +38,8 @@ struct virtio_net_config {
> > >            __u8 mac[6];
> > >            /* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
> > >            __u16 status;
> > > +          /* number of RX/TX queues */
> > > +          __u16 numtxqs;
> >
> > The interface here is a bit ugly:
> > - this is really both # of tx and rx queues but called numtxqs
> > - there's a hardcoded max value
> > - 0 is assumed to be same as 1
> > - assumptions above are undocumented.
> >
> > One way to address this could be num_queue_pairs, and something like
> >              /* The actual number of TX and RX queues is num_queue_pairs +
> 1 each. */
> >              __u16 num_queue_pairs;
> > (and tweak code to match).
> >
> > Alternatively, have separate registers for the number of tx and rx
> queues.
> 
> OK, so virtio_net_config has num_queue_pairs, and this gets converted to
> numtxqs in virtnet_info?

Or put num_queue_pairs in virtnet_info too.

> > > +struct virtnet_info {
> > > +          struct send_queue **sq;
> > > +          struct receive_queue **rq;
> > > +
> > > +          /* read-mostly variables */
> > > +          int numtxqs ____cacheline_aligned_in_smp;
> >
> > Why do you think this alignment is a win?
> 
> Actually this code was from the earlier patchset (MQ TX only) where
> the layout was different. Now rq and sq are allocated as follows:
>       vi->sq = kzalloc(numtxqs * sizeof(*vi->sq), GFP_KERNEL);
>       for (i = 0; i < numtxqs; i++) {
>               vi->sq[i] = kzalloc(sizeof(*vi->sq[i]), GFP_KERNEL);
> Since the two pointers becomes read-only during use, there is no cache
> line dirty'ing.  I will remove this directive.
> 
> > > +/*
> > > + * Note for 'qnum' below:
> > > + *                first 'numtxqs' vqs are RX, next 'numtxqs' vqs are TX.
> > > + */
> >
> > Another option to consider is to have them RX,TX,RX,TX:
> > this way vq->queue_index / 2 gives you the
> > queue pair number, no need to read numtxqs. On the other hand, it makes
> the
> > #RX==#TX assumption even more entrenched.
> 
> OK. I was following how many drivers were allocating RX and TX's
> together - eg ixgbe_adapter has tx_ring and rx_ring arrays; bnx2
> has rx_buf_ring and tx_buf_ring arrays, etc.

That's fine. I am only talking about the VQ numbers.

> Also, vhost has some
> code that processes tx first before rx (e.g. vhost_net_stop/flush),

No idea why did I do it this way. I don't think it matters.

> so this approach seemed helpful.
> I am OK either way, what do you
> suggest?

We get less code generated but also less flexibility.
I am not sure, I'll play around with code, for now
let's keep it as is.


> > > +          err = vi->vdev->config->find_vqs(vi->vdev, totalvqs, vqs,
> callbacks,
> > > +                                                                         
> > >   (const char
> **)names);
> > > +          if (err)
> > > +                          goto free_params;
> > > +
> >
> > This would use up quite a lot of vectors. However,
> > tx interrupt is, in fact, slow path. So, assuming we don't have
> > enough vectors to use per vq, I think it's a good idea to
> > support reducing MSI vector usage by mapping all TX VQs to the same
> vector
> > and separate vectors for RX.
> > The hypervisor actually allows this, but we don't have an API at the
> virtio
> > level to pass that info to virtio pci ATM.
> > Any idea what a good API to use would be?
> 
> Yes, it is a waste to have these vectors for tx ints. I initially
> thought of adding a flag to virtio_device to pass to vp_find_vqs,
> but it won't work, so a new API is needed. I can work with you on
> this in the background if you like.


OK. For starters, how about we change find_vqs to get a structure?  Then
we can easily add flags that tell us that some interrupts are rare.


diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 800617b..2b765bb 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -78,7 +78,14 @@
  *     This gives the final feature bits for the device: it can change
  *     the dev->feature bits if it wants.
  */
+
 typedef void vq_callback_t(struct virtqueue *);
+struct virtqueue_info {
+       struct virtqueue *vq;
+       vq_callback_t *callback;
+       const char *name;
+};
+
 struct virtio_config_ops {
        void (*get)(struct virtio_device *vdev, unsigned offset,
                    void *buf, unsigned len);
@@ -88,9 +95,7 @@ struct virtio_config_ops {
        void (*set_status)(struct virtio_device *vdev, u8 status);
        void (*reset)(struct virtio_device *vdev);
        int (*find_vqs)(struct virtio_device *, unsigned nvqs,
-                       struct virtqueue *vqs[],
-                       vq_callback_t *callbacks[],
-                       const char *names[]);
+                       struct virtqueue_info vq_info[]);
        void (*del_vqs)(struct virtio_device *);
        u32 (*get_features)(struct virtio_device *vdev);
        void (*finalize_features)(struct virtio_device *vdev);


> > > +          for (i = 0; i < numtxqs; i++) {
> > > +                          vi->rq[i]->rvq = vqs[i];
> > > +                          vi->sq[i]->svq = vqs[i + numtxqs];
> >
> > This logic is spread all over. We need some kind of macro to
> > get queue number of vq number and back.
> 
> Will add this.
> 
> > > +          if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)) {
> > > +                          vi->cvq = vqs[i + numtxqs];
> > > +
> > > +                          if (virtio_has_feature(vi->vdev,
> VIRTIO_NET_F_CTRL_VLAN))
> > > +                                          vi->dev->features |=
> NETIF_F_HW_VLAN_FILTER;
> >
> > This bit does not seem to belong in initialize_vqs.
> 
> I will move it back to probe.
> 
> > > +          err = virtio_config_val(vdev, VIRTIO_NET_F_NUMTXQS,
> > > +                                                          offsetof(struct
> virtio_net_config, numtxqs),
> > > +                                                          &numtxqs);
> > > +
> > > +          /* We need atleast one txq */
> > > +          if (err || !numtxqs)
> > > +                          numtxqs = 1;
> >
> > err is okay, but should we just fail on illegal values?
> > Or change the semantics:
> >     n = 0;
> >     err = virtio_config_val(vdev, VIRTIO_NET_F_NUMTXQS,
> >                             offsetof(struct virtio_net_config, numtxqs),
> >                             &n);
> >     numtxq = n + 1;
> 
> Will this be better:
>       int num_queue_pairs = 2;
>       int numtxqs;
> 
>       err = virtio_config_val(vdev, VIRTIO_NET_F_MULTIQUEUE,
>                               offsetof(struct virtio_net_config,
>                                        num_queue_pairs), &num_queue_pairs);
>       <ignore error, if any>
>       numtxqs = num_queue_pairs / 2;
> 
> > > +          if (numtxqs > VIRTIO_MAX_TXQS)
> > > +                          return -EINVAL;
> >
> > Do we strictly need this?
> > I think we should just use whatever hardware has,
> > or alternatively somehow ignore the unused queues
> > (easy for tx, not sure about rx).
> 
> vq's are matched between qemu, virtio-net and vhost. Isn't some check
> required that userspace has not passed a bad value?


For virtio, I'm not too concerned: qemu can already easily
crash the guest :)
For vhost yes, but I'm concerned that even with 16 VQs we are
drinking a lot of resources already. I would be happier
if we had a file descriptor per VQs pair in some way.
The the amount of memory userspace can use up is
limited by the # of file descriptors.

> > > +                          if (vi->rq[i]->num == 0) {
> > > +                                          err = -ENOMEM;
> > > +                                          goto free_recv_bufs;
> > > +                          }
> > >            }
> > If this fails for vq > 0, you have to detach bufs.
> 
> Right, will fix this.
> 
> > >  free_vqs:
> > > +          for (i = 0; i < numtxqs; i++)
> > > +                          cancel_delayed_work_sync(&vi->rq[i]->refill);
> > >            vdev->config->del_vqs(vdev);
> > > -free:
> > > +          free_rq_sq(vi);
> >
> > If we have a wrapper to init all vqs, pls add a wrapper to clean up
> > all vqs as well.
> 
> Will add that.
> 
> > > +          for (i = 0; i < vi->numtxqs; i++) {
> > > +                          struct virtqueue *rvq = vi->rq[i]->rvq;
> > > +
> > > +                          while (1) {
> > > +                                          buf = 
> > > virtqueue_detach_unused_buf
> (rvq);
> > > +                                          if (!buf)
> > > +                                                          break;
> > > +                                          if (vi->mergeable_rx_bufs || 
> > > vi->
> big_packets)
> > > +                                                          
> > > give_pages(vi->rq[i],
> buf);
> > > +                                          else
> > > +                                                          
> > > dev_kfree_skb(buf);
> > > +                                          --vi->rq[i]->num;
> > > +                          }
> > > +                          BUG_ON(vi->rq[i]->num != 0);
> > >            }
> > > -          BUG_ON(vi->num != 0);
> > > +
> > > +          free_rq_sq(vi);
> >
> >
> > This looks wrong here. This function should detach
> > and free all bufs, not internal malloc stuff.
> 
> That is being done by free_receive_buf after free_unused_bufs()
> returns. I hope this addresses your point.
> 
> > I think we should have free_unused_bufs that handles
> > a single queue, and call it in a loop.
> 
> OK, so define free_unused_bufs() as:
> 
> static void free_unused_bufs(struct virtnet_info *vi, struct virtqueue
> *svq,
>                            struct virtqueue *rvq)
> {
>       /* Use svq and rvq with the remaining code unchanged */
> }
> 
> Thanks,
> 
> - KK

Not sure I understand. I am just suggesting
adding symmetrical functions like init/cleanup
alloc/free etc instead of adding stuff in random
functions that just happens to be called at the right time.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/3] [RFC] Changes for MQ virtio-net

Reply via email to