"Michael S. Tsirkin" <m...@redhat.com> wrote on 02/28/2011 03:13:20 PM:

Thank you once again for your feedback on both these patches.
I will send the qemu patch tomorrow. I will also send the next
version incorporating these suggestions once we finalize some
minor points.

> Overall looks good.
> The numtxqs meaning the number of rx queues needs some cleanup.
> init/cleanup routines need more symmetry.
> Error handling on setup also seems slightly buggy or at least asymmetrical.
> Finally, this will use up a large number of MSI vectors,
> while TX interrupts mostly stay unused.
>
> Some comments below.
>
> > +/* Maximum number of individual RX/TX queues supported */
> > +#define VIRTIO_MAX_TXQS                             16
> > +
>
> This also does not seem to belong in the header.

Both virtio-net and vhost need some check to make sure very
high values are not passed by userspace. Is this not required?
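
For example, on the vhost side I was thinking of something along these
lines when the queue count comes from userspace (sketch only; the
VHOST_MAX_VQS name is illustrative, not an existing define):

        /* Sketch: reject a queue count from userspace that is outside
         * what we are prepared to allocate (VHOST_MAX_VQS illustrative).
         */
        if (!num_queue_pairs || num_queue_pairs > VHOST_MAX_VQS)
                return -EINVAL;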

> > +#define VIRTIO_NET_F_NUMTXQS            21      /* Device supports multiple TX queue */
>
> VIRTIO_NET_F_MULTIQUEUE ?

Yes, that's a better name.

> > @@ -34,6 +38,8 @@ struct virtio_net_config {
> >              __u8 mac[6];
> >              /* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
> >              __u16 status;
> > +            /* number of RX/TX queues */
> > +            __u16 numtxqs;
>
> The interface here is a bit ugly:
> - this is really both # of tx and rx queues but called numtxqs
> - there's a hardcoded max value
> - 0 is assumed to be same as 1
> - assumptions above are undocumented.
>
> One way to address this could be num_queue_pairs, and something like
>                /* The actual number of TX and RX queues is num_queue_pairs + 1 each. */
>                __u16 num_queue_pairs;
> (and tweak code to match).
>
> Alternatively, have separate registers for the number of tx and rx queues.

OK, so virtio_net_config has num_queue_pairs, and this gets converted to
numtxqs in virtnet_info?
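
i.e., something like this for the config layout (sketch only; how it maps
to numtxqs in the driver is what I ask about further below):

        struct virtio_net_config {
                __u8 mac[6];
                /* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
                __u16 status;
                /* The actual number of TX and RX queues is num_queue_pairs + 1 each */
                __u16 num_queue_pairs;
        };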

> > +struct virtnet_info {
> > +            struct send_queue **sq;
> > +            struct receive_queue **rq;
> > +
> > +            /* read-mostly variables */
> > +            int numtxqs ____cacheline_aligned_in_smp;
>
> Why do you think this alignment is a win?

Actually this code was from the earlier patchset (MQ TX only) where
the layout was different. Now rq and sq are allocated as follows:
        vi->sq = kzalloc(numtxqs * sizeof(*vi->sq), GFP_KERNEL);
        for (i = 0; i < numtxqs; i++) {
                vi->sq[i] = kzalloc(sizeof(*vi->sq[i]), GFP_KERNEL);
Since the two pointers become read-only during use, there is no cache
line dirtying. I will remove this directive.
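
So in virtnet_info those fields would simply become (sketch):

        struct virtnet_info {
                struct send_queue **sq;         /* read-mostly after probe */
                struct receive_queue **rq;      /* read-mostly after probe */
                int numtxqs;                    /* no special alignment */
                ...
        };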

> > +/*
> > + * Note for 'qnum' below:
> > + *          first 'numtxqs' vqs are RX, next 'numtxqs' vqs are TX.
> > + */
>
> Another option to consider is to have them RX,TX,RX,TX:
> this way vq->queue_index / 2 gives you the
> queue pair number, no need to read numtxqs. On the other hand, it makes the
> #RX==#TX assumption even more entrenched.

OK. I was following how several drivers allocate their RX and TX rings
together - e.g. ixgbe_adapter has tx_ring and rx_ring arrays; bnx2
has rx_buf_ring and tx_buf_ring arrays, etc. Also, vhost has some
code that processes tx before rx (e.g. vhost_net_stop/flush), so
grouping the queue types seemed helpful. I am OK either way - which
do you suggest?

> > +            err = vi->vdev->config->find_vqs(vi->vdev, totalvqs, vqs,
> > +                                             callbacks, (const char **)names);
> > +            if (err)
> > +                            goto free_params;
> > +
>
> This would use up quite a lot of vectors. However,
> tx interrupt is, in fact, slow path. So, assuming we don't have
> enough vectors to use per vq, I think it's a good idea to
> support reducing MSI vector usage by mapping all TX VQs to the same vector
> and separate vectors for RX.
> The hypervisor actually allows this, but we don't have an API at the virtio
> level to pass that info to virtio pci ATM.
> Any idea what a good API to use would be?

Yes, it is a waste to have these vectors for tx ints. I initially
thought of adding a flag to virtio_device to pass to vp_find_vqs,
but it won't work, so a new API is needed. I can work with you on
this in the background if you like.
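
Just to have something concrete to start from, one possible (purely
hypothetical) shape would be to extend find_vqs() with a per-vq hint so
virtio_pci knows which vqs may share one MSI-X vector, e.g. all TX vqs:

        /*
         * Hypothetical extension, not an existing virtio API: share_irq[i]
         * set means vq i's callback may be bound to a shared vector instead
         * of getting a dedicated one (we would set it for every TX vq).
         */
        int (*find_vqs)(struct virtio_device *vdev, unsigned nvqs,
                        struct virtqueue *vqs[], vq_callback_t *callbacks[],
                        const char *names[], const bool *share_irq);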

> > +            for (i = 0; i < numtxqs; i++) {
> > +                            vi->rq[i]->rvq = vqs[i];
> > +                            vi->sq[i]->svq = vqs[i + numtxqs];
>
> This logic is spread all over. We need some kind of macro to
> get queue number of vq number and back.

Will add this.
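
Roughly along these lines (sketch for the current layout where the first
numtxqs vqs are RX and the next numtxqs are TX; with the interleaved
RX,TX,RX,TX layout it would simply be vq->queue_index / 2). Names are
tentative, and the control vq (if present) sits after these:

        #define rx_vq_index(vi, qnum)    (qnum)
        #define tx_vq_index(vi, qnum)    ((vi)->numtxqs + (qnum))
        #define vq_to_queue_num(vi, idx) ((idx) % (vi)->numtxqs)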

> > +            if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)) {
> > +                            vi->cvq = vqs[i + numtxqs];
> > +
> > +                            if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VLAN))
> > +                                            vi->dev->features |= NETIF_F_HW_VLAN_FILTER;
>
> This bit does not seem to belong in initialize_vqs.

I will move it back to probe.
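
i.e. back in virtnet_probe(), something like (sketch):

        /* In virtnet_probe(), after the vqs have been set up */
        if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VLAN))
                dev->features |= NETIF_F_HW_VLAN_FILTER;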

> > +            err = virtio_config_val(vdev, VIRTIO_NET_F_NUMTXQS,
> > +                                    offsetof(struct virtio_net_config, numtxqs),
> > +                                    &numtxqs);
> > +
> > +            /* We need atleast one txq */
> > +            if (err || !numtxqs)
> > +                            numtxqs = 1;
>
> err is okay, but should we just fail on illegal values?
> Or change the semantics:
>       n = 0;
>       err = virtio_config_val(vdev, VIRTIO_NET_F_NUMTXQS,
>                               offsetof(struct virtio_net_config, numtxqs),
>                               &n);
>       numtxq = n + 1;

Will this be better:
        int num_queue_pairs = 2;
        int numtxqs;

        err = virtio_config_val(vdev, VIRTIO_NET_F_MULTIQUEUE,
                                offsetof(struct virtio_net_config,
                                         num_queue_pairs), &num_queue_pairs);
        <ignore error, if any>
        numtxqs = num_queue_pairs / 2;

> > +            if (numtxqs > VIRTIO_MAX_TXQS)
> > +                            return -EINVAL;
>
> Do we strictly need this?
> I think we should just use whatever hardware has,
> or alternatively somehow ignore the unused queues
> (easy for tx, not sure about rx).

vq's are matched between qemu, virtio-net and vhost. Isn't some check
required that userspace has not passed a bad value?

> > +                            if (vi->rq[i]->num == 0) {
> > +                                            err = -ENOMEM;
> > +                                            goto free_recv_bufs;
> > +                            }
> >              }
> If this fails for vq > 0, you have to detach bufs.

Right, will fix this.
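
Something along these lines in the error path (sketch; the per-queue
helper that detaches and frees one queue's buffers is hypothetical here,
following your suggestion further below to make free_unused_bufs()
per-queue):

free_recv_bufs:
        /* Undo the RX queues that were already filled before queue i failed */
        while (--i >= 0)
                free_unused_bufs_one(vi, vi->rq[i]);    /* hypothetical helper */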

> >  free_vqs:
> > +            for (i = 0; i < numtxqs; i++)
> > +                            cancel_delayed_work_sync(&vi->rq[i]->refill);
> >              vdev->config->del_vqs(vdev);
> > -free:
> > +            free_rq_sq(vi);
>
> If we have a wrapper to init all vqs, pls add a wrapper to clean up
> all vqs as well.

Will add that.
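
For symmetry with initialize_vqs(), something like (sketch, name tentative):

        static void virtnet_del_vqs(struct virtnet_info *vi)
        {
                int i;

                for (i = 0; i < vi->numtxqs; i++)
                        cancel_delayed_work_sync(&vi->rq[i]->refill);
                vi->vdev->config->del_vqs(vi->vdev);
                free_rq_sq(vi);
        }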

> > +            for (i = 0; i < vi->numtxqs; i++) {
> > +                            struct virtqueue *rvq = vi->rq[i]->rvq;
> > +
> > +                            while (1) {
> > +                                            buf = virtqueue_detach_unused_buf(rvq);
> > +                                            if (!buf)
> > +                                                            break;
> > +                                            if (vi->mergeable_rx_bufs || vi->big_packets)
> > +                                                            give_pages(vi->rq[i], buf);
> > +                                            else
> > +                                                            dev_kfree_skb(buf);
> > +                                            --vi->rq[i]->num;
> > +                            }
> > +                            BUG_ON(vi->rq[i]->num != 0);
> >              }
> > -            BUG_ON(vi->num != 0);
> > +
> > +            free_rq_sq(vi);
>
>
> This looks wrong here. This function should detach
> and free all bufs, not internal malloc stuff.

That is being done by free_receive_buf after free_unused_bufs()
returns. I hope this addresses your point.

> I think we should have free_unused_bufs that handles
> a single queue, and call it in a loop.

OK, so define free_unused_bufs() as:

static void free_unused_bufs(struct virtnet_info *vi, struct virtqueue *svq,
                             struct virtqueue *rvq)
{
        /* Use svq and rvq with the remaining code unchanged */
}
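
and the caller would then just loop over the queues, e.g.:

        for (i = 0; i < vi->numtxqs; i++)
                free_unused_bufs(vi, vi->sq[i]->svq, vi->rq[i]->rvq);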

Thanks,

- KK
