> "Michael S. Tsirkin" <[email protected]>
> > > What other shared TX/RX locks are there?  In your setup, is the same
> > > macvtap socket structure used for RX and TX?  If yes this will create
> > > cacheline bounces as sk_wmem_alloc/sk_rmem_alloc share a cache line;
> > > there might also be contention on the lock in the sk_sleep waitqueue.
> > > Anything else?
> >
> > The patch does not introduce any locking (in either vhost or virtio-net).
> > The single stream drop is due to different vhost threads handling the
> > RX/TX traffic.
> >
> > I added a (fuzzy) heuristic to determine whether more than one flow
> > is being used on the device and, if not, to use vhost[0] for both
> > TX and RX (vhost_poll_queue figures this out before waking up
> > the appropriate vhost thread).  Testing shows that single-stream
> > performance is as good as with the original code.
>
> ...
>
> > This approach works nicely for both single and multiple stream.
> > Does this look good?
> >
> > Thanks,
> >
> > - KK
>
> Yes, but I guess it depends on the heuristic :) What's the logic?

I track how recently each txq was used. If zero or one txq was used
recently, use vq[0] (which also handles RX). Otherwise, use the
per-txq vqs (vq[1..n]). The code is:

/*
 * Algorithm for selecting vq:
 *
 * Condition                                    Return
 * RX vq                                        vq[0]
 * If all txqs unused                           vq[0]
 * If one txq used, and new txq is same         vq[0]
 * If one txq used, and new txq is different    vq[vq->qnum]
 * If > 1 txqs used                             vq[vq->qnum]
 *      Where "used" means the txq was used in the last 'n' jiffies.
 *
 * Note: locking is not required as an update race will only result in
 * a different worker being woken up.
 */
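static inline struct vhost_virtqueue *vhost_find_vq(struct vhost_poll *poll)
{
        /* qnum 0 is the RX vq; the TX vqs are numbered 1..nvqs-1 */
        if (poll->vq->qnum) {
                struct vhost_dev *dev = poll->vq->dev;
                struct vhost_virtqueue *vq = &dev->vqs[0];
                unsigned long max_time = jiffies - 5; /* Some macro needed */
                /* table[i] holds the last-use time (jiffies) of txq i + 1 */
                unsigned long *table = dev->jiffies;
                int i, used = 0;

                /*
                 * Count recently used txqs (this txq's entry still holds its
                 * previous timestamp); once more than one is found, wake this
                 * txq's own worker instead of vq[0]'s.
                 */
                for (i = 0; i < dev->nvqs - 1; i++) {
                        if (time_after_eq(table[i], max_time) && ++used > 1) {
                                vq = poll->vq;
                                break;
                        }
                }
                /* Record this use only after the scan above */
                table[poll->vq->qnum - 1] = jiffies;
                return vq;
        }

        /* RX is handled by the same worker thread */
        return poll->vq;
}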

void vhost_poll_queue(struct vhost_poll *poll)
{
        struct vhost_virtqueue *vq = vhost_find_vq(poll);

        vhost_work_queue(vq, &poll->work);
}
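To see how the heuristic behaves over a sequence of wakeups, here is a small userspace model of vhost_find_vq(). It is only a sketch: NVQS, RECENT, last_used[] and find_vq() are hypothetical stand-ins for dev->nvqs, the hard-coded 5-jiffy window, dev->jiffies and the real function, "jiffies" is a plain counter, and vq indices are returned instead of struct vhost_virtqueue pointers.

```c
#include <assert.h>

/* Hypothetical sizes: vq 0 = RX (plus low-traffic TX), vqs 1..3 = TX */
#define NVQS    4
#define RECENT  5       /* the "jiffies - 5" window from the patch */

/* Wraparound-safe comparison, same idea as the kernel's time_after_eq() */
#define time_after_eq(a, b) ((long)((a) - (b)) >= 0)

static unsigned long jiffies;               /* simulated clock */
static unsigned long last_used[NVQS - 1];   /* last_used[i] is txq i + 1 */

/* Returns the index of the vq whose worker would be woken for @qnum. */
static int find_vq(int qnum)
{
	unsigned long max_time = jiffies - RECENT;
	int vq = 0, used = 0, i;

	assert(qnum >= 0 && qnum < NVQS);
	if (qnum == 0)
		return 0;               /* RX always goes to vq[0] */

	for (i = 0; i < NVQS - 1; i++) {
		if (time_after_eq(last_used[i], max_time) && ++used > 1) {
			vq = qnum;      /* more than one txq active recently */
			break;
		}
	}
	last_used[qnum - 1] = jiffies;  /* mark this txq as used now */
	return vq;
}
```

Starting the simulated clock at jiffies = 1000 (so the zero-initialised entries fall outside the window) and feeding it txq 1, txq 1, txq 2, txq 2 and then RX yields vqs 0, 0, 0, 2, 0. After advancing the clock by RECENT + 1, a packet on txq 3 goes back to vq 0. Because the current txq's timestamp is refreshed only after the scan, the first wakeup from a second txq still lands on vq[0]; the switch to the per-txq worker happens on that txq's next wakeup.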

Since each poll wakeup handles a batch of packets, vhost_find_vq does
not seem to add much to CPU utilization (or to affect BW). I am sure
the code can be optimized further.

The results I sent in my last mail were without your use_mm
patch, and the only tuning was to pin the vhost threads to
CPUs 0-3 (though performance is good even without that). I will
test with the use_mm patch too later today.

Thanks,

- KK

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
