On Wed, Dec 23, 2015 at 7:09 PM, Tetsuya Mukawa <mukawa at igel.co.jp> wrote:
> On 2015/12/22 13:47, Rich Lane wrote:
> > On Mon, Dec 21, 2015 at 7:41 PM, Yuanhan Liu <yuanhan.liu at linux.intel.com>
> > wrote:
> >
> >> On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
> >>> I'm using the vhost callbacks and struct virtio_net with the vhost PMD
> >>> in a few ways:
> >> Rich, thanks for the info!
> >>
> >>> 1. new_device/destroy_device: Link state change (will be covered by the
> >>> link status interrupt).
> >>> 2. new_device: Add first queue to datapath.
> >> I'm wondering why vring_state_changed() is not used, as it will also be
> >> triggered at the beginning, when the default queue (the first queue) is
> >> enabled.
> >>
> > Turns out I'd misread the code and it's already using the
> > vring_state_changed callback for the first queue. Not sure if this is
> > intentional but vring_state_changed is called for the first queue
> > before new_device.
> >
> >>> 3. vring_state_changed: Add/remove queue to datapath.
> >>> 4. destroy_device: Remove all queues (vring_state_changed is not called
> >>> when qemu is killed).
> >> I had a plan to invoke vring_state_changed() to disable all vrings
> >> when destroy_device() is called.
> >>
> > That would be good.
> >
> >>> 5. new_device and struct virtio_net: Determine NUMA node of the VM.
> >> You can get the 'struct virtio_net' dev from all above callbacks.
> >
> >> 1. Link status interrupt.
> >>
> >> To vhost pmd, new_device()/destroy_device() equals to the link status
> >> interrupt, where new_device() is a link up, and destroy_device() is link
> >> down().
> >>
> >>> 2. New queue_state_changed callback. Unlike vring_state_changed this
> >>> should cover the first queue at new_device and removal of all queues at
> >>> destroy_device.
> >> As stated above, vring_state_changed() should be able to do that, except
> >> the one on destroy_device(), which is not done yet.
> >>
> >>> 3. Per-queue or per-device NUMA node info.
> >> You can query the NUMA node info implicitly by get_mempolicy(); check
> >> numa_realloc() at lib/librte_vhost/virtio-net.c for reference.
> >>
> > Your suggestions are exactly how my application is already working. I was
> > commenting on the proposed changes to the vhost PMD API. I would prefer to
> > use RTE_ETH_EVENT_INTR_LSC and rte_eth_dev_socket_id for consistency with
> > other NIC drivers, instead of these vhost-specific hacks. The queue state
> > change callback is the one new API that needs to be added because
> > normal NICs don't have this behavior.
> >
> > You could add another rte_eth_event_type for the queue state change
> > callback, and pass the queue ID, RX/TX direction, and enable bit through
> > cb_arg.
>
> Hi Rich,
>
> So far, EAL provides rte_eth_dev_callback_register() for event handling.
> DPDK app can register callback handler and "callback argument".
> And EAL will call callback handler with the argument.
> Anyway, vhost library and PMD cannot change the argument.
>

You're right, I'd mistakenly thought that the PMD controlled the void *
passed to the callback.

Here's a thought:

    struct rte_eth_vhost_queue_event {
        uint16_t queue_id;
        bool rx;
        bool enable;
    };

    int rte_eth_vhost_get_queue_event(uint8_t port_id,
            struct rte_eth_vhost_queue_event *event);

On receiving the ethdev event the application could repeatedly call
rte_eth_vhost_get_queue_event to find out what happened.

An issue with having the application dig into struct virtio_net is that it
can only be safely accessed from a callback on the vhost thread. A typical
application running its control plane on lcore 0 would need to copy all the
relevant info from struct virtio_net before sending it over.

As you mentioned, queues for a single vhost port could be located on
different NUMA nodes. I think this is an uncommon scenario but if needed you
could add an API to retrieve the NUMA node for a given port and queue.
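
To make the "repeatedly call rte_eth_vhost_get_queue_event" idea concrete,
here is a rough, self-contained sketch of how the application side might
drain pending events. Everything in it is an assumption for illustration.
rte_eth_vhost_get_queue_event is only the proposal above and not an
existing DPDK API. The return convention (0 when an event was returned, -1
when nothing is pending), the small ring, and the helpers
mock_pmd_post_event and drain_queue_events are all hypothetical. The mock is
single-threaded; a real PMD would have to synchronize between the vhost
thread and the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Proposed in this mail; not an existing DPDK API. */
struct rte_eth_vhost_queue_event {
    uint16_t queue_id;
    bool rx;
    bool enable;
};

/* Stand-in for PMD-internal state (hypothetical, single port, no locking). */
#define MOCK_RING_SIZE 8
static struct rte_eth_vhost_queue_event mock_ring[MOCK_RING_SIZE];
static unsigned int mock_head, mock_tail;

/* Hypothetical helper: what the PMD might do from vring_state_changed(). */
static void mock_pmd_post_event(uint16_t queue_id, bool rx, bool enable)
{
    /* Sketch only: overflow is treated as a programming error. */
    assert(mock_tail - mock_head < MOCK_RING_SIZE);
    mock_ring[mock_tail++ % MOCK_RING_SIZE] =
        (struct rte_eth_vhost_queue_event){ queue_id, rx, enable };
}

/*
 * Assumed convention: returns 0 and fills *event when an event is pending,
 * returns -1 when there is nothing left to report.
 */
int rte_eth_vhost_get_queue_event(uint8_t port_id,
                                  struct rte_eth_vhost_queue_event *event)
{
    (void)port_id; /* the mock tracks a single port */
    if (mock_head == mock_tail)
        return -1;
    *event = mock_ring[mock_head++ % MOCK_RING_SIZE];
    return 0;
}

/*
 * Application side: run after the ethdev event fires. Pulls events until
 * none are pending and returns how many were handled.
 */
static int drain_queue_events(uint8_t port_id)
{
    struct rte_eth_vhost_queue_event ev;
    int handled = 0;

    while (rte_eth_vhost_get_queue_event(port_id, &ev) == 0) {
        /* A real application would add or remove the queue here. */
        printf("port %u: %s queue %u %s\n", (unsigned)port_id,
               ev.rx ? "RX" : "TX", (unsigned)ev.queue_id,
               ev.enable ? "enabled" : "disabled");
        handled++;
    }
    return handled;
}
```

With this shape the callback registered through
rte_eth_dev_callback_register() would only need to signal the control plane,
which then calls drain_queue_events(port_id) from its own thread and never
touches struct virtio_net.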