Matias Elo(matiaselo) replied on github web page: platform/linux-generic/pktio/netmap.c line 64 @@ -786,6 +824,100 @@ static int netmap_recv(pktio_entry_t *pktio_entry, int index, return num_rx; } +static int netmap_recv_tmo(pktio_entry_t *pktio_entry, int index, + odp_packet_t pkt_table[], int num, uint64_t usecs) +{ + struct timeval timeout; + int ret; + int maxfd; + fd_set readfds; + + ret = netmap_recv(pktio_entry, index, pkt_table, num); + if (ret != 0) + return ret; + + /* 32 bit architectures use 32-bit division if possible */ + if (sizeof(size_t) < 8) { + if (usecs <= UINT32_MAX) { + timeout.tv_sec = ((uint32_t)usecs) / (1000 * 1000);
Comment: To me the value of this optimization seems questionable (everywhere where it is used). System calls are used here anyway, so one division operation (next comment) shouldn't have that much effect. > Matias Elo(matiaselo) wrote: > Not needed. >> Matias Elo(matiaselo) wrote: >> Not needed. >>> Matias Elo(matiaselo) wrote: >>> Same comments than to `odp_pktin_recv_tmo()`. >>>> Matias Elo(matiaselo) wrote: >>>> Can roll over if SLEEP_USEC > 1. >>>>> Matias Elo(matiaselo) wrote: >>>>> Seems unnecessarily complex. Something like the following could do the >>>>> same: >>>>> ``` >>>>> uint64_t sleep_round = 0; >>>>> ... >>>>> /* Check every SLEEP_CHECK rounds if total wait time >>>>> * has been exceeded. */ >>>>> if ((sleep_round++ & (SLEEP_CHECK - 1)) == 0) { >>>>> t2 = odp_time_local(); >>>>> >>>>> if (odp_time_cmp(t2, t1) > 0) >>>>> return 0; >>>>> } >>>>> wait = wait > SLEEP_USEC ? wait - SLEEP_USEC : 0; >>>>> ``` >>>>> >>>>>> Matias Elo(matiaselo) wrote: >>>>>> Comment still says nanoseconds. Should also state that the maximum value >>>>>> is 1 000 000, so that timespec.tv_nsec is used correctly. >>>>>>> Matias Elo(matiaselo) wrote: >>>>>>> Agreed, the "divisor" can be added later if needed. >>>>>>>> Juha-Matti Tilli(jmtilli) wrote: >>>>>>>> Now that's a great idea! If somebody sees one microsecond as too big >>>>>>>> sleep time, it is always possible to implement a "divisor": multiply >>>>>>>> the sleep time by e.g. 4, and sleep e.g. 250 nanoseconds. No slow >>>>>>>> division operation is needed for this "divisor", only multiplication. >>>>>>>> But I won't implement this divisor now, as SLEEP_NSEC anyway always >>>>>>>> used to be 1000. >>>>>>>>> Matias Elo(matiaselo) wrote: >>>>>>>>> Ifdefs are not allowed. IMO the cleanest fix would be to add a >>>>>>>>> separate commit (before this one) to this patch set, which modifies >>>>>>>>> odp_pktin_wait_time() to always return microseconds and converts >>>>>>>>> SLEEP_NSEC to SLEEP_USEC (and does necessary changes to make >>>>>>>>> everything work). >>>>>>>>> >>>>>>>>> Or does someone oppose having a minimum sleep time of 1us between >>>>>>>>> pktin receive calls in the old code path? >>>>>>>>>> Juha-Matti Tilli(jmtilli) wrote: >>>>>>>>>> I would prefer return 0; as the trial_successful already tells not >>>>>>>>>> to use this interrupt-driven way, but it probably doesn't matter >>>>>>>>>> much. I'll do this (with return 0). >>>>>>>>>>> Matias Elo(matiaselo) wrote: >>>>>>>>>>> By checking both ops here you can return right away and minimize >>>>>>>>>>> overhead for pktios not supporting interrupt driven mode. >>>>>>>>>>> E.g. >>>>>>>>>>> ``` >>>>>>>>>>> if (!entry[i]->s.ops->recv_mq_tmo && >>>>>>>>>>> !entry[i]->s.ops->fd_set) { >>>>>>>>>>> *trial_successful = 0; >>>>>>>>>>> return -1; >>>>>>>>>>> } >>>>>>>>>>> ``` >>>>>>>>>>>> Juha-Matti Tilli(jmtilli) wrote: >>>>>>>>>>>> I'm somewhat worried that somebody could adjust SLEEP_NSEC and >>>>>>>>>>>> then the entire thing fails. What about this: >>>>>>>>>>>> >>>>>>>>>>>> <pre> >>>>>>>>>>>> #if SLEEP_NSEC == 1000 >>>>>>>>>>>> uint64_t usecs = wait; >>>>>>>>>>>> #else >>>>>>>>>>>> uint64_t usecs = (wait * SLEEP_NSEC + 999) / 1000; >>>>>>>>>>>> #endif >>>>>>>>>>>> </pre> >>>>>>>>>>>> >>>>>>>>>>>> For the return value, you mean <code>odp_pktin_wait_time()</code> >>>>>>>>>>>> and not <code>odp_pktin_recv_mq_tmo()</code>, right? Because I >>>>>>>>>>>> don't see how the return value of the latter could depend on >>>>>>>>>>>> <code>SLEEP_NSEC</code> definition. >>>>>>>>>>>> >>>>>>>>>>>> Also, if we modify <code>odp_pktin_wait_time()</code> to always >>>>>>>>>>>> return microseconds, then <code>odp_pktin_recv_mq_tmo()</code> >>>>>>>>>>>> suddenly starts to work incorrectly with <code>SLEEP_NSEC != >>>>>>>>>>>> 1000</code> as there is a <code>wait--</code> line in there that >>>>>>>>>>>> does this just before sleeping one microsecond, exactly. >>>>>>>>>>>>> Matias Elo(matiaselo) wrote: >>>>>>>>>>>>> Same thing with the loop order than with netmap. >>>>>>>>>>>>>> Matias Elo(matiaselo) wrote: >>>>>>>>>>>>>> Couple comment lines describing the different stages of this >>>>>>>>>>>>>> function wouldn't hurt when someone is reading this code later >>>>>>>>>>>>>> on. >>>>>>>>>>>>>>> Matias Elo(matiaselo) wrote: >>>>>>>>>>>>>>> Same thing with the loop order than with netmap. >>>>>>>>>>>>>>>> Matias Elo(matiaselo) wrote: >>>>>>>>>>>>>>>> Since this loop is a performance optimization I would move it >>>>>>>>>>>>>>>> before the loop which sets the read descriptors. No use to set >>>>>>>>>>>>>>>> the descriptors if packets are already available. >>>>>>>>>>>>>>>>> Matias Elo(matiaselo) wrote: >>>>>>>>>>>>>>>>> The motivation for a separate `odp_pktin_wait_time()` >>>>>>>>>>>>>>>>> function was to avoid having to do time conversions in the >>>>>>>>>>>>>>>>> fast path functions. As `odp_pktin_wait_time()` already >>>>>>>>>>>>>>>>> returns the time in microseconds no additional conversions >>>>>>>>>>>>>>>>> are needed here. Same thing for `odp_pktin_recv_mq_tmo()`. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Currently, the value returned by `odp_pktin_recv_mq_tmo()` >>>>>>>>>>>>>>>>> depends on `SLEEP_NSEC` definition. This should be modified >>>>>>>>>>>>>>>>> so that the returned value is always in microseconds. >>>>>>>>>>>>>>>>>> Matias Elo(matiaselo) wrote: >>>>>>>>>>>>>>>>>> Every pktio type we currenly have uses microseconds for >>>>>>>>>>>>>>>>>> timeout, so you could change `recv_tmo()` and >>>>>>>>>>>>>>>>>> `recv_mq_tmo()` to use microseconds instead of nanoseconds. >>>>>>>>>>>>>>>>>> This removes the need for per operation conversions. >>>>>>>>>>>>>>>>>>> Matias Elo(matiaselo) wrote: >>>>>>>>>>>>>>>>>>> > Are you entirely sure about this? This is a function >>>>>>>>>>>>>>>>>>> > pointer, and if two pktio devices have the same function >>>>>>>>>>>>>>>>>>> > pointer for recv_mq_tmo(), the other function pointers >>>>>>>>>>>>>>>>>>> > including the recv() function pointer should be identical >>>>>>>>>>>>>>>>>>> > as well. We do pass the device (entry[i]) to the function >>>>>>>>>>>>>>>>>>> > pointed to by recv_mq_tmo as well, so it should be able >>>>>>>>>>>>>>>>>>> > to update the correct counters. The ops structure is >>>>>>>>>>>>>>>>>>> > const, after all... >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I stand corrected, didn't notice the pktio_entry_t array >>>>>>>>>>>>>>>>>>> parameter. A bit confusing as `recv_mq_tmo()` behaves >>>>>>>>>>>>>>>>>>> differently than pretty much every other pktio op. >>>>>>>>>>>>>>>>>>>> muvarov wrote >>>>>>>>>>>>>>>>>>>> just pass nsec to drivers or you will have double >>>>>>>>>>>>>>>>>>>> calculation like in netmap code. >>>>>>>>>>>>>>>>>>>>> Juha-Matti Tilli(jmtilli) wrote: >>>>>>>>>>>>>>>>>>>>> Are you entirely sure about this? This is a function >>>>>>>>>>>>>>>>>>>>> pointer, and if two pktio devices have the same function >>>>>>>>>>>>>>>>>>>>> pointer for recv_mq_tmo(), the other function pointers >>>>>>>>>>>>>>>>>>>>> including the recv() function pointer should be identical >>>>>>>>>>>>>>>>>>>>> as well. We do pass the device (entry[i]) to the function >>>>>>>>>>>>>>>>>>>>> pointed to by recv_mq_tmo as well, so it should be able >>>>>>>>>>>>>>>>>>>>> to update the correct counters. The ops structure is >>>>>>>>>>>>>>>>>>>>> const, after all... >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> So, when receiving packets from a particular device using >>>>>>>>>>>>>>>>>>>>> netmap, we do call netmap_recv() and pass to it the >>>>>>>>>>>>>>>>>>>>> pktio_entry[i] and also index[i] for the queue index. >>>>>>>>>>>>>>>>>>>>> This is the same that the odp_pktin_recv_mq_tmo() >>>>>>>>>>>>>>>>>>>>> function already did. It calls odp_pktin_recv() in a >>>>>>>>>>>>>>>>>>>>> loop, and the latter function calls >>>>>>>>>>>>>>>>>>>>> entry->s.ops->recv(entry, queue.index, packets, num); >>>>>>>>>>>>>>>>>>>>> which to me seems to be exactly the same thing -- you >>>>>>>>>>>>>>>>>>>>> pass exactly the same entry and the same queue.index. >>>>>>>>>>>>>>>>>>>>> Same function, same arguments, I can't see how the >>>>>>>>>>>>>>>>>>>>> behavior would be different... >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Now, if somebody opposes the .recv_mq_tmo function >>>>>>>>>>>>>>>>>>>>> pointer due to added code line count, I might perhaps >>>>>>>>>>>>>>>>>>>>> agree. But it seems to be the only way the DPDK support >>>>>>>>>>>>>>>>>>>>> for genuine sleep could be added, so I see some value in >>>>>>>>>>>>>>>>>>>>> it. >>>>>>>>>>>>>>>>>>>>>> Juha-Matti Tilli(jmtilli) wrote: >>>>>>>>>>>>>>>>>>>>>> Several options: (1) I could implement a binary search >>>>>>>>>>>>>>>>>>>>>> algorithm that incrementally checks various usec values >>>>>>>>>>>>>>>>>>>>>> and uses odp_pktin_wait_time() to check which one of >>>>>>>>>>>>>>>>>>>>>> these trial attempts result in the same sleep wait >>>>>>>>>>>>>>>>>>>>>> value, but that would be slow; (2) I could implement >>>>>>>>>>>>>>>>>>>>>> odp_pktin_wait_time_reverse() that does the conversion >>>>>>>>>>>>>>>>>>>>>> in the other direction, but that would be an API change; >>>>>>>>>>>>>>>>>>>>>> (3) I could just do the conversion and pass usec values >>>>>>>>>>>>>>>>>>>>>> instead of sleep wait values. I carefully considered >>>>>>>>>>>>>>>>>>>>>> each and selected (3), but do you think (2) or (1) would >>>>>>>>>>>>>>>>>>>>>> be better? >>>>>>>>>>>>>>>>>>>>>>> Juha-Matti Tilli(jmtilli) wrote: >>>>>>>>>>>>>>>>>>>>>>> Will fix. >>>>>>>>>>>>>>>>>>>>>>>> Juha-Matti Tilli(jmtilli) wrote: >>>>>>>>>>>>>>>>>>>>>>>> I disagree. SLEEP_NSEC is a private #define in >>>>>>>>>>>>>>>>>>>>>>>> odp_packet_io.c so what the wait parameter actually >>>>>>>>>>>>>>>>>>>>>>>> means cannot be interpreted in the drivers. (Believe >>>>>>>>>>>>>>>>>>>>>>>> me, I tried it!) Thus, the conversion from the sleep >>>>>>>>>>>>>>>>>>>>>>>> wait parameter to microseconds is needed. >>>>>>>>>>>>>>>>>>>>>>>>> Matias Elo(matiaselo) wrote: >>>>>>>>>>>>>>>>>>>>>>>>> As I noted in the previous comment, I think this >>>>>>>>>>>>>>>>>>>>>>>>> check isn't enough. The input queues would have to be >>>>>>>>>>>>>>>>>>>>>>>>> from the same pktio device for this to work. >>>>>>>>>>>>>>>>>>>>>>>>>> Matias Elo(matiaselo) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> Internal functions should not use odp_ prefix. >>>>>>>>>>>>>>>>>>>>>>>>>>> Matias Elo(matiaselo) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>> You should pass the `wait` parameter to >>>>>>>>>>>>>>>>>>>>>>>>>>> `entry->s.ops->recv_tmo()` and not do any >>>>>>>>>>>>>>>>>>>>>>>>>>> conversions here. >>>>>>>>>>>>>>>>>>>>>>>>>>>> Matias Elo(matiaselo) wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>> ODP_TIME_SEC_IN_NS would work here. https://github.com/Linaro/odp/pull/341#discussion_r160122743 updated_at 2018-01-08 11:39:22