Hi everyone,
> <snip>
> > > > > Subject: [RFC 3/5] eal: lcore state FINISHED is not required
> > > > >
> > > > > FINISHED state seems to be used to indicate that the worker's
> > > > > update of the 'state' is not visible to other threads. There seems
> > > > > to be no requirement to have such a state.
> > > >
> > > > I am not sure "FINISHED" needs to be removed, and I propose
> > > > some of my thoughts for discussion.
> > > > There are three states for lcore now:
> > > > "WAIT": indicates the lcore can start working
> > > > "RUNNING": indicates the lcore is working
> > > > "FINISHED": indicates the lcore has finished its work and waits to be
> > > > reset
> > > If you look at the definitions of "WAIT" and "FINISHED" states, they look
> > > similar, except for "waits to be reset" in the "FINISHED" state. The code
> > > really does not do anything to reset the lcore. It just changes the
> > > state to "WAIT".

I agree that 3 states seem excessive here; just 2 (RUNNING/IDLE) seem enough.
However, we can't just remove FINISHED now, as that would be an ABI breakage.
Maybe we could deprecate FINISHED now and remove it in 21.11.
We also need to decide what rte_eal_wait_lcore() should return in that case:
always zero, or always the status of the last function called?

> > > >
> > > > From the description above, we can see that "FINISHED" is different
> > > > from "WAIT": it shows that the lcore has done the work and finished it.
> > > > Thus, if we remove "FINISHED", we may not know whether the lcore
> > > > has finished its work or just hasn't started, because these two states
> > > > have the same tag "WAIT".
> > > Looking at "eal_thread_loop", the worker thread sets the state to
> > > "RUNNING" before sending the ack back to the main core. After that it is
> > > guaranteed that the worker will run the assigned function. The only case
> > > where it will not run the assigned function is when the 'write' syscall
> > > fails, in which case it results in a panic.
> >
> > Quick note: it should not panic.
> > We must find a way to return an error
> > without crashing the whole application.
> The syscalls are being used to communicate the status back to the main
> thread. If they fail, it is not possible to communicate the status.
> Maybe it is better to panic.
> We could change the implementation using shared variables, but it would
> require polling the memory. Maybe the syscalls are being used to
> avoid polling. However, this polling would happen during init time (or
> similar) for a short duration.

AFAIK, we use read and write not for status communication, but as a sort of
sleep/ack point. Though I agree that if we can't read/write the system pipe,
then something is totally wrong, and there is probably not much point in
continuing.

> > > >
> > > > Furthermore, consider such a scenario:
> > > > Core 1 needs to monitor Core 2's state; if Core 2 finishes one task,
> > > > Core 1 can start its work.
> > > > However, if there is only one tag "WAIT", Core 1 may start its
> > > > work at the wrong time, when Core 2 has not yet started its task and
> > > > is still in state "WAIT".
> > > > This is just my guess, and at present, there is no similar
> > > > application scenario in DPDK.
> > > To be able to do this effectively, core 1 needs to observe the state
> > > change from WAIT->RUNNING->FINISHED. This requires that core 1 should be
> > > calling the rte_eal_remote_launch and rte_eal_wait_lcore functions. It
> > > is not possible to observe this state transition from a 3rd core (for
> > > example, a worker might go from RUNNING->FINISHED->WAIT->RUNNING, which
> > > a 3rd core might not be able to observe).
> > > >
> > > > On the other hand, if we decide to remove "FINISHED", please
> > > > consider the following files:
> > > > 1. lib/librte_eal/linux/eal_thread.c: line 31
> > > > lib/librte_eal/windows/eal_thread.c: line 22
> > > > lib/librte_eal/freebsd/eal_thread.c: line 31
> > > I have looked at these lines; they do not capture "why" the FINISHED
> > > state is required.
> > > >
> > > > 2. lib/librte_eal/include/rte_launch.h: lines 24, 44, 121, 123, 131
> > > > 3. examples/l2fwd-keepalive/main.c: line 510
> > > > rte_eal_wait_lcore(id_core) can be removed, because the core state
> > > > has already been checked as "WAIT"; this is a redundant operation.
> > > >
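To make the two-state (RUNNING/IDLE) idea concrete, here is a minimal, hypothetical sketch using plain C11 atomics and pthreads. It is not DPDK code, and every poll_* name is made up for illustration. The launcher moves the lcore from IDLE to RUNNING with a release store, and the worker moves it back to IDLE with a release store after writing the return value. A waiter that observes IDLE with an acquire load therefore also sees the result, so no extra FINISHED state is needed for visibility. It uses the shared-variable polling approach mentioned above rather than the pipe. It also assumes a single launcher thread and that wait returns the status of the last function called, which is only one possible answer to the open question about rte_eal_wait_lcore().

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-state lcore model; NOT DPDK's lcore state enum. */
enum poll_state { POLL_IDLE, POLL_RUNNING };

typedef int (*poll_fn)(void *);

struct poll_lcore {
	atomic_int state;   /* holds an enum poll_state value */
	atomic_bool quit;   /* set by main to stop the worker loop */
	poll_fn fn;         /* plain fields, published by the RUNNING store */
	void *arg;
	int ret;            /* status of the last function that ran */
};

/* Worker side: poll until RUNNING, run the function, go back to IDLE. */
static void *poll_worker_loop(void *p)
{
	struct poll_lcore *lc = p;

	for (;;) {
		if (atomic_load_explicit(&lc->state, memory_order_acquire) ==
		    POLL_RUNNING) {
			lc->ret = lc->fn(lc->arg);
			/* Release: a waiter that sees IDLE also sees ret. */
			atomic_store_explicit(&lc->state, POLL_IDLE,
					      memory_order_release);
		} else if (atomic_load_explicit(&lc->quit,
						memory_order_acquire)) {
			return NULL;
		} else {
			sched_yield();
		}
	}
}

/* Main side (single launcher assumed): 0 on success, -1 if still RUNNING. */
static int poll_launch(struct poll_lcore *lc, poll_fn fn, void *arg)
{
	assert(fn != NULL);
	if (atomic_load_explicit(&lc->state, memory_order_acquire) != POLL_IDLE)
		return -1;
	lc->fn = fn;
	lc->arg = arg;
	/* Release: the worker that sees RUNNING also sees fn and arg. */
	atomic_store_explicit(&lc->state, POLL_RUNNING, memory_order_release);
	return 0;
}

/* Main side: poll until IDLE, then return the last function's status. */
static int poll_wait(struct poll_lcore *lc)
{
	while (atomic_load_explicit(&lc->state, memory_order_acquire) !=
	       POLL_IDLE)
		sched_yield();
	return lc->ret;
}

static int poll_double(void *arg)
{
	return 2 * *(const int *)arg;
}

/* Usage: two launches on one worker, then shut it down. */
static int poll_demo(void)
{
	struct poll_lcore lc;
	pthread_t tid;
	int a = 20, b = 1, total = 0;

	atomic_init(&lc.state, POLL_IDLE);
	atomic_init(&lc.quit, false);
	lc.fn = NULL;
	lc.arg = NULL;
	lc.ret = 0;
	if (pthread_create(&tid, NULL, poll_worker_loop, &lc) != 0)
		return -1;

	if (poll_launch(&lc, poll_double, &a) == 0)
		total += poll_wait(&lc);   /* 2 * 20 = 40 */
	if (poll_launch(&lc, poll_double, &b) == 0)
		total += poll_wait(&lc);   /* 2 * 1 = 2 */

	atomic_store_explicit(&lc.quit, true, memory_order_release);
	pthread_join(tid, NULL);
	return total;                  /* 42 */
}
```

This sketch also shows the limitation Feifei raised. With only IDLE/RUNNING, poll_wait() on a worker that was never launched returns immediately with the initial ret. The caller therefore cannot tell "finished" from "never started" by looking at the state alone.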
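For the pipe used as a sleep/ack point, here is a second hypothetical sketch using POSIX pipe() and pthreads. It is not the actual eal_thread_loop, and the pipe_* names are made up. It shows how a failed read()/write() can be returned to the caller as a negative errno instead of causing a panic. As a simplifying assumption, it passes the launch request and the return value through the pipes themselves, so no extra shared-memory synchronization is needed. As described in this thread, eal_thread_loop instead uses the pipe only as a wake-up/ack and reports completion through the lcore state.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

typedef int (*pipe_fn)(void *);

/* One launch request; far smaller than PIPE_BUF, so each write is atomic. */
struct pipe_msg {
	pipe_fn fn;
	void *arg;
};

struct pipe_worker {
	int m2w[2];   /* main -> worker: launch requests ([0] read, [1] write) */
	int w2m[2];   /* worker -> main: return values */
};

static void *pipe_worker_loop(void *p)
{
	struct pipe_worker *w = p;
	struct pipe_msg msg;

	/* read() blocks, so an idle worker sleeps instead of polling memory. */
	while (read(w->m2w[0], &msg, sizeof(msg)) == (ssize_t)sizeof(msg)) {
		int ret = msg.fn(msg.arg);

		if (write(w->w2m[1], &ret, sizeof(ret)) != (ssize_t)sizeof(ret))
			break;   /* cannot report back: stop this worker, no panic */
	}
	return NULL;     /* EOF (main closed m2w[1]) or a read error */
}

/* Returns 0 on success or a negative errno; EINTR is not retried here. */
static int pipe_launch(struct pipe_worker *w, pipe_fn fn, void *arg)
{
	struct pipe_msg msg = { fn, arg };
	ssize_t n;

	assert(fn != NULL);
	n = write(w->m2w[1], &msg, sizeof(msg));
	if (n < 0)
		return -errno;
	return n == (ssize_t)sizeof(msg) ? 0 : -EIO;
}

/* Blocks until the worker reports; stores the function's status in *ret. */
static int pipe_wait(struct pipe_worker *w, int *ret)
{
	ssize_t n = read(w->w2m[0], ret, sizeof(*ret));

	if (n < 0)
		return -errno;
	return n == (ssize_t)sizeof(*ret) ? 0 : -EIO;   /* 0 bytes means EOF */
}

static int pipe_triple(void *arg)
{
	return 3 * *(const int *)arg;
}

/* Usage: one launch/wait cycle, then shut the worker down via EOF. */
static int pipe_demo(void)
{
	struct pipe_worker w;
	pthread_t tid;
	int x = 14, ret = -1;

	if (pipe(w.m2w) != 0 || pipe(w.w2m) != 0)
		return -1;
	if (pthread_create(&tid, NULL, pipe_worker_loop, &w) != 0)
		return -1;
	if (pipe_launch(&w, pipe_triple, &x) != 0 || pipe_wait(&w, &ret) != 0)
		ret = -1;
	close(w.m2w[1]);   /* worker's read() returns 0 and the loop exits */
	pthread_join(tid, NULL);
	close(w.m2w[0]);
	close(w.w2m[0]);
	close(w.w2m[1]);
	return ret;        /* 3 * 14 = 42 */
}
```

With this design, the worker sleeps in read() while idle, so nothing spins. The trade-off is a syscall per launch, which matters little if launches happen only around init time. Passing an invalid descriptor to pipe_launch() yields -EBADF rather than terminating the application.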