Re: [dpdk-dev] [RFC 3/5] eal: lcore state FINISHED is not required

Honnappa Nagarahalli Mon, 29 Mar 2021 19:54:43 -0700

<snip>

> 
> Hi everyone,
Thanks Konstantin for the review.


> 
> > <snip>
> >
> > > >
> > > > > > Subject: [RFC 3/5] eal: lcore state FINISHED is not required
> > > > > >
> > > > > > FINISHED state seems to be used to indicate that the worker's
> > > > > > update of the 'state' is not visible to other threads. There
> > > > > > seems to be no requirement to have such a state.
> > > > >
> > > > > I am not sure "FINISHED" is necessary to be removed, and I
> > > > > propose some of my profiles for discussion.
> > > > >  There are three states for lcore now:
> > > > > "WAIT": indicate lcore can start working
> > > > > "RUNNING": indicate lcore is working
> > > > > "FINISHED": indicate lcore has finished its working and wait to
> > > > > be reset
> > > > If you look at the definitions of "WAIT" and "FINISHED" states,
> > > > they look
> > > similar, except for "wait to be reset" in "FINISHED" state . The
> > > code really does not do anything to reset the lcore. It just changes the
> state to "WAIT".
> 
> 
> I agree that 3 states here seems excessive.
> Just 2 (RUNNING/IDLE) seems enough.
> Though we can't just remove FINISHED here - it will be an Abi breakage.
> Might be deprecate FINISHED now and remove in 21.11.
Agree, will add a new patch to deprecate the FINISHED state. Also, does the 
deprecation notice need to go into 20.08 release notes?

> 
> Also need to decide what rte_eal_wait_lcore() should return in that case?
> Always zero, or always status of last function called?
I am not sure why ' rte_eal_wait_lcore' has the following code:

if (lcore_config[worker_id].state == WAIT)
                return 0;

This indicates that the caller has called 'rte_eal_wait_lcore' function 
earlier. May be there is a use case where there are multiple threads waiting 
for the lcores to complete?
Anyway, IMO, returning the status of the last function always is better for 
this API.

> 
> > > >
> > > > >
> > > > > From the description above, we can find "FINISHED" is different
> > > > > from "WAIT", it can shows that lcore has done the work and finished
> it.
> > > > > Thus, if we remove "FINISHED", maybe we will not know whether
> > > > > the lcore finishes its work or just doesn't start, because this
> > > > > two state has the
> > > same tag "WAIT".
> > > > Looking at "eal_thread_loop", the worker thread sets the state to
> "RUNNING"
> > > before sending the ack back to main core. After that it is
> > > guaranteed that the worker will run the assigned function. Only case
> > > where it will not run the assigned function is when the 'write'
> > > syscall fails, in which case it results in a panic.
> > >
> > > Quick note: it should not panic.
> > > We must find a way to return an error without crashing the whole
> > > application.
> > The syscalls are being used to communicate the status back to the main
> thread. If they fail, it is not possible to communicate the status.
> > May be it is better to panic.
> > We could change the implementation using shared variables, but it
> > would require polling the memory. May be the syscalls are being used to
> avoid polling. However, this polling would happen during init time (or 
> similar)
> for a short duration.
> 
> AFAIK we use read and write not for status communication, but sort of
> sleep/ack point.
> Though I agree if we can't do read/write from the system pipe then
> something is totally wrong, and probably there is no much point to continue.
> 
> > >
> > >
> > > > > Furthermore, consider such a scenario:
> > > > > Core 1 need to monitor Core 2 state, if Core 2 finishes one
> > > > > task, Core 1 can start its working.
> > > > > However, if there is only  one tag "WAIT", Core 1 maybe  start
> > > > > its work at the wrong time, when Core 2 still does not start its
> > > > > task at state
> > > "WAIT".
> > > > > This is just my guess, and at present, there is no similar
> > > > > application scenario in dpdk.
> > > > To be able to do this effectively, core 1 needs to observe the
> > > > state change
> > > from WAIT->RUNNING->FINISHED. This requires that core 1 should be
> > > calling rte_eal_remote_launch and rte_eal_wait_lcore functions. It
> > > is not possible to observe this state transition from a 3rd core
> > > (for ex: a worker might go from
> > > RUNNING->FINISHED->WAIT->RUNNING which a 3rd core might not be
> able
> > > RUNNING->FINISHED->WAIT->to
> > > observe).
> > > >
> > > > >
> > > > > On the other hand, if we decide to remove "FINISHED", please
> > > > > consider the following files:
> > > > > 1. lib/librte_eal/linux/eal_thread.c: line 31
> > > > >     lib/librte_eal/windows/eal_thread.c: line 22
> > > > >     lib/librte_eal/freebsd/eal_thread.c: line 31
> > > > I have looked at these lines, they do not capture "why" FINISHED
> > > > state is
> > > required.
> > > >
> > > >  2.
> > > > > lib/librte_eal/include/rte_launch.h: line 24, 44, 121, 123, 131 3.
> > > > > examples/l2fwd-
> > > > > keepalive/main.c: line 510
> > > > > rte_eal_wait_lcore(id_core) can be removed. Because the core
> > > > > state has been checked as "WAIT", this is a redundant operation
> > >
> > >

Re: [dpdk-dev] [RFC 3/5] eal: lcore state FINISHED is not required

Reply via email to