> -----Original Message----- > From: Aaron Conole <acon...@redhat.com> > Sent: Tuesday, November 26, 2019 1:57 PM > To: Van Haaren, Harry <harry.van.haa...@intel.com> > Cc: Thomas Monjalon <tho...@monjalon.net>; Amber, Kumar > <kumar.am...@intel.com>; dev@dpdk.org; Wang, Yipeng1 > <yipeng1.w...@intel.com>; Yigit, Ferruh <ferruh.yi...@intel.com>; Thakur, > Sham Singh <sham.singh.tha...@intel.com>; David Marchand > <dmarc...@redhat.com> > Subject: Re: [dpdk-dev] [PATCH v3] hash: added a new API to hash to query > key id > > "Van Haaren, Harry" <harry.van.haa...@intel.com> writes: > > >> -----Original Message----- > >> From: Van Haaren, Harry > >> Sent: Tuesday, November 26, 2019 1:19 PM > >> To: Aaron Conole <acon...@redhat.com>; Thomas Monjalon > <tho...@monjalon.net> > > > > <snip> > > > >> > EAL: Test assert service_lcore_en_dis_able line 487 failed: Ex-service > >> core > >> > function call had no effect. > >> > > >> > So I'll spend some time in this area, it seems. > >> > >> > >> The below diff makes it 100% reproducible here, failing every time. > >> > >> It seems like the main thread is returning, before the service thread has > >> returned. > >> > >> The rte_eal_mp_wait_lcore() call seems to not wait on the service-core, > >> which allows > >> the main thread to read the "service_remote_launch_flag" value as 0 > (before > >> the service-thread writes it to 1). > >> > >> Adding the delay between the service launch and service write being > >> performed makes this issue much much more likely to occur - so the above > >> description I have confidence in. > >> > >> What I'm not clear on (yet) is why the eal_mp_wait_lcore() isn't > waiting... > >> > >> -H > >> > >> > >> diff --git a/app/test/test_service_cores.c > b/app/test/test_service_cores.c > >> index 9fe38f5e0..846ad00d1 100644 > >> --- a/app/test/test_service_cores.c > >> +++ b/app/test/test_service_cores.c > >> @@ -445,6 +445,7 @@ static int > >> service_remote_launch_func(void *arg) > >> { > >> RTE_SET_USED(arg); > >> + rte_delay_ms(100); > >> service_remote_launch_flag = 1; > >> return 0; > >> } > > > > Diff below seems to fix the problem here; Aaron would you test the below > fix in your setup for a while too? > > I have a loop running here attempting to reproduce - but before 100% > failures and so far 100% passes with the added wait_lcore() call. > > > > > > diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c > > index 9fe38f5e0..62ffedb19 100644 > > --- a/app/test/test_service_cores.c > > +++ b/app/test/test_service_cores.c > > @@ -445,6 +445,7 @@ static int > > service_remote_launch_func(void *arg) > > { > > RTE_SET_USED(arg); > > + rte_delay_ms(100); > > service_remote_launch_flag = 1; > > return 0; > > } > > @@ -483,6 +484,7 @@ service_lcore_en_dis_able(void) > > int ret = rte_eal_remote_launch(service_remote_launch_func, NULL, > > slcore_id); > > TEST_ASSERT_EQUAL(0, ret, "Ex-service core remote launch > failed."); > > + rte_eal_wait_lcore(slcore_id); > > rte_eal_mp_wait_lcore(); > > Ahh, I see. Actually, this brings up a question - is the intent for > mp_wait_lcore to cycle through the service cores as well? Because IIUC, > the issue will be the lcore will be set to ROLE_RTE normally, but > service cores will do: ROLE_SERVICE and then the wait cannot work. > > If the idea is that mp_wait_lcore should work (and looking at the test, > it seems like it is the intent?) then it will need to cycle through > service cores, too. If the intent is that it shouldn't, then we should > remove those calls from the test application to prevent developer from > misunderstanding. > > Either way, the documentation for `rte_service_lcore_start` is a bit too > ambiguous and needs to reflect whether the mp_wait_lcore should work. I > think either it should (which means updating rte_get_next_lcore to > include ROLE_SERVICE), or none of the lcore functions should work, and > we should have an rte_service...() equivalent that should be used.
Service cores are meant to be transparent to the application. The test application testing this particular usage is the corner-case. The rte_eal_mp_wait_lcore() is correct to NOT wait for service-cores, as they are not always under application control. The observed test failure is a bug in the test code, it should use the explicit rte_eal_wait_lcore() call. I'll send a patch later today.