On 11/22/2022 9:25 AM, Zhou, YidingX wrote:
>
>
>> -----Original Message-----
>> From: Zhou, YidingX <yidingx.z...@intel.com>
>> Sent: Wednesday, September 21, 2022 3:15 PM
>> To: Stephen Hemminger <step...@networkplumber.org>; Zhang, Qi Z
>> <qi.z.zh...@intel.com>
>> Cc: dev@dpdk.org; Burakov, Anatoly <anatoly.bura...@intel.com>; He,
>> Xingguang <xingguang...@intel.com>; sta...@dpdk.org
>> Subject: RE: [PATCH v2] net/pcap: fix timeout of stopping device
>>
>>
>>
>>> -----Original Message-----
>>> From: Stephen Hemminger <mailto:step...@networkplumber.org>
>>> Sent: Tuesday, September 6, 2022 10:58 PM
>>> To: Zhou, YidingX <mailto:yidingx.z...@intel.com>
>>> Cc: mailto:dev@dpdk.org; Zhang, Qi Z <mailto:qi.z.zh...@intel.com>;
>>> Burakov, Anatoly
>>> <mailto:anatoly.bura...@intel.com>; He, Xingguang
>>> <mailto:xingguang...@intel.com>;
>>> mailto:sta...@dpdk.org
>>> Subject: Re: [PATCH v2] net/pcap: fix timeout of stopping device
>>>
>>> On Tue, 6 Sep 2022 16:05:11 +0800
>>> Yiding Zhou <mailto:yidingx.z...@intel.com> wrote:
>>>
>>>> The pcap file will be synchronized to the disk when stopping the device.
>>>> It takes a long time if the file is large that would cause the
>>>> 'detach sync request' timeout when the device is closed under
>>>> multi-process scenario.
>>>>
>>>> This commit fixes the issue by using alarm handler to release dumper.
>>>>
>>>> Fixes: 0ecfb6c04d54 ("net/pcap: move handler to process private")
>>>> Cc: mailto:sta...@dpdk.org
>>>>
>>>> Signed-off-by: Yiding Zhou <mailto:yidingx.z...@intel.com>
>>>
>>>
>>> I think you need to redesign the handshake if this the case.
>>> Forcing 30 second delay at the end of all uses of pcap is not acceptable.
>>
>> @Zhang, Qi Z Do we need to redesign the handshake to fix this?
>
> Hi, Ferruh
> Sorry for the late reply.
> I did not receive your email on Oct 6, I got your comments from patchwork.
>
> "Can you please provide more details on multi-process communication and
> call trace, to help us think about a solution to address this issue in a
> more generic way (not just for pcap but for any case device close takes
> more than multi-process timeout)?"
>
> I try to explain this issue with a sequence diagram, hope it can be displayed
> correctly in the mail.
>
> thread                        intr thread                   intr thread             thread
> of secondary                  of secondary                  of primary              of primary
> |                             |                             |                       |
> |                             |                             |                       |
> rte_eal_hotplug_remove
> rte_dev_remove
> eal_dev_hotplug_request_to_primary
> rte_mp_request_sync --------------------------------------------------------------->|
>                                                                                     |
>                                                                                     handle_secondary_request
>                                                             |<----------------------|
>                                                             |
>                                                             __handle_secondary_request
>                                                             eal_dev_hotplug_request_to_secondary
> |<--------------------------------------------------------- rte_mp_request_sync
> |
> handle_primary_request------->|
>                               |
>                               __handle_primary_request
>                               local_dev_remove(this will take long time)
>                               rte_mp_reply ---------------->|
>                                                             |
>                                                             local_dev_remove
> |<--------------------------------------------------------- rte_mp_reply
>
> The marked 'local_dev_remove()' in the secondary process will perform a pcap
> file synchronization operation.
> When the pcap file is too large, it will take a lot of time (according to my
> test 100G takes 20+ seconds).
> This caused the processing of hot_plug message to time out.

Hi Yiding,

Thanks for the information.

Right now the timeout for all MP operations is hardcoded, and it is 5 seconds.
Do you think it would work to have an API to set a custom timeout, something
like `rte_mp_timeout_set()`, and call this from pdump?

This gives a generic solution for similar cases, not just for pcap. But my
concern is whether this exposes too much multi-process related internal detail.
@Anatoly may comment on this.
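
To make the idea concrete, here is a rough standalone sketch of a settable timeout. It does not use DPDK at all: every `sketch_` name is hypothetical, `rte_mp_timeout_set()` is only a proposed name and not an existing API, and the 5 second default is just the value mentioned above.

```c
#include <errno.h>
#include <time.h>

/* Assumption: default mirrors the 5 second value mentioned in this thread. */
#define SKETCH_MP_DEFAULT_TIMEOUT_S 5

/* Process-wide timeout, in seconds, used for synchronous requests. */
static unsigned int sketch_mp_timeout_s = SKETCH_MP_DEFAULT_TIMEOUT_S;

/* Hypothetical setter, standing in for the proposed rte_mp_timeout_set(). */
int sketch_mp_timeout_set(unsigned int seconds)
{
	if (seconds == 0)
		return -EINVAL; /* a zero timeout would fail every request */
	sketch_mp_timeout_s = seconds;
	return 0;
}

/* Build the timespec a synchronous request would wait for. */
struct timespec sketch_mp_timeout_get(void)
{
	struct timespec ts = { .tv_sec = sketch_mp_timeout_s, .tv_nsec = 0 };
	return ts;
}

/* Returns 1 if an operation of op_seconds completes before the timeout. */
int sketch_mp_fits_timeout(unsigned int op_seconds)
{
	return op_seconds < sketch_mp_timeout_s;
}
```

With the default, a 20 second `local_dev_remove()` would not fit; after a driver calls `sketch_mp_timeout_set(30)` it would. Whether a process-wide setting or a per-request value is the better shape is an open question.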