Hi Guan,
I am waiting for feedback from the clients.
Regards,
Muneendra.
-----Original Message-----
From: Guan Junxiong [mailto:[email protected]]
Sent: Thursday, October 12, 2017 12:16 PM
To: Muneendra Kumar M <[email protected]>
Cc: Shenhong (C) <[email protected]>; niuhaoxin <[email protected]>; Martin Wilck <[email protected]>; Christophe Varoqui <[email protected]>; [email protected]
Subject: Re: [PATCH V4 1/2] multipath-tools: intermittent IO error accounting to improve reliability

Hi Muneendra,

On 2017/10/12 14:35, Muneendra Kumar M wrote:
> Hi Guan,
>>> If the patch is OK for you, can I add your Reviewed-by tag into this patch?
> The patch is OK for me.
>

OK, I will add your Reviewed-by tag in version 7 ASAP.
BTW, did your clients give any feedback to improve the feature?

Regards
Guan

> -----Original Message-----
> From: Guan Junxiong [mailto:[email protected]]
> Sent: Monday, October 09, 2017 6:13 AM
> To: Muneendra Kumar M <[email protected]>
> Cc: Shenhong (C) <[email protected]>; niuhaoxin <[email protected]>;
> Martin Wilck <[email protected]>; Christophe Varoqui
> <[email protected]>; [email protected]
> Subject: Re: [PATCH V4 1/2] multipath-tools: intermittent IO error accounting
> to improve reliability
>
> Hi Muneendra,
> Sorry for the late reply because of the National Holiday.
>
> On 2017/10/6 13:54, Muneendra Kumar M wrote:
>> Hi Guan,
>> Did you push the patch to mainline?
>> If so, can you provide me those details?
>> If not, can you let me know the status?
>>
>
> Yes, I pushed version 6 of the patch to the mailing list, but it hasn't been
> merged yet.
> It is still waiting for review.
> You can find it at this link:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.redhat.com_archives_dm-2Ddevel_2017-2DSeptember_msg00296.html&d=DwIDaQ&c=IL_XqQWOjubgfqINi2jTzg&r=E3ftc47B6BGtZ4fVaYvkuv19wKvC_Mc6nhXaA1sBIP0&m=N8T04oW6j0kkcf5fLp8jXA1y75SRN6PM9D-dM5nc2d4&s=sBZTTjpCVZB3NBgGXwPCE1fBtqAmx75s0DkAsVYRrwc&e=
>
>> As a couple of our clients are already using the previous patch (san_path_XX),
>> if your patch is pushed, I can give them the updated patch and test the
>> same.
>>
>
> If the patch is OK for you, can I add your Reviewed-by tag into this patch?
>
> Regards,
> Guan
>
>> Regards,
>> Muneendra.
>>
>>
>> -----Original Message-----
>> From: Muneendra Kumar M
>> Sent: Thursday, September 21, 2017 3:41 PM
>> To: 'Guan Junxiong' <[email protected]>; Martin Wilck
>> <[email protected]>; [email protected]; [email protected]
>> Cc: [email protected]; [email protected]; [email protected]
>> Subject: RE: [PATCH V4 1/2] multipath-tools: intermittent IO error
>> accounting to improve reliability
>>
>> Hi Guan,
>> Thanks for adopting the naming convention.
>> Instead of marginal_path_err_recheck_gap_time, marginal_path_recovery_time
>> would look more reasonable. Could you please take another look at it?
>>
>> I will review the code within a day.
>>
>> Regards,
>> Muneendra.
>>
>> -----Original Message-----
>> From: Guan Junxiong [mailto:[email protected]]
>> Sent: Thursday, September 21, 2017 3:35 PM
>> To: Muneendra Kumar M <[email protected]>; Martin Wilck
>> <[email protected]>; [email protected]; [email protected]
>> Cc: [email protected]; [email protected]; [email protected]
>> Subject: Re: [PATCH V4 1/2] multipath-tools: intermittent IO error
>> accounting to improve reliability
>>
>> Hi Muneendra,
>>
>> Thanks for your clarification. I will adopt this renaming. If it is
>> convenient for you, please review the V5 patch that I sent out two hours ago.
>>
>> Regards,
>> Guan
>>
>> On 2017/9/20 20:58, Muneendra Kumar M wrote:
>>> Hi Guan,
>>>>>> Shall we use the existing PATH_SHAKY?
>>> As PATH_SHAKY indicates a path not available for "normal" operations, we
>>> can use this state. That's a good idea.
>>>
>>> Regarding marginal paths, below is my explanation. Brocade is also
>>> publishing a couple of white papers on the subject to educate SAN
>>> administrators and the SAN community.
>>>
>>> Marginal path:
>>>
>>> A host, target, LUN (ITL path) flow goes through a SAN. Note that each
>>> I/O request that goes to the SCSI layer transforms into a single SCSI
>>> exchange. In a single SAN, there are typically multiple SAN network paths
>>> for an ITL flow/path. Each SCSI exchange can take any of the network
>>> paths available for the ITL path. A SAN can be based on Ethernet, FC, or
>>> InfiniBand physical networks to carry block storage traffic (SCSI, NVMe,
>>> etc.).
>>>
>>> There are typically two types of SAN network problems that are
>>> categorized as marginal issues. By nature these issues are not permanent;
>>> they come and go over time.
>>> 1) Switches in the SAN can have intermittent frame drops or intermittent
>>> frame corruption due to a bad optics cable (SFP) or similar wear-and-tear
>>> port issues. This causes ITL flows that go through the faulty switch/port
>>> to intermittently experience frame drops.
>>> 2) There are SAN topologies where a switch port in the fabric becomes the
>>> only conduit for many different ITL flows across multiple hosts. Such a
>>> single network path is effectively shared across multiple ITL flows.
>>> Under these conditions, if the port link bandwidth cannot handle the net
>>> sum of the shared ITL flows' bandwidth going through the single path, we
>>> can see intermittent network congestion problems. This condition is
>>> called network oversubscription.
>>> The intermittent congestion can delay SCSI exchange completion time (an
>>> increase in I/O latency is observed).
>>>
>>> To overcome the above network issues and many more such target issues,
>>> there are frame-level retries done in HBA device firmware and I/O retries
>>> in the SCSI layer. These retries might succeed for the following reasons:
>>> 1) The intermittent switch/port issue is not observed.
>>> 2) The retry I/O is a new SCSI exchange. This SCSI exchange can take an
>>> alternate SAN path for the ITL flow, if such a SAN path exists.
>>> 3) The network congestion disappears momentarily because the net I/O
>>> bandwidth coming from multiple ITL flows on the single shared network
>>> path is something the path can handle.
>>>
>>> However, in some cases we have seen that I/O retries do not succeed
>>> because the retry I/Os hit a SAN network path that has an intermittent
>>> switch/port issue and/or network congestion.
>>>
>>> On the host we thus see configurations with two or more ITL paths sharing
>>> the same target/LUN through two or more HBA ports. These HBA ports are
>>> connected through two or more SANs to the same target/LUN.
>>> If the I/O fails at the multipath layer, the ITL path is put into the
>>> Failed state. Because of the marginal nature of the network, the next
>>> health-check command sent from the multipath layer might succeed, which
>>> puts the ITL path back into the Active state. You end up seeing the DM
>>> path state going through Active, Failed, Active transitions. This results
>>> in an overall reduction in application I/O throughput and sometimes in
>>> application I/O failures (because of timing constraints). All this can
>>> happen because of I/O retries and I/O requests moving across multiple
>>> paths of the DM device. Note that on the host, all I/O retries on a
>>> single path and all I/O movement across multiple paths slow down the
>>> forward progress of new application I/O.
>>> The reason is that the above I/O re-queue actions are given higher
>>> priority than newer I/O requests coming from the application.
>>>
>>> The above condition of the ITL path is hence called "marginal".
>>>
>>> What we want is for DM to deterministically categorize an ITL path as
>>> "marginal" and move all pending I/Os from the marginal path to an active
>>> path. This will help in meeting application I/O timing constraints. We
>>> also want the capability to automatically reinstate the marginal path as
>>> Active once the marginal condition in the network is fixed.
>>>
>>> Based on the above explanation, I want to rename the options to
>>> marginal_path_XXXX, and this is irrespective of the storage network.
>>>
>>> Regards,
>>> Muneendra.
>>
>

--
dm-devel mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/dm-devel
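[Editor's note] For readers following the thread: the feature under review adds per-path error-accounting tunables to multipath.conf. A sketch of how the options discussed above might be configured; the option names follow the patch discussion (including marginal_path_err_recheck_gap_time, whose renaming is debated above), and both the final names and the values are illustrative, not authoritative:

```
defaults {
    # Seconds within which a path must fail twice before
    # error accounting starts on it
    marginal_path_double_failed_time     60
    # Sliding window (seconds) over which IO errors are sampled
    marginal_path_err_sample_time        120
    # Error count per sample window above which the path is
    # treated as marginal rather than simply failed
    marginal_path_err_rate_threshold     10
    # Error-free gap (seconds) before a marginal path is
    # rechecked and reinstated
    marginal_path_err_recheck_gap_time   300
}
```

Consult the merged multipath.conf(5) documentation for the final option names and valid ranges.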
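[Editor's note] The Active/Failed flapping described in the thread is what the intermittent IO error accounting is meant to detect: flag a path as marginal when its error rate over a sampling window is too high, and reinstate it only after a sustained error-free gap. A minimal toy sketch of that idea in Python; the class, parameter names, and thresholds are invented for illustration and this is not the multipathd implementation:

```python
import time
from collections import deque


class PathErrorAccountant:
    """Toy model of per-path intermittent IO error accounting.

    Hypothetical parameters, loosely modeled on the tunables in this thread:
      sample_time      -- sliding window (seconds) over which errors count
      rate_threshold   -- error count in the window that flags the path marginal
      recheck_gap_time -- error-free seconds required before reinstating
    """

    def __init__(self, sample_time=60.0, rate_threshold=10,
                 recheck_gap_time=300.0):
        self.sample_time = sample_time
        self.rate_threshold = rate_threshold
        self.recheck_gap_time = recheck_gap_time
        self.errors = deque()       # timestamps of recent IO errors
        self.marginal_since = None  # when the path was flagged marginal

    def record_io(self, ok, now=None):
        """Feed one IO completion (successful or failed) into the accounting."""
        now = time.monotonic() if now is None else now
        if not ok:
            self.errors.append(now)
        # Drop errors that have aged out of the sampling window.
        while self.errors and now - self.errors[0] > self.sample_time:
            self.errors.popleft()
        if self.marginal_since is None:
            if len(self.errors) >= self.rate_threshold:
                # Too many errors in the window: treat the path as marginal.
                self.marginal_since = now
        elif not self.errors and now - self.marginal_since >= self.recheck_gap_time:
            # Error-free for long enough: reinstate the path.
            self.marginal_since = None

    @property
    def is_marginal(self):
        return self.marginal_since is not None
```

In this model, a marginal path would be reported in a shaky state (the thread suggests reusing PATH_SHAKY) instead of bouncing between Active and Failed, so the path selector keeps pending I/O on healthy paths until the marginal condition clears.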
