Hi Chris,

You mention that you don't see any issues with the switches. Are you tracking the number of frames that get dropped by your switches? Any other errors on the switch ports? How's the link utilisation? When you get high latency, does it affect only that node, or all nodes at the same time? Also, do you have pause frames (flow control) enabled, and are they being triggered? Is there a spike in traffic at that moment (that second)? Is there any packet loss between your hosts and your EQL(s)?
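For reference, the host-side half of those checks needs no vendor tools. A quick sketch that dumps the kernel's per-NIC drop/error counters from sysfs (interface names and which counters matter will differ per driver):

```shell
#!/bin/sh
# Print per-NIC drop/error counters from sysfs. Numbers that keep rising
# between runs point at the host/NIC side rather than the switches.
for nic in /sys/class/net/*; do
    name=$(basename "$nic")
    rxd=$(cat "$nic/statistics/rx_dropped")
    txd=$(cat "$nic/statistics/tx_dropped")
    rxe=$(cat "$nic/statistics/rx_errors")
    echo "$name rx_dropped=$rxd tx_dropped=$txd rx_errors=$rxe"
done
# Pause-frame counters are driver-specific; on most NICs something like
#   ethtool -S <nic> | grep -i pause
# will show them. For packet loss to the array, a plain
#   ping -c 100 <eql-address>
# and its "packet loss" summary line is a first-order check.
```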
Those were just some thoughts that popped into my head. I hope they help you find the cause.

Best regards,
Max Vernimmen

On 13/04/2017, 22:13, "[email protected] on behalf of Chris Adams" <[email protected]> wrote:

I hadn't thought of trying that... it does look like I still see occasional high read latency (100+ ms); I haven't seen the 1+ second reads (but those are sporadic anyway).

Interestingly, the higher latency seems to affect only the logical volumes; just reading the first block of the iSCSI device itself (e.g. /dev/sdf) is consistently under 1 ms.

Just to explain the setup (for those not familiar with oVirt): when you create a storage domain on an iSCSI device, oVirt sets up a Linux volume group on it and then creates several internal-use logical volumes (including the "metadata" LV that I'm reading). After that, each virtual disk gets an LV created as well.

The network _shouldn't_ be a problem (famous last words): the servers each have two 1G ports to a pair of N3000 switches for the storage network, and the SAN has 10G ports (but each 10G port carries under 100 megabits per second most of the time). There's also an FS7600 with 1G ports talking to the same SAN.

I'm going to install SAN HQ to get a better look at the SAN-side performance (but I've got to order a copy of Windows for that, bleh!).

Once upon a time, [email protected] <[email protected]> said:
> Dell - Internal Use - Confidential
>
> If you disable multipathd and test against a single path, do you see the same latency?
>
> I'm wondering if there is a networking issue at play.
>
> -----Original Message-----
> From: linux-poweredge-bounces-Lists On Behalf Of Chris Adams
> Sent: Thursday, April 13, 2017 1:24 PM
> To: linux-poweredge-Lists <[email protected]>
> Subject: [Linux-PowerEdge] EqualLogic, Linux, and latency
>
> I have a stack of PowerEdge servers, running CentOS 7 and oVirt (virtualization environment), using an EqualLogic PS6610 for VM storage.
> I'm using the regular Linux multipath daemon, not the Dell kit, because oVirt manages the multipathing (and doesn't handle the Dell kit).
>
> I am periodically seeing high latency from the SAN, even from a host not running any VMs. I see 100+ ms reads, and sometimes even multi-second reads (to read a single 4K block). oVirt logs warnings about this; I wrote a simple perl script that duplicates what oVirt does (open a metadata logical volume block device with O_DIRECT, read a 4K block, and close it), and I see the same thing.
>
> I've got iscsid and multipathd configured per the recommended values, and I'm not seeing any issues on Linux or at the switches (dedicated network for iSCSI); it would seem to be something on the SAN itself.
> The SAN isn't reporting anything, though, so I'm not sure what to look at.
>
> Any suggestions?
>
> --
> Chris Adams <[email protected]>
>
> _______________________________________________
> Linux-PowerEdge mailing list
> [email protected]
> https://lists.us.dell.com/mailman/listinfo/linux-poweredge

--
Chris Adams <[email protected]>

_______________________________________________
Linux-PowerEdge mailing list
[email protected]
https://lists.us.dell.com/mailman/listinfo/linux-poweredge
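Chris's perl probe isn't included in the thread, but the same test he describes (open with O_DIRECT, read one 4K block, close) can be approximated with dd. A sketch; the device paths in the comments are the examples from the thread, substitute your own:

```shell
#!/bin/sh
# probe_read DEV: time a single 4 KiB read opened with O_DIRECT, similar to
# oVirt's periodic metadata check. Prints elapsed milliseconds.
probe_read() {
    dev=$1
    t0=$(date +%s%N)
    # iflag=direct requests O_DIRECT; fall back to a buffered read if the
    # target doesn't support it (e.g. tmpfs), so the probe still reports.
    dd if="$dev" of=/dev/null bs=4096 count=1 iflag=direct 2>/dev/null ||
        dd if="$dev" of=/dev/null bs=4096 count=1 2>/dev/null
    t1=$(date +%s%N)
    echo $(( (t1 - t0) / 1000000 ))
}

# Examples from the thread (run as root):
#   probe_read /dev/sdf              # raw iSCSI device
#   probe_read /dev/<vg>/metadata    # oVirt metadata LV
```

Running this in a loop against both the raw device and the LV should confirm whether the extra latency really appears only at the LV layer, as Chris observed.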

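On the single-path suggestion: the physical paths behind each multipath device are visible in sysfs, so one underlying path can be probed directly without stopping multipathd. A sketch (prints nothing on a host with no device-mapper devices):

```shell
#!/bin/sh
# List the SCSI devices (paths) backing each device-mapper node; each slave
# (e.g. sdf, sdg) can then be read directly, bypassing the multipath layer.
for dm in /sys/block/dm-*; do
    [ -e "$dm" ] || continue          # glob didn't match: no dm devices here
    name=$(cat "$dm/dm/name")
    slaves=$(ls "$dm/slaves" 2>/dev/null | tr '\n' ' ')
    echo "$name -> ${slaves:-(no slaves)}"
done
```

If one path shows the latency spikes and another doesn't, that points back at the network or a specific SAN port rather than the array as a whole.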