On Wed, 2010-01-06 at 17:16 -0700, Chris Worley wrote:

> 1) I'm seeing small block random writes (32KB and smaller) get better
> performance over SRP than they do as a local drive.  I'm guessing this
> is async behavior: once the written data is on the wire, it's deemed
> complete, and setting a sync flag would disable this.  Is this
> correct?
No.  From the initiator's point of view, the request is not complete
until the target has responded to the command.

> If not, any ideas why SRP random writes would be faster than
> the same writes locally?

I would guess deeper queue depths and more cache available on the
target, especially if you are using a Linux-based SRP target.  But it
would only be a guess without knowing more about your setup.

> 2) I'm seeing very poor sequential vs. random I/O performance (both
> read and write) at small block sizes (random performs well, sequential
> performance is poor).  I'm using direct I/O and the noop scheduler on
> the initiator, so there should be no coalescing.  Coalescing on these
> drives is not a good thing to do, as they are ultra low latency, and
> much faster if the OS doesn't try to coalesce.  Could anything in the
> IB/SRP/SCST stack be trying to coalesce sequential data?

Yes.  If you have more requests outstanding than the available queue
depth -- i.e. queue backpressure/congestion -- even noop will merge
sequential requests in the queue.  You can avoid this by setting
max_sectors_kb to the maximum I/O size you wish the drive to see (there
is a rough sketch at the end of this mail).  Though I'd be surprised if
there were no benefit at all to the OS coalescing under congestion.

> 3) In my iSCSI (tgt) results using the HCA as a 10G interface (not
> IPoIB, but mlnx4_en), comparing this to the results of using the same
> HCA as IB under SRP, I get much better results with SRP when
> benchmarking the raw device, as you'd expect.  This is w/ a drive that
> does under 1GB/s.  When I use MD to mirror that SRP or iSCSI device w/
> an identical local device, and benchmark the raw MD device, iSCSI gets
> superior write performance and about equal read performance.  Does
> iSCSI/TGT have some special hook into MD devices that IB/SRP isn't
> privy to?

Are you trying to achieve high IOPS or high bandwidth?  I'm guessing
IOPS from your other comments, but device-mapper (and I suspect MD as
well) used to suffer from an internal limit on max_sectors_kb: you could
have it set to 8 MB on the raw devices, but MD would end up restricting
it to 512 KB (the sketch at the end compares the two limits).  This is
unlikely to be the problem if you are going for IOPS, but it can be a
factor for bandwidth.  Then again, since the setup seems to be identical
on both sides, I'm not sure it is your problem here either. :(

Have you tried using the function tracer or the perf tools found in
recent kernels to follow the data path and find the hotspots?

Dave
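
P.S.  Here is a rough sketch of the max_sectors_kb tuning mentioned
above.  The device names sdb and md0 are only placeholders for your
component and MD devices -- substitute whatever your setup uses -- and
the 32 KB cap is just taken from the request size in your tests:

    #!/usr/bin/env python
    # Rough sketch, not a drop-in: "sdb" stands in for the component
    # device and "md0" for the MD mirror -- substitute your own.
    # Needs root to write the sysfs files.

    def read_limit(dev):
        # Largest request (in KB) the block layer will build for this device.
        with open("/sys/block/%s/queue/max_sectors_kb" % dev) as f:
            return int(f.read())

    def set_limit(dev, kb):
        with open("/sys/block/%s/queue/max_sectors_kb" % dev, "w") as f:
            f.write(str(kb))

    if __name__ == "__main__":
        # Cap the component device at 32 KB so sequential 32 KB requests
        # cannot be merged into something larger under queue congestion.
        set_limit("sdb", 32)

        # Compare the limits; if the MD device reports a smaller value
        # than the raw device, that is the internal clamp mentioned above.
        for dev in ("sdb", "md0"):
            print("%s: max_sectors_kb = %d" % (dev, read_limit(dev)))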
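
And a similarly rough sketch of grabbing a function_graph trace to
follow the data path.  It assumes debugfs is mounted at
/sys/kernel/debug and ftrace is built into your kernel; run it as root
while the benchmark is going, and the 5-second window is arbitrary:

    #!/usr/bin/env python
    # Rough sketch: capture a short function_graph trace during the
    # benchmark and save it for inspection.
    import time

    TRACING = "/sys/kernel/debug/tracing"

    def set_tracer(name):
        with open(TRACING + "/current_tracer", "w") as f:
            f.write(name)

    if __name__ == "__main__":
        set_tracer("function_graph")   # start tracing kernel functions
        time.sleep(5)                  # let the benchmark generate some I/O
        with open(TRACING + "/trace") as f:
            data = f.read()            # snapshot the trace buffer
        set_tracer("nop")              # stop tracing, restore the default
        with open("trace.out", "w") as f:
            f.write(data)

perf record/report works as well if you would rather sample for hotspots
than trace the call path.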
