On Nov 8, 2012, at 11:19 AM, Andrey Korolyov <[email protected]> wrote:

> On Thu, Nov 8, 2012 at 7:02 PM, Atchley, Scott <[email protected]> wrote:
>> On Nov 8, 2012, at 10:00 AM, Scott Atchley <[email protected]> wrote:
>>
>>> On Nov 8, 2012, at 9:39 AM, Mark Nelson <[email protected]> wrote:
>>>
>>>> On 11/08/2012 07:55 AM, Atchley, Scott wrote:
>>>>> On Nov 8, 2012, at 3:22 AM, Gandalf Corvotempesta
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> 2012/11/8 Mark Nelson <[email protected]>:
>>>>>>> I haven't done much with IPoIB (just RDMA), but my understanding is
>>>>>>> that it tends to top out at about 15 Gb/s. Some others on this mailing
>>>>>>> list can probably speak more authoritatively. Even with RDMA you are
>>>>>>> going to top out at around 3.1-3.2 GB/s.
>>>>>>
>>>>>> 15 Gb/s is still faster than 10 GbE. But this speed limit seems to be
>>>>>> kernel-related and should be the same even in a 10 GbE environment, or not?
>>>>>
>>>>> We have a test cluster with Mellanox QDR HCAs (i.e. NICs). When using
>>>>> Verbs (the native IB API), I see ~27 Gb/s between two hosts. When running
>>>>> Sockets over these devices using IPoIB, I see 13-22 Gb/s depending on
>>>>> whether I use interrupt affinity and process binding.
>>>>>
>>>>> For our Ceph testing, we will set the affinity of two of the mlx4
>>>>> interrupt handlers to cores 0 and 1 and we will not use process
>>>>> binding. For single-stream Netperf, we do use process binding and bind it
>>>>> to the same core (i.e. 0) and we see ~22 Gb/s. For multiple, concurrent
>>>>> Netperf runs, we do not use process binding but we still see ~22 Gb/s.
>>>>
>>>> Scott, this is very interesting! Does setting the interrupt affinity
>>>> make the biggest difference, then, when you have concurrent netperf
>>>> processes going? For some reason I thought that setting interrupt
>>>> affinity wasn't even guaranteed in Linux any more, but this is just a
>>>> half-remembered recollection from a year or two ago.
>>>
>>> We are using RHEL6 with a 3.5.1 kernel. I tested single-stream Netperf with
>>> and without affinity:
>>>
>>> Default (irqbalance running)    12.8 Gb/s
>>> IRQ balance off                 13.0 Gb/s
>>> IRQ affinity set to socket 0    17.3 Gb/s   # using the Mellanox script
>>>
>>> When I set the affinity to cores 0-1 _and_ I bind Netperf to core 0, I get
>>> ~22 Gb/s for a single stream.
>>
>
> Did you try the Mellanox-baked modules for 2.6.32 before that?
That came with RHEL6? No.

Scott

>
>> Note, I used hwloc to determine which socket was closer to the mlx4 device
>> on our dual-socket machines. On these nodes, hwloc reported that both
>> sockets were equally close, but a colleague has machines where one socket is
>> closer than the other. In that case, bind to the closer socket (or to cores
>> within the closer socket).
>>
>>>
>>>>> We used all of the Mellanox tuning recommendations for IPoIB available in
>>>>> their tuning PDF:
>>>>>
>>>>> http://www.mellanox.com/related-docs/prod_software/Performance_Tuning_Guide_for_Mellanox_Network_Adapters.pdf
>>>>>
>>>>> We looked at their interrupt affinity setting scripts and then wrote our
>>>>> own.
>>>>>
>>>>> Our testing is with IPoIB in "connected" mode, not "datagram" mode.
>>>>> Connected mode is less scalable, but currently I only get ~3 Gb/s with
>>>>> datagram mode. Mellanox claims that we should get identical performance
>>>>> with both modes and we are looking into it.
>>>>>
>>>>> We are getting a new test cluster with FDR HCAs and I will look into
>>>>> those as well.
>>>>
>>>> Nice! At some point I'll probably try to justify getting some FDR cards
>>>> in house. I'd definitely like to hear how FDR ends up working for you.
>>>
>>> I'll post the numbers when I get access after they are set up.
>>>
>>> Scott
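
A minimal sketch of the kind of affinity setup discussed above, assuming the
standard Linux /proc interfaces: it pins IRQs whose /proc/interrupts entry
mentions "mlx4" to cores 0-1 and binds the calling process to core 0. The
"mlx4" match string, the core choices, and the script itself are illustrative
assumptions only (this is not the Mellanox script referenced above), and it
needs root.

#!/usr/bin/env python3
# Sketch: pin mlx4 IRQs to cores 0-1 and bind this process to core 0.
# Assumes standard Linux /proc interfaces; the "mlx4" match string and the
# core numbers are illustrative only. Run as root.
import os

IRQ_MATCH = "mlx4"   # substring to look for in /proc/interrupts (assumption)
IRQ_CORES = [0, 1]   # cores that should service the HCA interrupts
PROC_CORE = 0        # core to bind this process (e.g. a netperf wrapper) to

def matching_irqs(match=IRQ_MATCH):
    """Return IRQ numbers whose /proc/interrupts line mentions `match`."""
    irqs = []
    with open("/proc/interrupts") as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue
            first = fields[0]
            if first.endswith(":") and first[:-1].isdigit() and match in line:
                irqs.append(int(first[:-1]))
    return irqs

def set_irq_affinity(irq, cores):
    """Write a hex CPU mask to /proc/irq/<irq>/smp_affinity."""
    mask = sum(1 << c for c in cores)
    with open("/proc/irq/%d/smp_affinity" % irq, "w") as f:
        f.write("%x\n" % mask)

if __name__ == "__main__":
    for irq in matching_irqs():
        set_irq_affinity(irq, IRQ_CORES)
        print("IRQ %d -> cores %s" % (irq, IRQ_CORES))
    os.sched_setaffinity(0, {PROC_CORE})   # same idea as taskset/numactl
    print("process bound to core %d" % PROC_CORE)

Binding netperf itself can equally be done with taskset or numactl; the
sched_setaffinity call above is only there to show the whole idea in one place.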

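The hwloc locality check and the connected-vs-datagram setting can also be
read straight from sysfs. A small sketch, assuming an IPoIB interface named
ib0 and the usual numa_node and mode attributes; writing the mode needs root
and is normally done with the interface down:

#!/usr/bin/env python3
# Sketch: report the NUMA node of an IPoIB interface's HCA and show or set
# the IPoIB mode ("connected" vs "datagram"). Paths assume standard Linux
# sysfs; "ib0" is an example interface name.
import sys

IFACE = "ib0"   # assumption: change to your IPoIB interface

def numa_node(iface=IFACE):
    """NUMA node of the underlying PCI device; -1 means no locality info."""
    with open("/sys/class/net/%s/device/numa_node" % iface) as f:
        return int(f.read().strip())

def get_mode(iface=IFACE):
    with open("/sys/class/net/%s/mode" % iface) as f:
        return f.read().strip()        # "connected" or "datagram"

def set_mode(mode, iface=IFACE):
    """Needs root; normally done while the interface is down."""
    with open("/sys/class/net/%s/mode" % iface, "w") as f:
        f.write(mode + "\n")

if __name__ == "__main__":
    print("%s: NUMA node %d, mode %s" % (IFACE, numa_node(), get_mode()))
    if len(sys.argv) > 1 and sys.argv[1] in ("connected", "datagram"):
        set_mode(sys.argv[1])
        print("mode now %s" % get_mode())

If numa_node comes back as 0 or 1 rather than -1, that is the socket to pin
the IRQs and the benchmark to, which matches the hwloc advice above for
dual-socket boxes where one socket is closer to the HCA.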