On Nov 8, 2012, at 11:19 AM, Andrey Korolyov <[email protected]> wrote:

> On Thu, Nov 8, 2012 at 7:02 PM, Atchley, Scott <[email protected]> wrote:
>> On Nov 8, 2012, at 10:00 AM, Scott Atchley <[email protected]> wrote:
>> 
>>> On Nov 8, 2012, at 9:39 AM, Mark Nelson <[email protected]> wrote:
>>> 
>>>> On 11/08/2012 07:55 AM, Atchley, Scott wrote:
>>>>> On Nov 8, 2012, at 3:22 AM, Gandalf Corvotempesta 
>>>>> <[email protected]> wrote:
>>>>> 
>>>>>> 2012/11/8 Mark Nelson <[email protected]>:
>>>>>>> I haven't done much with IPoIB (just RDMA), but my understanding is
>>>>>>> that it tends to top out at like 15Gb/s.  Some others on this mailing
>>>>>>> list can probably speak more authoritatively.  Even with RDMA you are
>>>>>>> going to top out at around 3.1-3.2GB/s.
>>>>>> 
>>>>>> 15Gb/s is still faster than 10GbE.
>>>>>> But this speed limit seems to be kernel-related and should be the same
>>>>>> even in a 10GbE environment, or not?
>>>>> 
>>>>> We have a test cluster with Mellanox QDR HCAs (i.e. NICs). When using 
>>>>> Verbs (the native IB API), I see ~27 Gb/s between two hosts. When running 
>>>>> Sockets over these devices using IPoIB, I see 13-22 Gb/s depending on 
>>>>> whether I use interrupt affinity and process binding.
>>>>> 
>>>>> For our Ceph testing, we will set the affinity of two of the mlx4 
>>>>> interrupt handlers to cores 0 and 1 and we will not be using process 
>>>>> binding. For single-stream Netperf, we do use process binding and bind it 
>>>>> to the same core (i.e. 0) and we see ~22 Gb/s. For multiple, concurrent 
>>>>> Netperf runs, we do not use process binding but we still see ~22 Gb/s.
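
For reference, a minimal sketch of that kind of IRQ pinning (not the actual
Mellanox script or the one described above; it assumes the mlx4 interrupts
show up labelled "mlx4" in /proc/interrupts, that irqbalance has been stopped,
and that it runs as root):

    # Pin every mlx4 interrupt to cores 0 and 1 by writing a CPU bitmask
    # to /proc/irq/<n>/smp_affinity.  Requires root; irqbalance will
    # silently rewrite these masks if it is still running.
    CPU_MASK = "3"  # hex bitmask for cores 0 and 1 (binary 11)

    with open("/proc/interrupts") as f:
        for line in f:
            if "mlx4" not in line:
                continue
            irq = line.split(":", 1)[0].strip()
            if not irq.isdigit():
                continue
            with open("/proc/irq/%s/smp_affinity" % irq, "w") as aff:
                aff.write(CPU_MASK)
            print("IRQ %s -> cores 0-1" % irq)
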
>>>> 
>>>> Scott, this is very interesting!  Does setting the interrupt affinity
>>>> make the biggest difference then when you have concurrent netperf
>>>> processes going?  For some reason I thought that setting interrupt
>>>> affinity wasn't even guaranteed in linux any more, but this is just some
>>>> half-remembered recollection from a year or two ago.
>>> 
>>> We are using RHEL6 with a 3.5.1 kernel. I tested single stream Netperf with 
>>> and without affinity:
>>> 
>>> Default (irqbalance running)   12.8 Gb/s
>>> IRQ balance off                13.0 Gb/s
>>> Set IRQ affinity to socket 0   17.3 Gb/s   # using the Mellanox script
>>> 
>>> When I set the affinity to cores 0-1 _and_ I bind Netperf to core 0, I get 
>>> ~22 Gb/s for a single stream.
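
And a minimal sketch of the process-binding half (the server name "ib-server"
and the test length are placeholders; it uses sched_setaffinity rather than
taskset, which amounts to the same thing):

    # Launch netperf pinned to core 0, i.e. the same core the mlx4
    # IRQ affinity points at.
    import os
    import subprocess

    def run_netperf(server="ib-server", seconds=30, core=0):
        # preexec_fn runs in the child after fork() and before exec(),
        # so netperf itself inherits the single-core CPU set.
        return subprocess.run(
            ["netperf", "-H", server, "-l", str(seconds)],
            preexec_fn=lambda: os.sched_setaffinity(0, {core}),
            check=True)

    run_netperf()
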
>> 
> 
> Did you try the Mellanox-baked modules for 2.6.32 before that?

The ones that came with RHEL6? No.

Scott

> 
>> Note, I used hwloc to determine which socket was closer to the mlx4 device 
>> on our dual socket machines. On these nodes, hwloc reported that both 
>> sockets were equally close, but a colleague has machines where one socket is 
>> closer than the other. In that case, bind to the closer socket (or to cores 
>> within the closer socket).
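
The same locality information is visible in sysfs if hwloc is not installed;
a minimal sketch (the interface name "ib0" is an assumption):

    # Report which NUMA node the IPoIB device hangs off and which CPUs
    # are local to it, straight from the PCI device's sysfs attributes.
    def local_socket_info(iface="ib0"):
        base = "/sys/class/net/%s/device/" % iface
        with open(base + "numa_node") as f:
            node = f.read().strip()      # "-1" means no preferred node
        with open(base + "local_cpulist") as f:
            cpus = f.read().strip()      # e.g. "0-7,16-23"
        return node, cpus

    node, cpus = local_socket_info()
    print("NUMA node: %s  local CPUs: %s" % (node, cpus))
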
>> 
>>> 
>>>>> We used all of the Mellanox tuning recommendations for IPoIB available in 
>>>>> their tuning pdf:
>>>>> 
>>>>> http://www.mellanox.com/related-docs/prod_software/Performance_Tuning_Guide_for_Mellanox_Network_Adapters.pdf
>>>>> 
>>>>> We looked at their interrupt affinity setting scripts and then wrote our 
>>>>> own.
>>>>> 
>>>>> Our testing is with IPoIB in "connected" mode, not "datagram" mode. 
>>>>> Connected mode is less scalable, but currently I only get ~3 Gb/s with 
>>>>> datagram mode. Mellanox claims that we should get identical performance 
>>>>> with both modes and we are looking into it.
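
A minimal sketch for checking which IPoIB mode an interface is in and what
MTU it is running with (the interface name "ib0" is an assumption; connected
mode typically allows a 65520-byte MTU, while datagram mode is limited to the
IB link MTU, roughly 2044 or 4092 bytes):

    # Read the IPoIB mode ("connected" or "datagram") and MTU from sysfs.
    def ipoib_mode(iface="ib0"):
        with open("/sys/class/net/%s/mode" % iface) as f:
            mode = f.read().strip()
        with open("/sys/class/net/%s/mtu" % iface) as f:
            mtu = int(f.read().strip())
        return mode, mtu

    mode, mtu = ipoib_mode()
    print("%s mode, MTU %d" % (mode, mtu))
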
>>>>> 
>>>>> We are getting a new test cluster with FDR HCAs and I will look into 
>>>>> those as well.
>>>> 
>>>> Nice!  At some point I'll probably try to justify getting some FDR cards
>>>> in house.  I'd definitely like to hear how FDR ends up working for you.
>>> 
>>> I'll post the numbers when I get access after they are set up.
>>> 
>>> Scott
>>> 
>> 

