On 2011-01-15, at 4:18 AM, Claudio Baeza Retamal wrote:
> On 14-03-2011 22:05, Andreas Dilger wrote:
>> On 2011-01-14, at 3:57 PM, Claudio Baeza Retamal wrote:
>>> Last month I configured Lustre 1.8.5 over InfiniBand. Before that I was
>>> using Gluster 3.1.2; performance was OK but reliability was poor: when 40
>>> or more applications requested to open a file at the same time, the
>>> Gluster servers randomly dropped active client connections. Lustre does
>>> not have this problem, but I can see other issues. For example, namd
>>> shows "system cpu" around 30%, and the HPL benchmark shows 70%-80%
>>> "system cpu", which is much too high; with Gluster the system CPU never
>>> exceeded 5%. I think this is because Gluster uses FUSE and runs in user
>>> space, but I am not sure.
>>> I have some doubts:
>>>
>>> Why does Lustre use IPoIB? Before, with Gluster, I did not use IPoIB. I
>>> suspect the ipoib module gives poor InfiniBand performance and disturbs
>>> the native InfiniBand module.
>>
>> If you are using IPoIB for data then your LNET is configured incorrectly.
>> IPoIB is only needed for IB hostname resolution, and all LNET traffic can
>> use native IB with very low CPU overhead. Your /etc/modprobe.conf and mount
>> lines should be using {addr}@o2ib0 instead of {addr} or {addr}@tcp0.
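Concretely, a minimal sketch (the NID 192.168.1.10 and the fsname "lustre" below are placeholders for your MGS address and filesystem name):

```
# /etc/modprobe.conf on clients and servers with an HCA:
options lnet networks=o2ib0(ib0)

# client mount line, using the MGS's o2ib NID instead of a TCP address:
mount -t lustre 192.168.1.10@o2ib0:/lustre /mnt/lustre
```

With this, IPoIB is only consulted when a hostname needs to be resolved to an IB address; the bulk data and RPCs go over native verbs.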
>
> For the first two weeks I was using "options lnet networks=o2ib(ib0)"; now
> I am using "options lnet networks=o2ib(ib0),tcp0(eth0)" because I have one
> node without an HCA card. In both cases the system CPU usage is the same;
> the compute node without InfiniBand is used only to run MATLAB.
>
> In the HPL benchmark case, my doubt is: why is there such high system CPU
> usage? Is it possible that Lustre disturbs the mlx4 InfiniBand driver and
> causes problems with MPI? The HPL benchmark mainly does I/O to transport
> data over MPI. With GlusterFS the system CPU was around 5%; since Lustre
> was configured it is 70%-80%, and we use o2ib(ib0) for LNET in
> modprobe.conf.
Have you tried disabling the Lustre kernel debug logs (lctl set_param debug=0)
and/or disabling the network data checksums (lctl set_param osc.*.checksums=0)?
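These are runtime tunables, so they can be toggled on a client just for a test run and checked afterwards; a sketch:

```
# check the current values first
lctl get_param debug
lctl get_param osc.*.checksums

# disable debug logging and wire checksums for the benchmark run
lctl set_param debug=0
lctl set_param osc.*.checksums=0
```

Note that neither setting is persistent across a remount or module reload.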
Note that there is also CPU overhead in the kernel from copying data from
userspace to the kernel that is unavoidable for any filesystem, unless O_DIRECT
is used (which causes synchronous IO and has IO alignment restrictions).
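To see how much of the system CPU is just this copy, a streaming write can be compared with and without O_DIRECT using dd (the mount point /mnt/lustre is a placeholder; with oflag=direct the block size must satisfy the filesystem's alignment restrictions):

```
# buffered write: data is copied from user space through the page cache
dd if=/dev/zero of=/mnt/lustre/ddtest bs=1M count=1024

# O_DIRECT write: avoids the copy, but synchronous and alignment-constrained
dd if=/dev/zero of=/mnt/lustre/ddtest bs=1M count=1024 oflag=direct
```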
> I have tried several options following instructions from Mellanox: on the
> compute nodes I disabled irqbalance and ran the smp_affinity script, but
> the system CPU is still high.
> Are there any tools to study Lustre performance?
>
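On the client side there are per-mount counters that can help; a couple of starting points, assuming the 1.8-era /proc interface:

```
# VFS-level operation counts (opens, reads, writes) per client mount
lctl get_param llite.*.stats

# per-OSC RPC histograms: request sizes and RPCs in flight
lctl get_param osc.*.rpc_stats
```

Watching rpc_stats during a benchmark run shows whether the load is many small RPCs (high CPU per byte) or large streaming ones.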
>>> Is it possible to configure Lustre to transport metadata over Ethernet
>>> and data over InfiniBand?
>> Yes, this should be possible, but putting the metadata on IB is much lower
>> latency and higher performance so you should really try to use IB for both.
>>
>>> For namd and the HPL benchmark, is it normal for the system CPU to be so high?
>>>
>>> My configuration is the following:
>>>
>>> - QLogic 12800-180 switch, 7 leaves (24 ports per leaf) and 2 spines (all
>>> ports QDR, 40 Gbps)
>>> - 66 Mellanox ConnectX HCAs, two ports, QDR 40 Gbps (compute nodes)
>>> - 1 metadata server: 96 GB DDR3 RAM optimized for performance, two Xeon
>>> 5570s, 15K RPM SAS disks in RAID 1, Mellanox ConnectX HCA with two ports
>>> - 4 OSS nodes, each with 1 OST of 2 TB in RAID 5 (8 TB in total). All
>>> OSS nodes have a Mellanox ConnectX with two ports
>> If you have IB on the MDS then you should definitely use {addr}@o2ib0 for
>> both OSS and MDS nodes. That will give you much better metadata performance.
>>
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Principal Engineer
>> Whamcloud, Inc.
>>
>
> regards
>
> claudio
Cheers, Andreas
--
Andreas Dilger
Principal Engineer
Whamcloud, Inc.
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss