On 2011-01-15, at 4:18 AM, Claudio Baeza Retamal wrote:
> On 14-03-2011, at 22:05, Andreas Dilger wrote:
>> On 2011-01-14, at 3:57 PM, Claudio Baeza Retamal wrote:
>>> Last month I configured Lustre 1.8.5 over InfiniBand. Before that I
>>> was using Gluster 3.1.2; its performance was OK but its reliability
>>> was poor: when 40 or more applications requested to open a file at
>>> the same time, the Gluster servers randomly dropped active client
>>> connections. Lustre does not have this problem, but I see other
>>> issues. For example, namd shows around 30% "system cpu", and the hpl
>>> benchmark shows 70%-80% "system cpu", which is far too high; with
>>> Gluster the system cpu never exceeded 5%. I think this is because
>>> Gluster uses FUSE and runs in user space, but I am not sure.
>>> I have some doubts:
>>> 
>>> Why does Lustre use IPoIB? With Gluster I did not use IPoIB, and I
>>> suspect the ipoib module hurts InfiniBand performance and disturbs
>>> the native InfiniBand module.
>> 
>> If you are using IPoIB for data then your LNET is configured incorrectly.  
>> IPoIB is only needed for IB hostname resolution, and all LNET traffic can 
>> use native IB with very low CPU overhead.  Your /etc/modprobe.conf and mount 
>> lines should be using {addr}@o2ib0 instead of {addr} or {addr}@tcp0.
> 
> For the first two weeks I was using 'options lnet networks="o2ib(ib0)"';
> now I am using 'options lnet networks="o2ib(ib0),tcp0(eth0)"' because I
> have a node without an HCA card. In both cases the system cpu usage is the
> same; the compute node without InfiniBand is only used to run MATLAB.
> 
> In the hpl benchmark case, my doubt is: why is system cpu usage so high?
> Is it possible that Lustre disturbs the mlx4 InfiniBand driver and causes
> problems with MPI? The hpl benchmark mainly does I/O to transport data
> over MPI. With GlusterFS, system cpu was around 5%; since Lustre was
> configured it is 70%-80%, and we use o2ib(ib0) for LNET in modprobe.conf.
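For reference, a minimal mixed-network LNET setup along those lines might
look like the following (the interface names, NID, and filesystem name are
placeholders, not your actual values):

    # /etc/modprobe.conf fragment: native IB for clients with an HCA,
    # plus a TCP network for the one node without one
    options lnet networks="o2ib0(ib0),tcp0(eth0)"

    # Client mount using the IB NID of the MGS/MDS (placeholder address):
    mount -t lustre 10.0.0.1@o2ib0:/lustre /mnt/lustre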

Have you tried disabling the Lustre kernel debug logs (lctl set_param debug=0) 
and/or disabling the network data checksums (lctl set_param osc.*.checksums=0)?
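That is, on a client, something like:

    # Check the current settings first
    lctl get_param debug
    lctl get_param osc.*.checksums

    # Turn off kernel debug logging and wire checksums; note that
    # settings made with set_param do not persist across a remount
    lctl set_param debug=0
    lctl set_param osc.*.checksums=0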

Note that there is also CPU overhead in the kernel from copying data from 
userspace to the kernel that is unavoidable for any filesystem, unless O_DIRECT 
is used (which causes synchronous IO and has IO alignment restrictions).

> I have tried several options. Following instructions from Mellanox, on
> the compute nodes I disabled irqbalance and ran the smp_affinity script,
> but system cpu is still very high.
> Are there any tools to study Lustre performance?
> 
>>> Is it possible to configure Lustre to transport metadata over Ethernet
>>> and data over InfiniBand?
>> Yes, this should be possible, but putting the metadata on IB is much lower 
>> latency and higher performance so you should really try to use IB for both.
>> 
>>> For namd and the hpl benchmark, is it normal for system cpu to be so high?
>>> 
>>> My configuration is the following:
>>> 
>>> - QLogic 12800-180 switch, 7 leaves (24 ports per leaf) and 2 spines
>>> (all ports QDR, 40 Gbps)
>>> - 66 Mellanox ConnectX HCAs, two ports, QDR 40 Gbps (compute nodes)
>>> - 1 metadata server: 96 GB DDR3 RAM optimized for performance, two
>>> Xeon 5570s, SAS 15K RPM disks in RAID 1, two-port Mellanox ConnectX HCA
>>> - 4 OSSes, each with 1 OST of 2 TB in RAID 5 (8 TB in total); all
>>> OSSes have a two-port Mellanox ConnectX HCA
>> If you have IB on the MDS then you should definitely use {addr}@o2ib0 for 
>> both OSS and MDS nodes.  That will give you much better metadata performance.
>> 
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Principal Engineer
>> Whamcloud, Inc.
> 
> regards
> 
> claudio


Cheers, Andreas
--
Andreas Dilger 
Principal Engineer
Whamcloud, Inc.



_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
