Re: [Lustre-discuss] Optimize parallel compilation on lustre

Maxence Dunnewind Mon, 28 Jun 2010 09:05:06 -0700

Heya,
> If you are interested to do a tiny bit of hacking, it would be interesting to 
> do an experiment to see what kind of performance can be gotten in your 
> benchmark by a single client.  Currently, Lustre limits each client to a 
> single filesystem-modifying metadata operation at one time, in order to 
> prevent the clients from overwhelming the server, and to ensure that the 
> clients can recover the filesystem correctly in case of a server crash.
I just tested this. Before that, I tried an out-of-tree build. My four clients
use nfsroot, so I put the kernel source there, mounted Lustre on
/mnt/lustre, and compiled in /mnt/lustre/build (with make O=). The results
(without your patch) are interesting:
7 min 42 against 9 min 7 before with -j 4
4 min 51 against 5 min 34 with -j 8
3 min 27 against 4 min 19 with -j 16
I also pass -pipe to gcc, to avoid temp files.
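
To put those timings in perspective, here is a minimal sketch that converts them to seconds and computes the relative saving for each -j value. The numbers are the ones quoted above; the helper names (to_seconds, percent_saved) are only illustrative.

```python
def to_seconds(minutes, seconds):
    """Convert a 'M min S' wall-clock time to seconds."""
    return minutes * 60 + seconds


def percent_saved(new, old):
    """Percentage of wall-clock time saved by `new` relative to `old`."""
    return 100.0 * (old - new) / old


# (make -j value, out-of-tree build time, earlier build time), from the figures above
results = [
    (4, to_seconds(7, 42), to_seconds(9, 7)),
    (8, to_seconds(4, 51), to_seconds(5, 34)),
    (16, to_seconds(3, 27), to_seconds(4, 19)),
]

for jobs, new, old in results:
    saved = percent_saved(new, old)
    print(f"-j {jobs}: {old - new} s saved ({saved:.1f}% less wall-clock time)")
```

By this arithmetic the out-of-tree build saves roughly 13 to 20 percent of the wall-clock time, depending on the -j value.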


So, my first question is: would it be possible in some way to disable cache
coherency on some subdirectory? If I know that all the files in this directory
will only ever be read, I do not need coherency. That would let me read the
files from Lustre instead of NFS.

I then tried with your patch; not much difference:

4 min 43 against 4 min 51 without it (-j 8)
7 min 40 against 7 min 42 with -j 4
So it changes almost nothing :)

> I'm not sure if it makes a difference in your case or not, but increasing the 
> MDC RPCs in flight might also help performance.  Also, increasing the client 
> cache size and the number of IO RPCs may also help.  On the clients run:
> 
> lctl set_param *.*.max_rpcs_in_flight=64
> lctl set_param osc.*.max_dirty_mb=512
no change 

> You may also test running the make directly on the MDS with a local Lustre 
> mount to determine if the network latency is a significant factor in the 
> performance. If you are using Ethernet instead of IB the latency could be 
> hurting you, since kernel compiles are generally only doing a tiny amount of 
> work per file and then you need to send a few RPCs to open and read the next 
> file and the headers.  Some of this can be hidden by pre-reading all of the 
> files into the client caches (new machines should have enough RAM, about 1GB 
> or so), but the "open" operations still need to send an RPC to the MDS for 
> each file open, so running on the MDS (or with a low-latency network like IB) 
> may help compiles like this run more quickly.
We don't have IB set up at the moment, so I cannot test with it. I will try
directly on the MDS (so on only one node) to compare.
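
One rough way to check whether per-file open latency matters, before moving the build to the MDS, is to time open/close calls over a source tree on each mount. The script below is only a sketch under stated assumptions: time_opens is a hypothetical helper, the tree is assumed to contain only ordinary files, and the directory walk itself may warm client caches, so the result is an approximate figure and not a precise RPC latency.

```python
import os
import sys
import time


def time_opens(root, limit=1000):
    """Open and close up to `limit` files under `root`.

    Returns (number_of_files_opened, total_seconds_spent_in_open_and_close).
    """
    count = 0
    total = 0.0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            start = time.perf_counter()
            try:
                fd = os.open(path, os.O_RDONLY)
            except OSError:
                continue  # skip unreadable or vanished files
            os.close(fd)
            total += time.perf_counter() - start
            count += 1
            if count >= limit:
                return count, total
    return count, total


if __name__ == "__main__":
    # Directory to scan; defaults to the current directory.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    n, secs = time_opens(root)
    if n:
        print(f"{n} opens in {secs:.3f} s ({1e6 * secs / n:.0f} us per open)")
```

Running it once against the NFS copy of the source and once against /mnt/lustre gives a per-open figure for each mount that can be compared directly.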

Regards,

Maxence
-- 
Maxence DUNNEWIND
Contact : [email protected]
Site : http://www.dunnewind.net
06 32 39 39 93
GPG : 18AE 61E4 D0B0 1C7C AAC9  E40D 4D39 68DB 0D2E B533
