Hello people,

We are running Lustre 1.8.0 on all servers and clients involved. The OS is SLES 10
SP2 with an un-patched kernel on the clients; I have, however, installed the same
kernel revision (downloaded from suse.com) on the clients as the one used on the
Lustre-patched MGS/MDS/OSS servers. The file system is only a few GB, with
~500,000 files. All interconnects are TCP.

We do some “manual” replication of an active Lustre file system to a
passive Lustre file system: “sync” clients mount both file systems and run
large rsync jobs from the active Lustre to the passive Lustre. So far, so
good (apart from it being quite a slow process). My issue, however, is that
Lustre's memory usage grows so large that rsync cannot get enough RAM to
finish its job before kswapd kicks in and slows things down drastically.
Up to now, I have managed to keep things under control by fine-tuning with
the following steps in my rsync script:
       ########
        # Remount both file systems to flush the Lustre client caches
        umount /opt/lustre_a
        umount /opt/lustre_z
        mount /opt/lustre_a
        mount /opt/lustre_z
        # Cap the per-OSC dirty (write) cache at 4 MB
        for i in /proc/fs/lustre/osc/*/max_dirty_mb; do echo 4 > $i; done
        # Lower the lock LRU max age so unused DLM locks expire sooner
        for i in /proc/fs/lustre/ldlm/namespaces/*/lru_max_age; do echo 30 > $i; done
        # Cap the client read cache at 64 MB per mount
        for i in /proc/fs/lustre/llite/*/max_cached_mb; do echo 64 > $i; done
        # Cap the client-wide dirty cache at 64 MB
        echo 64 > /proc/sys/lustre/max_dirty_mb
        # Limit the lock LRU to 100 locks per OSC namespace
        lctl set_param ldlm.namespaces.*osc*.lru_size=100
        # Disable Lustre debug logging
        sysctl -w lnet.debug=0
       ########
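To see whether these caps actually help, I sample mem_used around each rsync job. A minimal sketch of that helper; MEMUSED_FILE is a variable I introduce here (it defaults to the real proc file, and can be overridden for testing outside a Lustre client):

```shell
#!/bin/sh
# Sketch: report Lustre client memory growth around a single job.
# MEMUSED_FILE defaults to the real proc file; override it for testing.
MEMUSED_FILE=${MEMUSED_FILE:-/proc/sys/lustre/mem_used}

memused() {
    cat "$MEMUSED_FILE"
}

run_with_memlog() {
    before=$(memused)
    "$@"                # the actual rsync job goes here
    after=$(memused)
    echo "mem_used: $before -> $after (delta $((after - before)))"
}
```

Usage would be e.g. `run_with_memlog rsync -a /opt/lustre_a/ /opt/lustre_z/`.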
What I still don't understand is why, even with the read cache
(max_cached_mb) capped at a few MB, the dirty write cache (max_dirty_mb)
similarly limited, and the lock LRU age (lru_max_age ? is that the right
knob ?) set very low, /proc/sys/lustre/mem_used still skyrockets to
several GB. As soon as I un-mount the file systems, it drops. The mem_used
number, however, will not decrease even if the client sits idle for
several days with no I/O to or from any Lustre file system. Note that
splitting the rsync run into smaller but more numerous jobs does not help,
unless I un-mount and re-mount the Lustre file systems between each job
(which is nevertheless what I may have to plan if no other parameter helps
me)!
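For the record, the un-mount/re-mount-between-jobs fallback can be scripted fairly compactly. A sketch, assuming the two mount points from above and one rsync job per top-level directory; the DRY_RUN switch and the per-directory job split are illustrative, not what we run today:

```shell
#!/bin/sh
# Sketch: remount the Lustre file systems between rsync jobs so the
# client releases its memory. DRY_RUN=1 prints commands instead of
# executing them.
MOUNTS="/opt/lustre_a /opt/lustre_z"

do_cmd() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

remount_all() {
    for m in $MOUNTS; do do_cmd umount "$m"; done
    for m in $MOUNTS; do do_cmd mount "$m"; done
}

# Run one rsync job per directory name given as argument, remounting
# in between so mem_used is released after each job.
run_jobs() {
    for dir in "$@"; do
        do_cmd rsync -a "/opt/lustre_a/$dir/" "/opt/lustre_z/$dir/"
        remount_all
    done
}
```

For example, `run_jobs home data scratch` would sync three directories with a remount after each.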

Any help/guidance/hint/... is very much appreciated.

Thank you,


Guillaume Demillecamps
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
