Hi all,

OS: Red Hat 7.4
Lustre version: Intel® Manager for Lustre* software 4.0.3.0
Interconnect: Mellanox OFED, ConnectX-5
72 OSTs over 6 OSSs with HA
1 MDT and 1 MGT on 2 MDSs with HA

Lustre server tuning parameters:

  lctl set_param timeout=600
  lctl set_param ldlm_timeout=200
  lctl set_param at_min=250
  lctl set_param at_max=600
  lctl set_param obdfilter.*.read_cache_enable=1
  lctl set_param obdfilter.*.writethrough_cache_enable=1
  lctl set_param obdfilter.lfs3test-OST*.brw_size=16

Lustre client tuning parameters:

  lctl set_param osc.*.checksums=0
  lctl set_param timeout=600
  lctl set_param at_min=250
  lctl set_param at_max=600
  lctl set_param ldlm.namespaces.*.lru_size=2000
  lctl set_param osc.*OST*.max_rpcs_in_flight=256
  lctl set_param osc.*OST*.max_dirty_mb=1024
  lctl set_param osc.*.max_pages_per_rpc=1024
  lctl set_param llite.*.max_read_ahead_mb=1024
  lctl set_param llite.*.max_read_ahead_per_file_mb=1024

Mount point striping: stripe count 72, stripe size 1M

I have a 2 PB Lustre filesystem. In benchmark tests with a single job I get optimal read and write throughput, but when I start a second I/O job concurrently, the second job's throughput stays around 100-200 Mb/s.

I have tried lowering the stripe count to 36. However, concurrent jobs will not always arrive in a pattern that keeps OST usage balanced, so I don't think this is a good way forward. I also saw some discussion about turning off flock, but that did not look promising.

Looking at the striping behaviour: the first job uses the first 36 OSTs. If a second job starts while the first is still running, it uses the second 36 OSTs. But if the second job starts after the first has finished, it uses the first 36 OSTs again, which leaves the OSTs unbalanced.

Is there a round-robin setting so that the two groups of 36 OSTs are used alternately? Any other suggestions are appreciated as well.

Best regards.
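
P.S. To make the question concrete, here is a minimal dry-run sketch of the alternating layout described above. It only prints `lfs setstripe` commands and does not execute them. The helper names, the job counter and the /mnt/lustre/jobN directories are hypothetical. It assumes OST indices 0-71 and that a start index given with -i places the 36 stripes on the OSTs following that index, which has not been verified on this system.

```shell
#!/bin/sh
# Dry-run sketch: alternate the starting OST between 0 and 36 per job.
# All names and paths below are hypothetical, for illustration only.

TOTAL_OSTS=72     # number of OSTs in the filesystem (from the post)
STRIPE_COUNT=36   # stripes per file (the reduced stripe count that was tried)

# Print the starting OST index for job number $1 (0, 1, 2, ...).
# Consecutive job numbers alternate between index 0 and index 36.
start_ost_for_job() {
    echo $(( ($1 * STRIPE_COUNT) % TOTAL_OSTS ))
}

# Print (do not run) the lfs setstripe command for job $1 and directory $2.
# -c = stripe count, -S = stripe size, -i = index of the first OST.
stripe_cmd() {
    echo "lfs setstripe -c $STRIPE_COUNT -S 1M -i $(start_ost_for_job "$1") $2"
}

# Show the commands for four consecutive jobs.
for job in 0 1 2 3; do
    stripe_cmd "$job" "/mnt/lustre/job$job"
done
```

Jobs 0 and 2 would start at OST 0 and jobs 1 and 3 at OST 36, regardless of whether the previous job has finished. Whether pinning the start index like this is advisable, compared with leaving placement to the MDS allocator, is part of what I am asking.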
