Unless you have huge directories, you may not see any improvement from DNE, and 
it may hurt performance because striped directories have more overhead when 
they are first created.

DNE is mostly useful when a single MDS is overloaded by many clients, but with 
the small IO workload here that may not be the case.
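For reference, striped directories are created explicitly with lfs mkdir, which makes it easy to compare against plain (single-MDT) directories; the mount path below is illustrative:

```shell
# Create a directory striped across 2 MDTs (DNE phase II); the path
# is illustrative and must be on a Lustre client mount.
lfs mkdir -c 2 /mnt/lustre/striped_dir

# Show the directory's stripe count and MDT layout:
lfs getdirstripe /mnt/lustre/striped_dir

# A plain directory lives on a single MDT and avoids the extra
# create/unlink overhead of striping:
mkdir /mnt/lustre/plain_dir
```
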

Also, you would likely benefit from IB networking, which is lower latency 
compared to TCP.

Cheers, Andreas

On May 29, 2018, at 17:26, meng ding <[email protected]> wrote:

Hi,

We are in the process of helping a client evaluate shared file system 
solutions for their distributed build use case. Our client is a company with 
hundreds of developers around the globe doing system software development 
around the clock. At any time, there can be many builds running, including 
interactive builds (mostly incremental builds) and batch builds (mostly 
regression tests).

The size of each build is very big. For example, a single full build may have 
more than 6000 build tasks (i.e., make rules) that can all be run in parallel. 
Each build task takes about 6 seconds on average to run, so a sequential build 
using 1 CPU (e.g., make -j 1) would take about 10 hours to complete.
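The 10-hour estimate follows directly from the task count and the average task duration:

```python
# Sanity check of the sequential build time quoted above.
tasks = 6000        # parallel-capable build tasks (make rules)
avg_seconds = 6     # average wall time per task

total_seconds = tasks * avg_seconds
print(total_seconds / 3600)  # → 10.0 hours
```
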

Our client is using a distributed build software to run the build across a 
cluster of build hosts. Think of make -j N, except that the build tasks run 
simultaneously on multiple hosts instead of one. Obviously, for this to work, 
they need to have a shared file system with good performance. Our client is 
currently using NFS on NetApp, which most of the time provides good 
performance, but at a very high cost. With this combination, our client is able 
to complete the above-mentioned build in less than 5 minutes on a build cluster 
with about 30 build hosts (25 cores per host). Another advantage of using a 
cluster of build hosts is to accommodate many builds from multiple developers 
at the same time, with each developer dynamically assigned a fair share of the 
total cores in the build cluster at any given time based on the resource 
requirement of each build.

The distributed build use case has the following characteristics:

  *   Mostly very small source files (tens of thousands of them), each less 
than 16 KB to start with.
  *   Source files are read-only. All reads are sequential.
  *   Source files are read repetitively (e.g., the header files), so the 
workload benefits hugely from client-side caching.
  *   Intermediate object files, libraries, and binaries are small to medium 
in size; the biggest binary generated is several hundred megabytes.
  *   Binary/object files are generated by small random writes.
  *   There is NO concurrent/shared access to the same file. Each build task 
generates its own output file.

With this use case in mind, we are trying to explore alternative solutions to 
NFS on NetApp, with the goal of achieving comparable performance at reduced 
cost. So far, we have done some benchmarking with Lustre on a distributed 
build of GCC 8.1 on AWS, but its performance lags quite a bit behind even 
kernel NFS:

Lustre Setup

Lustre Server

  *   2 MDSes, each an m5.2xlarge instance (8 vCPUs, 32GiB Mem, up to 10Gb 
network), backed by an 80 GiB SSD formatted with LDISKFS.
  *   DNE phase II (striped directories) is enabled.
  *   No data striping is enabled because most files are small.
  *   4 OSSes, each an m5.xlarge instance (4 vCPUs, 16GiB Mem, up to 10Gb 
network), backed by a 40 GiB SSD formatted with LDISKFS.
Build cluster
30 build hosts m5.xlarge, 120 CPUs in total all mounting the same Lustre volume

  *   The following is configured on all build hosts:
mount -t lustre -o localflock …
lctl set_param osc.*.checksums=0
lctl set_param osc.*.max_rpcs_in_flight=32
lctl set_param osc.*.max_dirty_mb=128
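The effect of these settings can be verified on each client with lctl get_param. The llite parameters below are additional client-side knobs that may help a small-file, cache-friendly workload; the values shown are illustrative assumptions, not tested recommendations:

```shell
# Verify the tuned OSC parameters on a client:
lctl get_param osc.*.checksums osc.*.max_rpcs_in_flight osc.*.max_dirty_mb

# Additional client-side knobs that may help small-file workloads
# (values are illustrative assumptions, not recommendations):
lctl set_param llite.*.max_cached_mb=4096   # client page cache limit
lctl set_param llite.*.statahead_max=128    # stat-ahead depth for readdir
```
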

Test and results
Running distributed build of GCC 8.1 in the Lustre mount across the build 
cluster:

Launching 1 build only:

  *   Takes on average 17 minutes 45 seconds to finish.

Launching 20 builds at the same time all sharing the same build cluster:

  *   Takes on average 46 minutes to finish for each build.

By the way, we have tried the Data-on-MDT feature since we are using Lustre 
2.11, but we did not observe performance improvement.
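For reference, Data-on-MDT is enabled per directory with a composite layout whose first component lives on the MDT; the path and component size below are illustrative:

```shell
# Store the first 64 KiB of each new file on the MDT, with any
# remainder striped to a single OST (path and size are illustrative):
lfs setstripe -E 64K -L mdt -E -1 -c 1 /mnt/lustre/build_dir

# Inspect the resulting layout:
lfs getstripe /mnt/lustre/build_dir
```
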

Kernel NFS Setup

NFS Server
1 NFS server m5.2xlarge (8 vCPUS, 32GiB Mem, up to 10Gb network), backed by 300 
GiB SSD formatted with XFS

Build cluster
30 build hosts m5.xlarge, 120 CPUs in total all mounting the same NFS volume 
using NFS v3 protocol.

Test and results
Running distributed build of GCC 8.1 in the NFS mount across the build cluster:
Launching 1 build only:

  *   Takes on average 16 minutes 36 seconds to finish. About 1 minute faster 
than Lustre.

Launching 20 builds at the same time all sharing the same build cluster:

  *   Takes on average 38 minutes to finish for each build. About 8 minutes 
faster than Lustre.


So our question to the Lustre experts: given the distributed build use case, 
do you suggest anything else we could try to further improve the performance?

Thanks,
ading
_______________________________________________
lustre-discuss mailing list
[email protected]<mailto:[email protected]>
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org