We do use checksums, but we can't turn them off. I know we've measured some performance penalty with checksums. I'll check about configuring the Lustre clients to use RDMA. We ran into something similar where our MPI programs were not taking advantage of the InfiniBand - we noticed much slower message passing than we expected - so it sounds like there is a similar fix for Lustre, but I guess the locking is the main issue.

All our compute nodes are currently running Red Hat 5, and it doesn't look like Lustre 2.6 was tested with RHEL5, but we have been talking about moving everything to at least RHEL6, maybe RHEL7, so there's hope. Thanks for the help!

best,

David

On 05/19/15 11:10, Patrick Farrell wrote:
Ah.  I think I know what's going on here:

In Lustre 2.x client versions prior to 2.6, only one process on a given
client can write to a given file at a time, regardless of how the file is
striped.  So if you are writing to the same file, there will be little to
no benefit to putting an extra process on the same node.

A *single* process on a node could benefit, but not the split you've
described.

The details, which essentially boil down to a pair of per-file locks
being held by any individual process writing to a file, are here:
https://jira.hpdd.intel.com/browse/LU-1669
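
In the meantime, spreading the writers one per node should get the
parallelism back.  With Open MPI, for example, something like the
following (the exact flag varies by MPI implementation and version,
and ./your_app is just a placeholder for your program):

    # place ranks round-robin across nodes instead of packing one host
    mpirun --map-by node -np 4 ./your_app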


On 5/19/15, 12:59 PM, "Mohr Jr, Richard Frank (Rick Mohr)" <[email protected]>
wrote:

On May 19, 2015, at 1:44 PM, Schneider, David A.
<[email protected]> wrote:

Thanks for the suggestion! When I had each rank run on a separate
compute node/host, I saw parallel performance (4 seconds for the 6GB of
writing). When I ran the MPI job on one host (the hosts have 12 cores;
by default we pack ranks onto as few hosts as possible), things happened
serially: each rank finished about 2 seconds after the previous one.

Hmm. That does seem like there is some bottleneck on the client side that
is limiting the throughput from a single client.  Here are some things
you could look into (although they might require more tinkering than you
have permission to do):

1) Based on your output from "lctl list_nids", it looks like you are
running IP-over-IB.  Can you configure the clients to use RDMA?  (They
would have NIDs like x.x.x.x@o2ib.)
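
If the IB fabric already works (it sounds like it does, since MPI uses
it), LNET can usually be pointed at it with a module option on the
clients.  A sketch, assuming ib0 is the IB interface on your nodes:

    # /etc/modprobe.d/lustre.conf (client side)
    options lnet networks="o2ib0(ib0)"

After reloading the Lustre modules and remounting, "lctl list_nids"
should then show an @o2ib0 NID.  The servers also need an o2ib NID for
the clients to reach them, so this likely needs admin involvement.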

2) Do you have the option of trying a newer client version?  Earlier
Lustre versions used a single-threaded ptlrpcd to manage network traffic,
but newer versions have a multi-threaded implementation.  You may need to
check compatibility with the Lustre version running on the servers,
though.
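
You can check what each side is running before deciding; for example:

    # on a client (the admins can run the same thing on a server)
    lctl get_param version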

3) Do you have checksums disabled?  Try running "lctl get_param
osc.*.checksums".  If the values are "1", then checksums are enabled,
which can slow down performance.  You could try setting the value to "0"
to see if that helps.
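
For example (note the set_param change is not persistent - it reverts
at remount, and making it permanent requires a change on the servers):

    # check the current setting for every OSC device
    lctl get_param osc.*.checksums
    # temporarily disable checksums on this client
    lctl set_param osc.*.checksums=0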

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
