Thank you Andreas, appreciate your kind and clear explanation.

I set up three directories, test1, test2 and test3, each with a different 
setstripe: test1 with -S 0 -c 1, test2 with -S 0 -c -1 and test3 with 
-S 16M -c -1. But the results are about the same. I haven't tried PFL yet.


I will just use file-per-process output for now.


Cheers,

Kal

________________________________
From: Andreas Dilger <[email protected]>
Sent: Wednesday, October 10, 2018 2:21:15 PM
To: Kal Alfizah
Cc: [email protected]
Subject: Re: [lustre-discuss] Writing to a single big file is slower

On Oct 10, 2018, at 15:01, Kal Alfizah <[email protected]> wrote:
>
> Hello,
>
> I am running IOR on a single node against a Lustre filesystem, and I notice 
> that writing to a single big file is slower. I would have expected writing 
> to many small files to be slower. Any ideas why this is, and is there a 
> Lustre setting that can fix it? This is lustre-2.10.4.
>
> # mpirun -np 32  /temp/ior/bin/ior -a POSIX -C -v -w -k -F -i 1 -t 1m -b 8G 
> -o /mnt/lustrefs/begdon/test2/eachfile-256G
> ...
> Max Write: 4495.70 MiB/sec (4714.08 MB/sec)
> ...
>
> # mpirun -np 32 /temp/ior/bin/ior -a POSIX -C -v -w -k -i 1 -t 1m -b 8G -o 
> /mnt/lustrefs/begdon/test2/bigfile-256G
> ...
> Max Write: 1331.97 MiB/sec (1396.67 MB/sec)
> ...

Kal,
this is because the file-per-process output does not have any contention 
between client threads, which means each file gets a single LDLM lock and the 
client writes a contiguous stream of data to the one file.  Creating only 32 
files has no noticeable overhead, so this is not a factor in the performance.  
Once there are many thousands/millions of files the file creation overhead will 
become more significant.

The shared-single-file output has to contend between threads, so there is LDLM 
locking overhead between threads/nodes.  If all of the threads are on a single 
client, it can also cause problems where the RPCs are not formed properly, 
because there is a lot of dirty data on the client but it is spread around 
the file rather than contiguous.

That said, I wouldn't expect the difference to be so large.  Is it possible 
that the shared-single-file case is only using a single OST stripe for the 
output?  With Lustre 2.10+ you can create a progressive file layout (PFL) that 
will distribute the IO across OSTs if the file gets larger.  Something like:

client$ lfs setstripe -E1G -c1 -E16G -c4 -E-1 -c-1 /mnt/lustrefs/begdon/test2

which will use a single stripe below 1GB, 4 stripes up to 16GB, and fully 
striped after 16GB (you can change these values arbitrarily per file or 
directory).  That will ensure that small (< 1GB) files do not have much 
overhead, but very large files can use the full IO bandwidth.
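
To make the layout above concrete, here is a rough bash sketch (not part of 
Lustre) that reports which component of that PFL layout covers a given byte 
offset. The function name pfl_stripe_count and its ost_count argument are made 
up for illustration, and it assumes the filesystem has at least 4 OSTs and 
that -c-1 resolves to all of them.

```shell
#!/usr/bin/env bash
# Sketch only: stripe count in effect at a byte offset for the layout
#   lfs setstripe -E1G -c1 -E16G -c4 -E-1 -c-1
# ost_count is an assumed input (number of OSTs, taken to be >= 4).
pfl_stripe_count() {
    local offset=$1 ost_count=$2
    local gib=$((1024 * 1024 * 1024))
    if (( offset < 1 * gib )); then
        echo 1                # first component: [0, 1G), one stripe
    elif (( offset < 16 * gib )); then
        echo 4                # second component: [1G, 16G), four stripes
    else
        echo "$ost_count"     # last component: [16G, EOF), -c-1 = all OSTs
    fi
}

# Example offsets, assuming a hypothetical filesystem with 8 OSTs.
pfl_stripe_count $((512 * 1024 * 1024)) 8   # 512M offset
pfl_stripe_count $((8 * gib_unused + 8 * 1024 * 1024 * 1024)) 8   # 8G offset
pfl_stripe_count $((200 * 1024 * 1024 * 1024)) 8   # 200G offset
```

If IOR places one 8G block per task in the shared file (rank N starting at 
offset N*8G, which I believe is its default layout without -F), then only the 
first 1G of rank 0's block sits in the single-stripe component, rank 1 falls 
in the 4-stripe component, and ranks 2 and up are fully striped. The layout a 
file actually received can be checked with "lfs getstripe" on that file.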

Cheers, Andreas
---
Andreas Dilger
Principal Lustre Architect
Whamcloud

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
