Mirko Benz wrote:
Hi Nathaniel,
When using 6 clients (IOR test) with 6 OSTs on a single OSS, why
does Lustre use only 5 OSTs (4 OSTs serving one client each, 1 OST
serving two clients, 1 OST left empty)? The FS is empty, so round
robin should be used. The OST size is 100 GB and the file size is
5 GB, so the free space never differs by more than 20%.
> OSTCOUNT=6 sh llmount.sh
> cat /proc/fs/lustre/lov/lustre-mdtlov/target_obd
> for FILE in `seq 1 60`; do cp /etc/termcap /mnt/lustre/file$FILE; done
> ../utils/lfs getstripe /mnt/lustre | grep 0x
0 2 0x2 0
1 2 0x2 0
2 2 0x2 0
3 2 0x2 0
4 2 0x2 0
5 2 0x2 0
0 3 0x3 0
2 3 0x3 0
3 3 0x3 0
4 3 0x3 0
5 3 0x3 0
0 4 0x4 0
1 3 0x3 0
3 4 0x4 0
4 4 0x4 0
5 4 0x4 0
0 5 0x5 0
1 4 0x4 0
2 4 0x4 0
4 5 0x5 0
5 5 0x5 0
0 6 0x6 0
1 5 0x5 0
2 5 0x5 0
3 5 0x5 0
5 6 0x6 0
0 7 0x7 0
1 6 0x6 0
2 6 0x6 0
3 6 0x6 0
4 6 0x6 0
0 8 0x8 0
1 7 0x7 0
2 7 0x7 0
3 7 0x7 0
4 7 0x7 0
5 7 0x7 0
1 8 0x8 0
2 8 0x8 0
3 8 0x8 0
4 8 0x8 0
5 8 0x8 0
0 9 0x9 0
2 9 0x9 0
3 9 0x9 0
4 9 0x9 0
5 9 0x9 0
0 10 0xa 0
1 9 0x9 0
3 10 0xa 0
4 10 0xa 0
5 10 0xa 0
0 11 0xb 0
1 10 0xa 0
2 10 0xa 0
4 11 0xb 0
5 11 0xb 0
0 12 0xc 0
1 11 0xb 0
2 11 0xb 0
You can see several things from these results (the columns are the
OST index, the object id in decimal, the object id in hex, and the
object group):
1. The object count on each OST is roughly the same: about 10
objects each, starting with 0x2 and ending with 0xb.
2. The objects are created in OST order (0-5).
3. After every ostcount+1 objects we skip an OST. This causes our
"starting point" to precess around the OSTs, eliminating some
degenerate cases where applications with very regular file
creation/striping patterns would preferentially have used a
particular OST in the sequence (see the sketch below).
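To make point 3 concrete, here is a toy shell sketch of the idea (an
illustration only, not the actual allocator code; the skip interval
and the starting index of 0 are taken from the description above):

  # toy model: round-robin over OSTCOUNT OSTs, with one extra
  # advance (a "skip") after every OSTCOUNT+1 object creations
  OSTCOUNT=6
  ost=0
  for obj in `seq 1 20`; do
      echo "object $obj -> OST $ost"
      # normal round-robin advance
      ost=$(( (ost + 1) % OSTCOUNT ))
      # after every OSTCOUNT+1 creations, advance one extra step,
      # so the starting point precesses around the OSTs
      if [ $(( obj % (OSTCOUNT + 1) )) -eq 0 ]; then
          ost=$(( (ost + 1) % OSTCOUNT ))
      fi
  done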
I can only suggest that if you want very fine control over where files
are placed, you use the 'lfs setstripe' command and set explicit
starting OSTs.
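For example (a sketch only; the file names are hypothetical, and a
stripe size of 0 means the filesystem default):

  # 1.6 syntax: lfs setstripe <file> <stripe_size> <start_ost> <stripe_count>
  lfs setstripe /mnt/lustre/out.0 0 0 1   # one stripe, forced onto OST 0
  lfs setstripe /mnt/lustre/out.1 0 1 1   # one stripe, forced onto OST 1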
If you have a simple reproducer I would be glad to look at the results.
Regards,
Mirko
Nathaniel Rutman wrote:
Mirko Benz wrote:
Hi,
We are testing 1.6 beta7 over IB. On a test setup (1 MDS, 1 OSS with
6 OSTs (RAID 5, each of 5 drives)) we observe uneven load among the
OSTs. Testing with one to five clients, Lustre schedules evenly (one
OST per client). With more than 5 clients, sometimes one OST is not
used at all (e.g. with 6 or 9 clients), or the utilisation is not as
expected. The FS is otherwise empty. We used IOR for testing.
There is never a 1:1 mapping between clients and OSTs. A round-robin
algorithm is used for OST stripe selection until the OSTs' free space
differs by more than 20%. However, depending on how big the files
actually are, some stripes may be mostly empty and some full. For a
more complete explanation of stripe assignments, see
http://arch.lustre.org/index.php?title=Feature_Free_Space_Management
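If you want to watch the imbalance yourself, you can compare free
space per OST (assuming your lfs version has the df command; the
mount point below is an example):

  lfs df /mnt/lustre      # per-OST disk usage and free space
  lfs df -i /mnt/lustre   # per-OST inode usage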
The behavior can be reproduced. Uneven OST utilisation leads to
lower performance than is possible. How can we achieve a better
distribution over the OSTs without manual assignment?
Is there a setting to enforce round-robin scheduling for OST selection?
As explained above.
---
Stripe setting:
We want to achieve very high performance for a single client by
striping over 6 OSTs.
What parameters should be adjusted to achieve optimal performance?
Set the default stripe count to 6:
lctl> conf_param <fsname>-MDT0000.lov.stripecount=6
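If you only want wide striping for the benchmark area rather than as
a filesystem-wide default, you can instead set a default on a
directory, which new files created there inherit (a sketch; the
directory name is hypothetical, 0 means the default stripe size and
-1 the default starting OST):

  lfs setstripe /mnt/lustre/ior 0 -1 6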
Thanks,
Mirko
_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss