All,

I am looking for a more complete understanding of how the two settings qos_prio_free and qos_threshold_rr function together.

My current understanding, which may be inaccurate, is the following:

*qos_prio_free*
This setting controls how much Lustre prioritizes free space (versus location for the sake of performance) in allocation. The higher this number, the more Lustre takes empty space on an OST into consideration for its allocation. When set to 100%, Lustre uses ONLY empty space as the deciding factor for writes.

*qos_threshold_rr*
This setting controls how much consideration should be given to QoS in allocation. The higher this number, the more QoS is taken into consideration. When set to 100%, Lustre ignores the QoS variable and hits all OSTs equally.

I'm looking for several answers:

1) Is my basic understanding of the above settings correct?

2) How does Lustre deal with OSTs that are 100% full? I'm curious about this under two conditions.

2a) When you set qos_threshold_rr=100 -- meaning, go and hit all the OSTs the same amount.

On one of our Lustre 2.5.3 filesystems, the allocator is not working (a known bug, though I couldn't say why it seems to behave fine on the other one), so we have configured qos_threshold_rr=100. Since our OSTs are dramatically unbalanced, attempts to write to full OSTs have caused write failures. Data deletes have now gotten us below 90% on all OSTs, and I can certainly take the fullest OSTs out of write mode if that is needed. Still, it seems to me that Lustre should, no matter what your qos_threshold_rr setting, treat OSTs that are 100% full differently; that is, it should no longer attempt to write to them. In short, this seems like a bug to me, although I suppose that if you are overriding the allocator, it's caveat user at that point.

2b) When you set qos_threshold_rr != 100 -- meaning, the allocator is working

On the other Lustre 2.5.3 system, the system defaults (qos_prio_free=91%; qos_threshold_rr=17%) are hitting all the OSTs when I run my test*, so I have not changed them. Several of the OSTs in this file system are at 100%. I get that we are not seeing write failures because the allocator allocates to these OSTs less frequently, based on how full they are. But I know from my test that these OSTs are still in the mix, which implies that it would be possible, although less likely, to see a write failure if a write stream is opened on one of the 100% OSTs. I'd love to be able to quantify that "less likely".
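
One rough way to put a number on it, as a sketch only: take the "count, OST index" lines that the test in the footnote prints and work out what share of the new files landed on the OSTs I consider full. The helper name full_ost_fraction and the idea of passing the full OST indices by hand are my own assumptions, and the result is just an empirical sample from one test run, not the allocator's actual weighting.

```sh
# Hypothetical helper: reads "count ost_index" lines (the uniq -c output
# of the test one-liner) on stdin and reports the share of files that
# landed on the OST indices given as the first argument.
# Usage: <test one-liner without the rm> | full_ost_fraction "3 7"
full_ost_fraction() {
    awk -v full="$1" '
        BEGIN {
            # Turn the space-separated index list into a lookup table.
            n = split(full, f, " ")
            for (i = 1; i <= n; i++) isfull[f[i]] = 1
        }
        NF >= 2 {
            total += $1                        # files created overall
            if ($2 in isfull) onfull += $1     # files placed on a full OST
        }
        END {
            if (total == 0) { print "no input"; exit 1 }
            printf "%d of %d files (%.2f%%) landed on full OSTs\n",
                   onfull, total, 100 * onfull / total
        }
    '
}
```

Repeating the test a few times and comparing the percentages would give a feel for how stable that figure is, though it still only describes new file creates at that moment.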

Basically, I guess my question is: is taking an OST out of write mode the only (or best) way of preventing the fs from attempting to write to it when it is nearly full?

Thanks,
Jessica

------------------------------

*To test file allocation on your Lustre system, you can use this one-liner from a Lustre client. USE IT IN ITS OWN, NEW DIRECTORY, since it creates and then deletes every file matching t.*!

touch t.{1..2000}; lfs getstripe t.*|fgrep -A1 obdidx|fgrep -v obdidx|fgrep -v -- --|awk '{ print $1 }'|sort|uniq -c; rm -f t.*
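
For anyone who finds the chain of fgreps hard to read, the counting step can be sketched as a single awk pass. This is only an illustration: count_ost_hits is a name I made up, and it assumes the `lfs getstripe` output has the same layout the one-liner relies on, namely a header line containing "obdidx" followed by a line whose first field is the OST index.

```sh
# Hypothetical helper: reads `lfs getstripe` output on stdin and prints
# "count ost_index" per OST, counting only the first stripe of each file
# (the same line the fgrep -A1 in the one-liner picks up).
# Usage from a Lustre client, in a scratch directory:
#   touch t.{1..2000}; lfs getstripe t.* | count_ost_hits; rm -f t.*
count_ost_hits() {
    awk '
        /obdidx/   { grab = 1; next }     # header: the next line holds the first stripe
        grab && NF { hits[$1]++ }         # first field of that line is the OST index
                   { grab = 0 }
        END        { for (ost in hits) print hits[ost], ost }
    ' | sort -k2,2n                       # order by OST index
}
```

The output carries the same information as the `sort|uniq -c` version, just without the leading padding.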


--
Jessica Otey
System Administrator II
North American ALMA Science Center (NAASC)
National Radio Astronomy Observatory (NRAO)
Charlottesville, Virginia (USA)

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
