Andreas

The file system has lru_max_age=9000000. I have been googling to find out what this controls, but haven't found much. Is there documentation on how memory management works in Lustre? I wonder what "lru" actually means here. How is it that 2 files on the same node are not controlled by the same LRU mechanism? SCR300's pages are being evicted even though they were clearly used more recently than any in SCRATCH.

Thanks

John


On 12/12/2016 6:59 PM, Dilger, Andreas wrote:
On Dec 12, 2016, at 15:50, John Bauer <[email protected]> wrote:
I'm observing some undesirable caching of OSC data in the system buffers.  This
is a single-node, single-process application.  There are 2 files of interest,
SCRATCH and SCR300; both are scratch files with stripeCount=4.  The system has
128GB of memory.  Lustre maxes out at about 59GB of memory used for caching.
SCRATCH: About 22GB is written/read during the first 300 seconds of the run.
No further activity occurs on the file (though it remains open) until about
18,700 seconds into the run, when another 22GB is written/read.  This is
illustrated in the top frame of the first plot below.  The bottom frame of the
first plot shows the amount of system cache used by each of the 4 OSCs
associated with the file over the course of the run (nearly identical, as
would be expected).  Note that each of the OSCs retains its 5.5GB of memory
even though nothing is happening to the file.
SCR300: A 110GB file, written and repeatedly read between the times of the
above SCRATCH file's I/O.
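
As a rough reading of the figures above (plain arithmetic on the reported numbers, not a model of how Lustre actually sizes or evicts its cache), the sketch below works out how much of the roughly 59GB cache ceiling is left for SCR300 while each of the 4 SCRATCH OSCs holds 5.5GB. The function and field names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CacheBudget:
    held_gb: float             # cache pinned by the idle file's OSCs
    remaining_gb: float        # cache left over for the active file
    cacheable_fraction: float  # share of the active file that could fit


def cache_budget(ceiling_gb: float, osc_count: int,
                 per_osc_gb: float, active_file_gb: float) -> CacheBudget:
    """Split a fixed cache ceiling between an idle file and an active file."""
    if active_file_gb <= 0:
        raise ValueError("active_file_gb must be positive")
    held = osc_count * per_osc_gb
    remaining = max(ceiling_gb - held, 0.0)
    fraction = min(remaining / active_file_gb, 1.0)
    return CacheBudget(held, remaining, fraction)


# Figures reported in this message: ~59GB ceiling, 4 OSCs at 5.5GB each
# for SCRATCH, and a 110GB SCR300 file.
budget = cache_budget(ceiling_gb=59.0, osc_count=4,
                      per_osc_gb=5.5, active_file_gb=110.0)
print(f"held by SCRATCH: {budget.held_gb} GB")
print(f"left for SCR300: {budget.remaining_gb} GB")
print(f"fraction of SCR300 that could be cached: {budget.cacheable_fraction:.2f}")
```

With these inputs, 22GB is held by SCRATCH and 37GB remains, about a third of SCR300's 110GB. Even the full 59GB would cover only a little over half of the file.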

What is of interest is that while SCR300 is doing all its I/O, and its
associated OSCs are fighting each other for caching memory, the 4 OSCs for
the inactive file (SCRATCH) retain their 22GB of memory.  Why are the 4 OSCs
for the inactive file exempt from giving up their memory?  It is very
reproducible.

You don't mention what Lustre version you are using, which makes it hard
to comment specifically.  That said, you could try reducing the lock LRU
age, whose default was changed in the 2.8 or 2.9 release to 3900s
(65 minutes) from 36000s (10h), via:

         lctl set_param ldlm.namespaces.*.lru_max_age=3900000

(though check what your current setting is, since the units are in
"jiffies" (HZ) and that may differ depending on kernel compile options).

Cheers, Andreas
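
Since the raw value depends on the kernel tick rate, a small conversion helper can make a given setting easier to interpret. The sketch below assumes, per the note in the reply above, that lru_max_age is counted in jiffies. The HZ values tried are common kernel configurations assumed for illustration, not values taken from this system, and the helper names are hypothetical.

```python
def lru_max_age_seconds(raw_value: int, hz: int) -> float:
    """Convert a raw lru_max_age value to seconds, assuming it counts jiffies."""
    if hz <= 0:
        raise ValueError("hz must be positive")
    return raw_value / hz


def lru_max_age_raw(seconds: float, hz: int) -> int:
    """Convert a desired age in seconds to a raw jiffies value for a given HZ."""
    if hz <= 0:
        raise ValueError("hz must be positive")
    return round(seconds * hz)


# Assumed tick rates; the real one depends on how the kernel was compiled.
for hz in (100, 250, 1000):
    current = lru_max_age_seconds(9_000_000, hz)    # value reported in this thread
    suggested = lru_max_age_seconds(3_900_000, hz)  # value from the lctl command
    print(f"HZ={hz}: 9000000 -> {current:.0f}s, 3900000 -> {suggested:.0f}s")
```

Under these assumptions, 9000000 corresponds to 36000s (10h) only if HZ is 250, and 3900000 corresponds to 3900s only if HZ is 1000. On a HZ=250 kernel, 3900s would be a raw value of 975000.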

The application is MSC.Nastran, which has the capability to put the data for
SCR300 inside of SCRATCH (increasing its size to 132GB).  If run in this mode,
the caching is much better behaved and the job runs in 11,500 seconds,
versus 19,000.  This is illustrated in the 3rd plot below.  While this is a
solution for this case, it is not a general solution.

Thanks

John
Plots for SCRATCH
<bfoimgfaenjmgmii.png>


Plots for SCR300

<mncccijbfkiekmmn.png>


Plots for SCR300 inside of SCRATCH

<adnondhpelpohhjf.png>
--
I/O Doctors, LLC
507-766-0378

[email protected]
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

