Do you mean you've never hit client eviction or dirty page discard before?
What was your previous Lustre version?
The dirty page discard warning has existed for a long time, since Lustre 2.4.

Evictions happen for lots of reasons. An eviction just means this client did 
not respond to this OSS within 100 sec. It could be due to an overloaded 
network, a hardware issue on one of these hosts, the client not responding 
because its CPU is overloaded, or indeed a bug. You should first try to 
understand why these evictions happened.
Verify the server and client load at that time (CPU, network, etc.). Identify 
the impacted files and the application accessing them. What is the 
application's I/O pattern? Is it putting heavy pressure on these files? The 
file name can be obtained from the FID with 'lfs fid2path MOUNT_POINT FID'.
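
If there are many FIDs to resolve, the 'lfs fid2path MOUNT_POINT FID' call 
can be wrapped in a small script. What follows is only a minimal sketch: the 
function names, the mount point /mnt/lustre and the FID in the usage note are 
hypothetical placeholders, and it assumes it runs on a client where the 
filesystem is mounted and 'lfs' is in PATH.

```python
import re
import subprocess

# A FID is seq:oid:ver in hexadecimal, optionally wrapped in square brackets.
FID_RE = re.compile(r"^\[?0x[0-9a-fA-F]+:0x[0-9a-fA-F]+:0x[0-9a-fA-F]+\]?$")


def fid2path_command(mount_point, fid):
    """Build the argument list for 'lfs fid2path MOUNT_POINT FID'."""
    if not FID_RE.match(fid):
        raise ValueError(f"not a FID: {fid!r}")
    # A list (no shell) avoids having to quote the square brackets.
    return ["lfs", "fid2path", mount_point, fid]


def resolve_fids(mount_point, fids):
    """Map each FID to the path printed by lfs, or None if lfs failed."""
    paths = {}
    for fid in fids:
        result = subprocess.run(
            fid2path_command(mount_point, fid),
            capture_output=True,
            text=True,
        )
        paths[fid] = result.stdout.strip() if result.returncode == 0 else None
    return paths
```

For example, resolve_fids("/mnt/lustre", ["[0x200000401:0x1:0x0]"]) would 
return a dictionary keyed by FID (both values here are made up). A None entry 
only means the lfs call failed for that FID; check stderr by hand to see why.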


Aurélien

From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of 肖正刚 
<guru.nov...@gmail.com>
Date: Sunday, 30 August 2020 at 07:41
To: Andreas Dilger <adil...@whamcloud.com>
Cc: lustre-discuss <lustre-discuss@lists.lustre.org>
Subject: RE: [EXTERNAL] [lustre-discuss] some clients dmesg filled up with 
"dirty page discard"




Hi, Andreas,
Thanks for your reply.
Maybe this is a bug?
We never hit this before updating the clients to 2.12.5.

Andreas Dilger <adil...@whamcloud.com> wrote on Saturday, 29 August 2020 at 
6:37 PM:
On Aug 25, 2020, at 17:42, 肖正刚 <guru.nov...@gmail.com> wrote:

No, on the OSS we found that only the client that reported "dirty page 
discard" was evicted.
We hit this again last night, and on the OSS we can see logs like:
"
[Tue Aug 25 23:40:12 2020] LustreError: 
14278:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired 
after 100s: evicting client at 10.10.3.223@o2ib  ns: 
filter-public1-OST0000_UUID lock: ffff9f1f91cba880/0x3fcc67dad1c65842 lrc: 
3/0,0 mode: PR/PR res: [0xde2db83:0x0:0x0].0x0 rrc: 3 type: EXT 
[0->18446744073709551615] (req 0->270335) flags: 0x60000400020020 nid: 
10.10.3.223@o2ib remote: 0xd713b7b417045252 expref: 7081 pid: 25923 timeout: 
21386699 lvb_type: 0

It isn't clear what the question is here.  The "dirty page discard" message 
means that unwritten data from the client was discarded because the client was 
evicted and the lock covering this data was revoked by the server because the 
client was not responsive.
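
To line an eviction up with server and client load at the same moment, the 
OSS message quoted above can be picked apart with a short script. This is only 
a sketch: the function name and the regular expression are illustrative, and 
it assumes the message has the same layout as the one line shown in this 
thread.

```python
import re
from datetime import datetime

# The eviction line from the OSS log in this thread, re-joined onto one line
# and cut off after the resource field.
SAMPLE = (
    "[Tue Aug 25 23:40:12 2020] LustreError: "
    "14278:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer "
    "expired after 100s: evicting client at 10.10.3.223@o2ib  ns: "
    "filter-public1-OST0000_UUID lock: ffff9f1f91cba880/0x3fcc67dad1c65842 "
    "lrc: 3/0,0 mode: PR/PR res: [0xde2db83:0x0:0x0].0x0"
)

EVICTION_RE = re.compile(
    r"\[(?P<when>[^\]]+)\] LustreError: .*?"
    r"lock callback timer expired after (?P<timeout>\d+)s: "
    r"evicting client at (?P<nid>\S+) ns: (?P<ns>\S+)"
)


def parse_eviction(text):
    """Return timestamp, timeout, client NID and namespace, or None."""
    # Collapse line wraps and double spaces so the pattern sees one line.
    line = " ".join(text.split())
    match = EVICTION_RE.search(line)
    if match is None:
        return None
    return {
        "when": datetime.strptime(match["when"], "%a %b %d %H:%M:%S %Y"),
        "timeout_s": int(match["timeout"]),
        "client_nid": match["nid"],
        "namespace": match["ns"],
    }


info = parse_eviction(SAMPLE)
print(info["when"], info["client_nid"], info["namespace"], info["timeout_s"])
```

The timestamp and client NID are what you would then compare against CPU and 
network monitoring on both hosts for the same period.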


Also, we ran lfsck on all servers; the result is

There is no need for LFSCK in this case.  The file data was not written, but a 
client eviction does not result in the filesystem becoming inconsistent.

Cheers, Andreas
--
Andreas Dilger
Principal Lustre Architect
Whamcloud





_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
