We have a cluster of diskless nodes called nmpost running RHEL 7.8 (kernel 3.10.0-1127.13.1.el7.x86_64) with Lustre 2.12.5. Running md5sum on a specific file on Lustre gives the correct result on most hosts:

nmpost061 010585dfa7a66ae60b887a843056a4ec /lustre/aoc/cluster/pipeline/dsoc-prod/workspaces/sbin/casa_envoy

but a few hosts get different results:

nmpost073 e4c7c2eceec068ab061151866e2a0d64 /lustre/aoc/cluster/pipeline/dsoc-prod/workspaces/sbin/casa_envoy
nmpost077 61d1d2bc7a86b53334d005a72603d8a1 /lustre/aoc/cluster/pipeline/dsoc-prod/workspaces/sbin/casa_envoy
nmpost081 e4c7c2eceec068ab061151866e2a0d64 /lustre/aoc/cluster/pipeline/dsoc-prod/workspaces/sbin/casa_envoy

This casa_envoy file was updated about 10 days ago, and it looks like the hosts that report the wrong md5sum are seeing a previous version of the file. Either rebooting or running "echo 3 > /proc/sys/vm/drop_caches" on one of these hosts makes it report the correct md5sum (010585dfa7a66ae60b887a843056a4ec). So it seems that the Linux page cache on those hosts is not being updated with the new version of the file, even after running md5sum on it multiple times.

Any ideas on why this is? Is there a known issue between Lustre and the Linux page cache?

Thanks

_______________________________________________
lustre-discuss mailing list
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
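
In case it helps anyone reproduce the check, here is a minimal sketch of how the per-host results above can be filtered for stale hosts. The function name find_stale_hosts and the ssh loop in the comment are illustrative assumptions, not the exact commands used to produce the output above. The function reads "host md5 path" lines on stdin and prints every host whose checksum differs from the expected one.

```shell
#!/bin/sh
# Hypothetical helper: print "host md5" for every input line whose
# checksum differs from the expected one passed as the first argument.
# Input lines are expected in the form: host md5 path
find_stale_hosts() {
    expected=$1
    # Appending "" forces a string comparison in awk, so a checksum that
    # happens to look like a number is never compared numerically.
    awk -v expected="$expected" \
        'NF >= 3 && ($2 "") != (expected "") { print $1, $2 }'
}

# Example of collecting the input (assumes passwordless ssh to each node):
#   FILE=/lustre/aoc/cluster/pipeline/dsoc-prod/workspaces/sbin/casa_envoy
#   for h in nmpost061 nmpost073 nmpost077 nmpost081; do
#       ssh "$h" "echo \$(hostname -s) \$(md5sum $FILE)"
#   done | find_stale_hosts 010585dfa7a66ae60b887a843056a4ec
```

With the four result lines above as input and 010585dfa7a66ae60b887a843056a4ec as the expected value, this lists nmpost073, nmpost077 and nmpost081 along with the checksum each one reported.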
