This sounds very much like a problem we saw before we changed lru_size from dynamic to a fixed size.
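For reference, the lock LRU size Andrew mentions is controlled per lock namespace through lctl on Lustre 1.8. A minimal sketch of checking and pinning it is below; the value 800 is purely illustrative, not a recommendation, and this must be run on the affected nodes:

```
# Show the current LRU size for all lock namespaces
# (0 means dynamic LRU sizing is in effect)
lctl get_param ldlm.namespaces.*.lru_size

# Pin the LRU to a fixed size instead of dynamic
# (800 is an arbitrary example value)
lctl set_param ldlm.namespaces.*.lru_size=800
```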
-- Andrew

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of David Simas
Sent: Monday, October 12, 2009 6:07 PM
To: [email protected]
Subject: [Lustre-discuss] Memory (?) problem with 1.8.1

Hello,

We have a Lustre 1.8.1 file system about 60 TB in size running on RHEL 5 x86_64. (I can provide hardware details if anyone thinks they'd be relevant.) We are seeing memory problems after several days of sustained I/O into that file system. We are writing from a small number of clients (4 - 5) at an average rate of 50 MB/s, with peaks of 350 MB/s. We read all the data at least twice before deleting them.

During this operation, we notice the value of "buffers" reported in /proc/meminfo on the OSSs involved increasing monotonically until it apparently takes up all the system's memory - 32 GB. Then 'kswapd' starts consuming a large amount of CPU, the load increases (100+), and the system, including Lustre, slows to a crawl and becomes quite useless. If we stop Lustre I/O at this point, 'kswapd' and the system load calm down, but the "buffers" value does not decrease. Any further I/O on the system (dd if=/dev/urandom of=/tmp/test ...) will then cause 'kswapd' to run away again.

We have observed the monotonically increasing "buffers" condition with non-Lustre I/O on systems running the Lustre 1.8.1 kernel (2.6.18-128.1.14.el5_lustre.1.8.1), but we haven't gotten them to the point where 'kswapd' goes wild.

Has anybody else seen anything like this?

David Simas
SLAC
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
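For anyone wanting to track the symptom David describes, the "buffers" figure he is watching is the Buffers: line of /proc/meminfo. A minimal way to pull just that value (run it under watch, or in a loop, on the OSSs during sustained I/O to see whether it climbs monotonically):

```shell
# Print the current "Buffers" value from /proc/meminfo (in kB).
# If this keeps growing during sustained I/O and never drops after
# I/O stops, that matches the behaviour described above.
awk '/^Buffers:/ {print $2, $3}' /proc/meminfo
```

e.g. `watch -n 5 "awk '/^Buffers:/ {print \$2, \$3}' /proc/meminfo"` gives a 5-second refresh.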
