Linux VM use-once mechanisms don't seem to work. A simple scenario such as streaming a file much larger than physical RAM should be identified, to avoid trashing the page cache with useless data.
I know the VM cannot predict the future or assume anything about the user's intent. But this workload is simple and common; it should be detected and handled better.

Test case:

Linux 2.6.20-16-lowlatency SMP PREEMPT x86_64 (also tried on 2.6.23-rc1)

- A file of 1/3 the RAM size is created, mapped and frequently accessed (4 times).
- The test is run multiple times (4 total) to time its execution.
- After the first run, other runs take much less time, because the file is cached.
- A previously created file, 4 times the size of the RAM, is read or copied.
- The test is re-run (2 times) to time its execution.

To test:

$ make
# ./use-once-test.sh

Some big files will be created in your /tmp. They don't get erased after the test, to speed up multiple runs.

Results:

- The test execution time greatly increases after reading or copying the large file.
- Frequently used data got kicked out of the page cache and replaced with useless read-once data.
- Both the read-only and copy (read + write) cases don't work.

I believe this clearly illustrates the slowdowns I experience after I copy large files around my system. All applications on my desktop are jerky for some moments after that. Watching a DVD is another example.

Base test:

1st run: 0m8.958s
2nd run: 0m3.442s
3rd run: 0m3.452s
4th run: 0m3.443s

Reading a large file test:

1st run: 0m8.997s
2nd run: 0m3.522s
`/tmp/large_file' -> `/dev/null'
3rd run: 0m8.999s <<< page cache trashed
4th run: 0m3.440s

Copying (using cp) a large file test:

1st run: 0m8.979s
2nd run: 0m3.442s
`/tmp/large_file' -> `/tmp/large_file.copy'
3rd run: 0m13.814s <<< page cache trashed
4th run: 0m3.455s

Copying (using fadvise_cp) a large file test:

1st run: 0m9.018s
2nd run: 0m3.444s
Copying large file...
3rd run: 0m14.024s <<< page cache trashed
4th run: 0m3.449s

Copying (using splice-cp) a large file test:

1st run: 0m8.977s
2nd run: 0m3.442s
Copying large file...
3rd run: 0m14.118s <<< page cache trashed
4th run: 0m3.456s

Possible solutions:

Various patches to fix the use-once mechanisms were discussed in the past. Some more than 6 years ago and some more recently.

http://lwn.net/2001/0726/a/2q.php3
http://lkml.org/lkml/2005/5/3/6
http://lkml.org/lkml/2006/7/17/192
http://lkml.org/lkml/2007/7/9/340
http://lkml.org/lkml/2007/7/21/219 (*1)

(*1) I have tested Peter's patch with some success. It fixes the read case, but not the copy case. Results: http://lkml.org/lkml/2007/7/24/527

Test programs and batch files are attached.

- Eric
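/* fadvise_cp.c: copy <src> to <dest>, hinting that the data is used once. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	int in;
	int out;
	int pagesize;
	char *buf;
	off_t pos;

	if (argc != 3) {
		printf("Usage: %s <src> <dest>\n", argv[0]);
		return EXIT_FAILURE;
	}

	in = open(argv[1], O_RDONLY, 0);
	if (in == -1) {
		perror(argv[1]);
		return EXIT_FAILURE;
	}
	out = open(argv[2], O_CREAT | O_WRONLY | O_TRUNC, 0666);
	if (out == -1) {
		perror(argv[2]);
		return EXIT_FAILURE;
	}

	/* Hint that both files are accessed sequentially, start to end. */
	posix_fadvise(in, 0, 0, POSIX_FADV_SEQUENTIAL);
	posix_fadvise(out, 0, 0, POSIX_FADV_SEQUENTIAL);

	pagesize = getpagesize();
	buf = malloc(pagesize);
	if (!buf) {
		perror("malloc");
		return EXIT_FAILURE;
	}

	pos = 0;
	for (;;) {
		ssize_t count;
		ssize_t done;

		count = read(in, buf, pagesize);
		if (!count || count == -1)
			break;

		/* write() may be partial; loop until the whole chunk is out. */
		for (done = 0; done < count; ) {
			ssize_t n = write(out, buf + done, count - done);
			if (n == -1) {
				perror("write");
				return EXIT_FAILURE;
			}
			done += n;
		}

		/* right usage pattern? */
		posix_fadvise(in, pos, count, POSIX_FADV_NOREUSE);
		posix_fadvise(out, pos, count, POSIX_FADV_NOREUSE);
		pos += count;
	}

	free(buf);
	close(in);
	close(out);
	return EXIT_SUCCESS;
}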
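
On the "right usage pattern?" question in the loop above: as far as I can tell from the posix_fadvise(2) man page, POSIX_FADV_NOREUSE is a no-op on Linux kernels of this era, which would explain why fadvise_cp behaves like plain cp in the results. A commonly suggested alternative is to flush the written data and then ask for the pages to be dropped with POSIX_FADV_DONTNEED. The following is only a sketch under that assumption, not one of the attached test programs and not measured with use-once-test.sh; the names copy_dropping_cache, write_all and CHUNK_SIZE are made up for illustration.

```c
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical chunk size; 1 MiB keeps the number of fdatasync() calls low. */
enum { CHUNK_SIZE = 1 << 20 };

/* Write the whole buffer, retrying after partial writes. */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);
		if (n == -1)
			return -1;
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Hypothetical helper: copy src to dst and, after every chunk, ask the
 * kernel to drop the pages that were just read and written.
 * Returns 0 on success, -1 on error.
 */
int copy_dropping_cache(const char *src, const char *dst)
{
	int in, out, ret = -1;
	char *buf;
	off_t pos = 0;
	ssize_t n;

	in = open(src, O_RDONLY);
	if (in == -1) {
		perror(src);
		return -1;
	}
	out = open(dst, O_CREAT | O_WRONLY | O_TRUNC, 0666);
	if (out == -1) {
		perror(dst);
		close(in);
		return -1;
	}
	buf = malloc(CHUNK_SIZE);
	if (!buf)
		goto done;

	while ((n = read(in, buf, CHUNK_SIZE)) > 0) {
		if (write_all(out, buf, (size_t)n) == -1)
			goto done;
		/* Dirty pages cannot be dropped, so write them back first. */
		if (fdatasync(out) == -1)
			goto done;
		/* Advisory only: the kernel is free to ignore these hints. */
		(void)posix_fadvise(in, pos, n, POSIX_FADV_DONTNEED);
		(void)posix_fadvise(out, pos, n, POSIX_FADV_DONTNEED);
		pos += n;
	}
	if (n == 0)
		ret = 0;
done:
	free(buf);
	close(in);
	close(out);
	return ret;
}
```

Calling copy_dropping_cache("/tmp/large_file", "/tmp/large_file.copy") from a small main() would give a fifth copy variant to time. The fdatasync() after every chunk makes the copy itself slower; that is the price of making the written pages droppable.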
all:
	gcc fadvise_cp.c -o fadvise_cp
	gcc working_set_simul.c -o working_set_simul
use-once-test.sh
Description: application/shellscript
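/* working_set_simul.c: map <file> and read every byte of it four times. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	int fd;
	off_t size;
	off_t i;	/* off_t, so files larger than 4 GiB don't wrap the index */
	volatile const char *mapping;
	unsigned r;

	if (argc != 2) {
		printf("Usage: %s <file>\n", argv[0]);
		return EXIT_FAILURE;
	}

	fd = open(argv[1], O_RDONLY, 0);
	if (fd == -1) {
		perror(argv[1]);
		return EXIT_FAILURE;
	}
	size = lseek(fd, 0, SEEK_END);
	if (size <= 0) {
		fprintf(stderr, "%s: empty or unseekable file\n", argv[1]);
		return EXIT_FAILURE;
	}
	mapping = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
	if (mapping == MAP_FAILED) {
		perror("mmap");
		return EXIT_FAILURE;
	}

	/* access (read) the file a couple of times */
	for (r = 0; r < 4; r++) {
		for (i = 0; i < size; i++) {
			/* volatile keeps the compiler from dropping the read */
			char t = mapping[i];
			(void)t;
		}
	}

	munmap((void *)mapping, size);
	close(fd);
	return EXIT_SUCCESS;
}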
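
The timings above show the eviction only indirectly. To check directly whether the working-set file is still in the page cache after the large read or copy, mincore(2) can report which pages of a mapping are resident. This is a minimal sketch, not one of the attached programs; resident_pages is a hypothetical helper name, and the counts it returns are a snapshot that can change at any moment.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: count how many pages of the file at 'path' are
 * currently resident in memory. Stores the total number of pages in
 * *total (if non-NULL). Returns the resident count, or -1 on error.
 */
long resident_pages(const char *path, long *total)
{
	struct stat st;
	unsigned char *vec;
	void *map;
	size_t len, pages, i;
	long pagesize, resident = -1;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return -1;
	if (fstat(fd, &st) == -1 || st.st_size == 0) {
		close(fd);
		return -1;
	}

	len = (size_t)st.st_size;
	pagesize = sysconf(_SC_PAGESIZE);
	pages = (len + (size_t)pagesize - 1) / (size_t)pagesize;

	/* Mapping the file does not read it in; untouched pages stay as-is. */
	map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);
	if (map == MAP_FAILED)
		return -1;

	vec = malloc(pages);
	if (vec && mincore(map, len, vec) == 0) {
		resident = 0;
		for (i = 0; i < pages; i++)
			resident += vec[i] & 1;	/* low bit set: page is resident */
	}

	free(vec);
	munmap(map, len);
	if (total)
		*total = (long)pages;
	return resident;
}
```

Running this on the working-set file before and after the large read or copy would show how much of it was actually evicted, independent of timing noise.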

