/*
  Hi again.  One of our developers found YET ANOTHER hideous
kernel bug in 2.4.17-2-timer...

---------------------------------------------------------------------
From: Christopher Neufeld <[EMAIL PROTECTED]>

Hello,

We have located another serious kernel bug in s390, this one apparently
involving the treatment of the dirty attribute of memory-mapped pages.  The
particular test case I have written is designed deliberately to mimic the
behaviour of a dynamically-linked library, whose text segment has to be
rewritten in order to resolve external and internal links.  The linker
performs the following set of operations:

open(O_RDONLY)   the file, and determine the size of the mapping
mmap(..., PROT_READ|PROT_EXEC, MAP_PRIVATE, ...)  the file
...(perform some setup relating to other segments in the .so)...
close()          the file
mprotect(..., PROT_READ|PROT_WRITE);
...(do some writes to the mapped pages)...
mprotect(..., PROT_READ|PROT_EXEC);

At this point, we're ready to run.  Note that the process has mapped in the
shared object with copy-on-write, and has modified some parts of it.  These
modified pages should have been tagged dirty, so that if the system has to
page them out, it will send them to Linux's page (swap) space.
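
The copy-on-write behaviour described above can be checked in isolation.  The
following is only a minimal sketch, not part of the original test: the helper
name private_write_stays_private and the scratch path passed to it are
hypothetical, and it maps with PROT_READ alone (rather than
PROT_READ|PROT_EXEC) on the assumption that the scratch directory might be
mounted noexec.  It writes through a MAP_PRIVATE mapping and then reads the
file back to confirm that the change lives only in the process's private copy.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not from the original post).
// Returns 0 if a write through a MAP_PRIVATE mapping is visible in the
// mapping but not in the underlying file, 1 if it is not, -1 on setup error.
int private_write_stays_private(const char *path)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t len = (size_t)page;
    int result = -1;

    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    // ftruncate() extends the file with zero bytes, giving one zeroed page.
    if (ftruncate(fd, (off_t)len) == 0) {
        unsigned char *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (map != MAP_FAILED) {
            unsigned char on_disk = 0xff;
            if (mprotect(map, len, PROT_READ | PROT_WRITE) == 0) {
                map[0] = 0xAA;   // triggers copy-on-write; page is now dirty
                if (mprotect(map, len, PROT_READ) == 0 &&
                    pread(fd, &on_disk, 1, 0) == 1)
                    result = (map[0] == 0xAA && on_disk == 0) ? 0 : 1;
            }
            munmap(map, len);
        }
    }
    close(fd);
    unlink(path);
    return result;
}
```

Calling private_write_stays_private("/tmp/cow_check") should return 0 on a
correctly behaving system.  Because the file never receives the write, the
only copy of the modified page is the one in memory, which is why it must go
to swap rather than be dropped.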

The following program reproduces this step, omitting the setup of other
segments in the fictitious shared object file.  It creates a 512 kilobyte
file in /tmp and maps it, modifies every fourth page, then waits for the
user to hit <ENTER>.  After the user has pressed ENTER, it scans all of the
pages it modified, and confirms that they retain their modified values.

Compile this program, and run it on a normally-loaded system.  It should
complete without complaining.  Repeat this, but on a system with memory
pressure.  I have done this by running the program, and then, while it is
waiting for me to press ENTER, I enter the command (as root):
dd if=/dev/dasda1 of=/dev/null bs=160M count=1

which requires the kernel to load the first 160 megabytes of the primary
DASD volume into Linux' virtual memory.  If the virtual machine's physical
memory is not vastly larger than 160 MB, this will generate page pressure,
and the kernel will drop pages, preferably clean pages.
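
Where reading a raw device is not convenient, memory pressure can also be
approximated from user space.  The sketch below is an assumption of mine, not
what was used for the results reported here: the helper name touch_mebibytes
is hypothetical, and it fills anonymous memory rather than reading a device
as "dd" does, so the kernel may respond somewhat differently.  It allocates
the requested amount and writes one byte to every page so that each page is
really backed by memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

// Hypothetical helper (not from the original post).
// Allocates `mib` mebibytes and writes one byte per page so that every page
// is actually instantiated.  Returns the number of pages touched, or 0 if
// nothing was allocated.
size_t touch_mebibytes(size_t mib)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    if (mib == 0)
        return 0;

    size_t len = mib * 1024 * 1024;
    // volatile keeps the compiler from optimising the stores away.
    volatile unsigned char *mem = malloc(len);
    if (mem == NULL)
        return 0;

    size_t touched = 0;
    for (size_t off = 0; off < len; off += (size_t)page) {
        mem[off] = 1;
        touched++;
    }

    free((void *)mem);
    return touched;
}
```

Calling touch_mebibytes(160) from a small main() while the test program waits
for ENTER would roughly correspond to the 160 MB figure above.  The memory is
freed on return, but any pages the kernel evicted to make room stay evicted.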

When the "dd" operation completes, something which takes a couple of
minutes on our machine, I press ENTER.  Essentially every dirty page in the
mapping is revealed to have been discarded, with the kernel going back to
the disk image to refresh the mapping, even for those pages which ought to
have been tagged dirty and paged out.

Note that this behaviour does not show up in programs which link only
against libc, because the library is compiled as position-independent code
with internal links resolved at compile-time, so the linker is not required
to modify any pages, and libc is not corrupted internally under the same
circumstances.


Here, now, is the test program
*/

#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <string.h>   /* memcpy, memcmp */

#define MAPFILE "/tmp/mapped"

#define PAGESIZE 4096

#define PAGES 128
#define PAGE_SKIP 4

int main(void)
{
  int fd;
  unsigned char *buffer;
  int i;
  unsigned char pdata[PAGESIZE];   /* unsigned so that 0xff is representable */
  char keybuff[10];

  for (i = 0; i < PAGESIZE; i++)
    pdata[i] = 0xff;

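  /* O_CREAT requires a mode argument; without it the file permissions are garbage */
  fd = open(MAPFILE, O_CREAT | O_WRONLY | O_TRUNC, 0600);
  if (fd < 0) {
    perror("open " MAPFILE);
    return 1;
  }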
  for (i = 0; i < PAGES; i++)
    write(fd, pdata, PAGESIZE);
  close(fd);


  fd = open(MAPFILE, O_RDONLY);

  buffer = mmap(NULL, PAGESIZE * PAGES, PROT_READ | PROT_EXEC, MAP_PRIVATE, fd, 0);
  if (buffer == MAP_FAILED) {
    perror("mmap");
    return 1;
  }
  close(fd);

  for (i = 0; i < PAGESIZE; i++)
    pdata[i] = 0;

  mprotect(buffer, PAGESIZE * PAGES, PROT_READ | PROT_WRITE);
  for (i = 0; i < PAGES; i += PAGE_SKIP)
    memcpy(buffer + i * PAGESIZE, pdata, PAGESIZE);
  mprotect(buffer, PAGESIZE * PAGES, PROT_READ | PROT_EXEC);

  printf("Setup complete.  Please generate page pressure on the machine.\n");
  printf("Once that is done, press ENTER\n");

  fgets(keybuff, sizeof keybuff, stdin);

  for (i = 0; i < PAGES; i += PAGE_SKIP)
    if (memcmp(buffer + i * PAGESIZE, pdata, PAGESIZE) != 0)
      printf("Mapping reverted on page starting at %p\n", (void *)(buffer + i * PAGESIZE));


  return 0;
}

/*
--
Christopher Neufeld, Senior Linux Consultant, Linuxcare, Inc.
613.562.9854 tel, 613.562.9304 fax
[EMAIL PROTECTED], http://www.linuxcare.com/

----- End forwarded message -----

--
Jason McMullan, Senior Linux Consultant
Linuxcare, Inc. 412.432.6457 tel, 412.656.3519 cell
[EMAIL PROTECTED], http://www.linuxcare.com/
Linuxcare. Putting open source to work.  */
