Hi, I am in the process of evaluating Lustre, and I have a small problem that I am hoping someone could shed some light on.
What I'm running:

  Lustre v1.6beta5
  Linux 2.6.12.6 (SMP)
  2 OSS
  1 MGS/MDT
  4 clients
  Ethernet network

I'm trying to determine whether multiple processes on multiple nodes can simultaneously mmap a common file on a Lustre file system, write to it, and produce a coherent result. (I'm using OpenMPI to spawn the processes and provide synchronization barriers.) In my tests, each process writes a 10,000-byte segment of the file, but memory-maps the whole file.

What I'm seeing is that if I use 40 processes or fewer, the file is (usually) produced correctly. However, when I try my test with 50 or 100 processes, I rarely get a good result; in fact, the tests seem to hang. What I've found is that, when the test fails, there are processes remaining on the Lustre client nodes that are using up all the CPU but never seem to finish. I have no trouble interrupting the running processes in this case.

While I'm not entirely sure of the result I should expect in these tests, I certainly would expect the test to finish. Does anyone have any comments or ideas?

-- Martin
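
For anyone who wants to try the same access pattern, here is a minimal sketch of it. It is not the actual test program: it is written in Python rather than against OpenMPI, it uses local forked processes as a stand-in for MPI ranks, and joining the workers stands in for the final barrier. The names write_segment and run_test, and the fill byte used to mark each segment, are made up for illustration. Run on a single machine it only demonstrates the pattern; it would need to be pointed at a file on a Lustre mount, and launched across several clients, to say anything about the hang described above.

```python
import mmap
import multiprocessing as mp
import os
import tempfile

SEGMENT = 10_000  # bytes written by each process, as in the test described above


def write_segment(path, rank):
    """Map the whole file, then write only this rank's segment (hypothetical helper)."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:  # length 0 maps the entire file
            start = rank * SEGMENT
            # Fill byte derived from the rank is an assumption, used only for checking.
            m[start:start + SEGMENT] = bytes([rank % 256]) * SEGMENT
            m.flush()  # push the modified pages back to the file


def run_test(path, nprocs):
    """Pre-size the file, run nprocs writers, and check every segment (hypothetical helper)."""
    with open(path, "wb") as f:
        f.truncate(nprocs * SEGMENT)
    ctx = mp.get_context("fork")  # assumption: POSIX system with fork available
    procs = [ctx.Process(target=write_segment, args=(path, r)) for r in range(nprocs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()  # stands in for the final synchronization barrier
    with open(path, "rb") as f:
        data = f.read()
    return all(
        data[r * SEGMENT:(r + 1) * SEGMENT] == bytes([r % 256]) * SEGMENT
        for r in range(nprocs)
    )


if __name__ == "__main__":
    # Assumption: a temporary local directory; substitute a path on the Lustre mount.
    path = os.path.join(tempfile.mkdtemp(), "shared.dat")
    print("coherent" if run_test(path, 8) else "corrupt")
```

Raising the second argument of run_test to 50 or 100 mirrors the process counts at which the original test starts to fail.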
