We are having difficulties with CephFS when the same file is accessed
from multiple nodes concurrently. After debugging some large-ish
applications with noticeable performance problems on CephFS (with the
fuse client), I have written a small test program that reproduces the
problem.
The core of the problem boils down to the following operation being run
on the same file on multiple nodes (in a loop in the test program):
    int fd = open(filename, mode);
    read(fd, buffer, 100);
    close(fd);
Here are some results on our cluster:
* One node, mode=read-only: 7000 opens/second
* One node, mode=read-write: 7000 opens/second
* Two nodes, mode=read-only: 7000 opens/second/node
* Two nodes, mode=read-write: around *0.5 opens/second/node* (!!!)
* Two nodes, one read-only, one read-write: around *0.5
opens/second/node* (!!!)
* Two nodes, mode=read-write, but with the 'read(fd, buffer, 100)'
line removed from the code: 500 opens/second/node
So there seems to be a problem with opening the same file read/write
and reading from it on multiple nodes. That pattern is 3 to 4 orders of
magnitude slower than the other parallel access patterns on the same
file (0.5 versus 500 to 7000 opens/second/node). At 0.5
opens/second/node, each open/read/close cycle takes around 2 seconds,
which almost looks like some timeout is expiring somewhere. I suspect
this has to do with capability management between the fuse client and
the MDS, but I don't know enough about that protocol to make an
educated assessment.
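One way to narrow down where the time goes would be to time open(),
read() and close() separately instead of counting whole cycles. The
following is only a minimal sketch under that assumption: the names
cycle_times, mono_now and time_cycle are made up for illustration and
are not part of the attached test program.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Per-call latencies, in seconds, for one open/read/close cycle. */
struct cycle_times {
    double open_s;
    double read_s;
    double close_s;
};

/* Monotonic clock reading in seconds (unaffected by wall-clock changes). */
static double mono_now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/*
 * Hypothetical helper: run one open/read/close cycle on `filename` with
 * the given open flags and record how long each call took in `out`.
 * Returns 0 on success, -1 if open(), read() or close() failed.
 */
static int time_cycle(const char *filename, int flags, struct cycle_times *out)
{
    char buffer[100];

    assert(out != NULL);

    double t0 = mono_now();
    int fd = open(filename, flags);
    double t1 = mono_now();
    if (fd < 0)
        return -1;

    ssize_t n = read(fd, buffer, sizeof(buffer));
    double t2 = mono_now();

    int rc = close(fd);
    double t3 = mono_now();

    out->open_s = t1 - t0;
    out->read_s = t2 - t1;
    out->close_s = t3 - t2;
    return (n < 0 || rc < 0) ? -1 : 0;
}
```

Calling time_cycle(filename, O_RDWR, &t) from the test loop on both
nodes and printing the three fields should show which of the three
calls absorbs the delay.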
[As an aside: how does this become a problem, i.e. why would anyone
open a file read/write and only read from it? It turns out that
gfortran-compiled code does this by default unless the user explicitly
says otherwise.]
All the nodes in this test are very lightly loaded, so there does not
seem to be any noticeable performance bottleneck (network, CPU, etc.).
The code to reproduce the problem is attached. Simply compile it,
create a test file with a few bytes of data in it, and run the test code
on two separate nodes on the same file.
We are running Ceph 10.2.9 on the servers, with the 10.2.9 fuse client
on the client nodes.
Any input/help would be greatly appreciated.
Andras
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

/* Length of each measurement interval, in seconds. */
#define INTERVAL 2

/* Current wall-clock time in seconds, with microsecond resolution. */
double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(int argc, char *argv[])
{
    if (argc != 3) {
        fprintf(stderr, "Usage: %s <filename> r|rw\n", argv[0]);
        exit(1);
    }

    const char *filename = argv[1];
    int mode = 0;
    if (strcmp(argv[2], "r") == 0) {
        mode = O_RDONLY;
    } else if (strcmp(argv[2], "rw") == 0) {
        mode = O_RDWR;
    } else {
        fprintf(stderr, "Second argument must be 'r' or 'rw'\n");
        exit(1);
    }

    /* Print the open/read/close rate once per INTERVAL, forever. */
    while (1) {
        char buffer[100];
        double t0 = now();
        double dt;
        int count = 0;

        while (1) {
            dt = now() - t0;
            if (dt > INTERVAL) {
                break;
            }

            int fd = open(filename, mode);
            if (fd < 0) {
                /* Report the real reason, whichever mode was requested. */
                perror(filename);
                exit(1);
            }
            if (read(fd, buffer, sizeof(buffer)) < 0) {
                perror("read");
                exit(1);
            }
            close(fd);
            count++;
        }
        printf("File open rate: %8.2f\n", count / dt);
    }
    return 0;
}