Hi all
We mounted cephfs with ceph-fuse on two machines. We found that if a process
writing a log on node A, while "tail -f" it on node B will quite slow, The mds
server also complain like:
> 2018-07-09 15:10:35.516602 7f32fa0c2700 0 log_channel(cluster) log [WRN] : 2
> slow requests, 1 included below; oldest blocked for > 8.421551 secs
> 2018-07-09 15:10:35.516608 7f32fa0c2700 0 log_channel(cluster) log [WRN] :
> slow request 5.866578 seconds old, received at 2018-07-09 15:10:29.649997:
> client_request(client.3777088:24818 getattr pAsLsXsFs #1000008a41e 2018-07-09
> 15:10:03.842337) currently failed to rdlock, waiting
> 2018-07-09 15:10:48.517367 7f32fa0c2700 0 log_channel(cluster) log [WRN] : 2
> slow requests, 2 included below; oldest blocked for > 5.860196 secs
> 2018-07-09 15:10:48.517373 7f32fa0c2700 0 log_channel(cluster) log [WRN] :
> slow request 5.860196 seconds old, received at 2018-07-09 15:10:42.657139:
> client_request(client.3777088:24826 getattr pAsLsXsFs #1000008a41e 2018-07-09
> 15:10:16.843077) currently failed to rdlock, waiting
> 2018-07-09 15:10:48.517375 7f32fa0c2700 0 log_channel(cluster) log [WRN] :
> slow request 5.622276 seconds old, received at 2018-07-09 15:10:42.895059:
> client_request(client.3775872:34689 lookup #10000011127/choury-test
> 2018-07-09 15:10:42.894941) currently failed to rdlock, waiting
> 2018-07-09 15:10:51.517530 7f32fa0c2700 0 log_channel(cluster) log [WRN] : 2
> slow requests, 1 included below; oldest blocked for > 8.622448 secs
> 2018-07-09 15:10:51.517536 7f32fa0c2700 0 log_channel(cluster) log [WRN] :
> slow request 5.399846 seconds old, received at 2018-07-09 15:10:46.117661:
> client_request(client.3775872:34690 lookup #10000011127/choury-test
> 2018-07-09 15:10:46.117586) currently failed to rdlock, waiting
> 2018-07-09 15:10:53.517640 7f32fa0c2700 0 log_channel(cluster) log [WRN] : 2
> slow requests, 1 included below; oldest blocked for > 10.622560 secs
> 2018-07-09 15:10:53.517646 7f32fa0c2700 0 log_channel(cluster) log [WRN] :
> slow request 10.622560 seconds old, received at 2018-07-09 15:10:42.895059:
> client_request(client.3775872:34689 lookup #10000011127/choury-test
> 2018-07-09 15:10:42.894941) currently failed to rdlock, waiting
> 2018-07-09 15:10:56.517819 7f32fa0c2700 0 log_channel(cluster) log [WRN] : 1
> slow requests, 1 included below; oldest blocked for > 10.400132 secs
> 2018-07-09 15:10:56.517826 7f32fa0c2700 0 log_channel(cluster) log [WRN] :
> slow request 10.400132 seconds old, received at 2018-07-09 15:10:46.117661:
> client_request(client.3775872:34690 lookup #10000011127/choury-test
> 2018-07-09 15:10:46.117586) currently failed to rdlock, waiting
We reproduced this problem in the test cluster. there's only two processed(on
two machines) access the cluster, the writer, and tail.
The test writer code:
> #include <stdio.h>
> #include <string.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> const char *s =
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
> int main(int argc, char** argv){
> FILE *f=fopen(argv[1],"ab+");
> if(f==NULL){
> printf("cant'to open destination file\n");
> return 0;
> }
> int line = 0;
> while(true){
> char buff[1024]={0};
> for(int i = 0; i< 200; i++){
> buff[i] = s[rand()%62];
> }
> fprintf(f, "%d: %s\n", line++, buff);
> fflush(f);
> sleep(30);
> }
> fclose(f);
> return 0;
> }
The version of ceph is 10.2.10.
How can I reduce the latency and slow requests?
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com