Hello Ceph-Users,

I was testing our RADOS Gateway, and after a few hours rgw started sending HTTP 
500 responses for certain uploads. I did some digging and found that an HDD had 
died. The OSD was marked out, but not before a short rgw outage. From start to 
finish, the outage lasted 60 to 120 seconds.

I have a few questions:

1) FastCGI timed out after 30 seconds. If I raise the timeout to 120 seconds, 
will that protect me from future HDD failures? 
        Example from Apache's error.log:

        [error] [client 10.194.255.14] FastCGI: incomplete headers (0 bytes) received from server "/var/www/s3gw.fcgi"
        [error] [client 10.194.255.1] FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)
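
        For reference, the change I am considering would look roughly like
        the line below. This is only a sketch: it assumes mod_fastcgi with
        an external server, and the socket path is a placeholder rather
        than our actual configuration.

        # Raise the idle timeout from the 30-second default to 120 seconds.
        # /tmp/radosgw.sock is an assumed socket path.
        FastCgiExternalServer /var/www/s3gw.fcgi -socket /tmp/radosgw.sock -idle-timeout 120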

2) Why did it take so long for Ceph to recover? 

3) Is there anything I can do to improve HDD failure resiliency?

Thank you. 
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
