Hello Ceph-Users,
I was testing our RADOS Gateway, and after a few hours rgw started returning
HTTP 500 responses for certain uploads. After some digging I found that an HDD
had died. The OSD was eventually marked out, but only after a short rgw outage.
From start to finish this took 60 to 120 seconds.
I have a few questions:
1) FastCGI timed out after 30 seconds. If I raise the timeout to 120 seconds,
will that protect me from future HDD failures?
Example from the Apache error.log:
[error] [client 10.194.255.14] FastCGI: incomplete headers (0 bytes) received from server "/var/www/s3gw.fcgi"
[error] [client 10.194.255.1] FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)
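To frame question 1, here is a rough check of the numbers above. It is only a sketch: the helper names are hypothetical, and it assumes the 60 to 120 second stall I saw is representative of future HDD failures. It pulls the timeout out of the mod_fastcgi log line and compares it with the worst stall observed here.

```python
import re

# Abort line copied from the Apache error.log excerpt above.
LOG = ('[error] [client 10.194.255.1] FastCGI: comm with server '
       '"/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)')

# Observed rgw stall during the HDD failure, in seconds (assumed representative).
OUTAGE_MIN, OUTAGE_MAX = 60, 120


def idle_timeout_seconds(line):
    """Return the idle timeout reported in a FastCGI abort line, or None."""
    m = re.search(r"idle timeout \((\d+) sec\)", line)
    return int(m.group(1)) if m else None


def covers_outage(timeout, outage_max=OUTAGE_MAX):
    """True only if the timeout is strictly longer than the worst observed stall."""
    return timeout > outage_max


current = idle_timeout_seconds(LOG)
print(current, covers_outage(current), covers_outage(120))  # prints: 30 False False
```

By this arithmetic a 120 second timeout would cover the lower end of the window but leave no headroom at the upper end, which is part of why I am asking.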
2) Why did it take so long for Ceph to recover?
3) Is there anything I can do to improve resiliency to HDD failures?
Thank you.
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com