Can you advise on what the issues may be?
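For reference, this is roughly what I've been running to look at the stuck placement groups (a sketch; the pg IDs come from the health output quoted below):

    # overall state plus the stuck PGs
    ceph health detail
    ceph pg dump_stuck inactive
    ceph pg dump_stuck unclean

    # per-PG detail for the incomplete ones reported below
    # (note: query can stall if the PG's primary is unresponsive)
    ceph pg 14.0 query
    ceph pg 14.2 query
    ceph pg 14.6 query

    # sanity-check that the acting OSDs are up and in
    ceph osd tree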
Yehuda Sadeh <[email protected]> wrote:

> On Wed, Jan 22, 2014 at 8:55 AM, Graeme Lambert <[email protected]> wrote:
>> Hi Yehuda,
>>
>> With regards to the health status of the cluster, it isn't healthy, but
>> I haven't found any way of fixing the placement group errors. Looking
>> at ceph health detail, it's also showing blocked requests:
>>
>> HEALTH_WARN 1 pgs down; 3 pgs incomplete; 3 pgs stuck inactive; 3 pgs
>> stuck unclean; 7 requests are blocked > 32 sec; 3 osds have slow
>> requests; pool cloudstack has too few pgs; pool .rgw.buckets has too few pgs
>> pg 14.0 is stuck inactive since forever, current state incomplete, last acting [5,0]
>> pg 14.2 is stuck inactive since forever, current state incomplete, last acting [0,5]
>> pg 14.6 is stuck inactive since forever, current state down+incomplete, last acting [4,2]
>> pg 14.0 is stuck unclean since forever, current state incomplete, last acting [5,0]
>> pg 14.2 is stuck unclean since forever, current state incomplete, last acting [0,5]
>> pg 14.6 is stuck unclean since forever, current state down+incomplete, last acting [4,2]
>> pg 14.0 is incomplete, acting [5,0]
>> pg 14.2 is incomplete, acting [0,5]
>> pg 14.6 is down+incomplete, acting [4,2]
>
> You should figure these out first before trying to get the gateway
> working. It may very well be your culprit.
>
> Yehuda
>
>> 3 ops are blocked > 8388.61 sec
>> 3 ops are blocked > 4194.3 sec
>> 1 ops are blocked > 2097.15 sec
>> 1 ops are blocked > 8388.61 sec on osd.0
>> 1 ops are blocked > 4194.3 sec on osd.0
>> 2 ops are blocked > 8388.61 sec on osd.4
>> 2 ops are blocked > 4194.3 sec on osd.5
>> 1 ops are blocked > 2097.15 sec on osd.5
>> 3 osds have slow requests
>> pool cloudstack objects per pg (37316) is more than 27.1587 times
>> cluster average (1374)
>> pool .rgw.buckets objects per pg (76219) is more than 55.4723 times
>> cluster average (1374)
>>
>> Ignore the cloudstack pool; I was using CloudStack but am not anymore,
>> so it's an inactive pool.
>>
>> Best regards
>>
>> Graeme
>>
>>
>> On 22/01/14 16:38, Graeme Lambert wrote:
>>
>> Hi,
>>
>> Following discussions with people on IRC, I set debug_ms, and this is
>> what is being looped over and over when one of them is stuck:
>> http://pastebin.com/KVcpAeYT
>>
>> Regarding the modules, the Apache version is 2.2.22-2precise.ceph and
>> the FastCGI module version is 2.4.7~0910052141-2~bpo70+1.ceph.
>>
>> Best regards
>>
>> Graeme
>>
>>
>> On 22/01/14 16:28, Yehuda Sadeh wrote:
>>
>> On Wed, Jan 22, 2014 at 8:05 AM, Graeme Lambert <[email protected]> wrote:
>>
>> Hi,
>>
>> I'm using the aws-sdk-for-php classes with the Ceph RADOS gateway, but
>> I'm getting an intermittent issue when uploading files.
>>
>> I'm attempting to upload an array of objects to Ceph one by one using
>> the create_object() function. It appears to stop randomly when working
>> through them: it could stop at the first one, somewhere in between, or
>> at the last one; there is no pattern to it that I can see.
>>
>> I'm not getting any PHP errors that indicate an issue, and equally
>> there are no exceptions being caught.
>>
>> In the radosgw log file, at the time it appears stuck, I get:
>>
>> 2014-01-22 15:39:21.656763 7fac44fe1700 1 ====== starting new request
>> req=0x2417c30 =====
>>
>> And then sometimes I see:
>>
>> 2014-01-22 15:40:42.490485 7fac99ff9700 1 heartbeat_map is_healthy
>> 'RGWProcess::m_tp thread 0x7fac51ffb700' had timed out after 600
>>
>> repeated over and over again.
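Interjecting on that last point: as I understand it, that heartbeat_map line means an rgw worker thread has been stuck for over 600 seconds, which lines up with the blocked requests on osd.0, osd.4 and osd.5 above. This is roughly how I've been checking what those OSDs are stuck on (a sketch; the admin socket path is the default on our Ubuntu install and may differ elsewhere, and each command has to run on the host carrying that OSD):

    # dump the in-flight (and possibly blocked) ops on each slow OSD
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight
    ceph --admin-daemon /var/run/ceph/ceph-osd.4.asok dump_ops_in_flight
    ceph --admin-daemon /var/run/ceph/ceph-osd.5.asok dump_ops_in_flight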
>>
>> When those messages are appearing, Apache's error log shows:
>>
>> [Wed Jan 22 15:43:11 2014] [error] [client 172.16.2.149] FastCGI: comm
>> with server "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec),
>> referer: https://[server]/[path]
>>
>> equally over and over again.
>>
>> I have restarted Apache, radosgw, and all Ceph OSD and ceph-mon
>> processes, and still no joy with this.
>>
>> Can anyone advise on where I'm going wrong with this?
>>
>> Which fastcgi module are you using? Can you provide a log with 'debug
>> ms = 1' for a failing request? Usually that kind of message means that
>> it's waiting for the osd to respond, which might point at an
>> unhealthy cluster.
>>
>> Yehuda
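One more thought on the Apache side: the "idle timeout (30 sec)" aborts match mod_fastcgi's default -idle-timeout of 30 seconds. Raising it won't fix anything while requests are blocked on the OSDs, but for completeness this is the sort of vhost line I mean (the socket path is just illustrative; use whatever your s3gw.fcgi/radosgw setup actually points at):

    # mod_fastcgi external server for radosgw; -idle-timeout defaults to
    # 30 seconds, which is where the aborts above come from
    FastCgiExternalServer /var/www/s3gw.fcgi -socket /var/run/ceph/radosgw.sock -idle-timeout 120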
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
