Hi Mart, Wido...

A disclaimer: I'm not really an expert, just a regular site admin sharing my experience.

At the beginning of the thread you gave the impression that only osd.68 has problems dealing with the problematic PG 3.117. If that is indeed the case, you could simply mark osd.68 down/out and remove it from the cluster. This will trigger Ceph to re-replicate all PGs on osd.68 to other OSDs from the remaining replicas.
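
If you go down that route, the usual sequence is roughly the sketch below (treat it as a sketch, not a recipe; the service command depends on your init system):

$ ceph osd out 68
# wait for the cluster to backfill the data from the other replicas
$ stop ceph-osd id=68          # or: systemctl stop ceph-osd@68
$ ceph osd crush remove osd.68
$ ceph auth del osd.68
$ ceph osd rm 68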

However, in the last email you seem to suggest that it is PG 3.117 itself which has problems, which makes all OSDs sharing that PG problematic as well. Because of that you marked all OSDs sharing that PG as down.


Before actually trying something more drastic, I would go for a more classic approach. For example, what happens if you bring only one OSD up? I would start with osd.74, since you suspect problems in osd.68 and osd.55 was the source of the dump below. If it still aborts, then it means that the PG might have been replicated everywhere with 'bad' data.
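
Concretely, something like this (assuming an upstart-based install, as was common with Hammer; adapt the service command to your setup):

$ ceph osd set noout             # optional, avoids extra rebalancing during the test
$ start ceph-osd id=74           # or: systemctl start ceph-osd@74
$ ceph -w                        # watch whether osd.74 stays up or asserts again
$ ceph pg 3.117 query            # may start answering once the PG has a live primary
$ ceph osd unset noout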

The drastic approach (if you do not care about the data on that PG) is to mark those OSDs as down and force the PG to be recreated using 'ceph pg force_create_pg 3.117'. Based on my previous experience, once I recreated a PG, 'ceph pg dump_stuck stale' showed that PG stuck in the 'creating' state forever. To make it right, I had to restart the proper OSDs. But, as you stated, you then have to deal with data corruption at the VM level... Maybe that is a problem, maybe it isn't...
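
From memory, the rough sequence in my case looked something like this; please double-check it against your own cluster (the osd ids here are just the three from earlier in the thread) before running anything, since it throws away whatever is left of the PG's data:

$ stop ceph-osd id=55; stop ceph-osd id=68; stop ceph-osd id=74
$ ceph pg force_create_pg 3.117
$ ceph pg dump_stuck stale       # in my case the PG sat in 'creating' at this point
$ start ceph-osd id=55; start ceph-osd id=68; start ceph-osd id=74
# restart the OSDs meant to host the PG until it goes active+clean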

Hope that helps
Cheers
Goncalo




On 03/31/2016 12:26 PM, Mart van Santen wrote:


Hello,

Well, unfortunately the problem is not really solved. Yes, we managed to get to a good health state at some point, but when a client hits some specific data, the OSD process crashes with the errors below. The 3 OSDs which handle 3.117, the PG with problems, are currently down and we have reweighted them to 0, so non-affected PGs are currently being rebuilt on other OSDs.
If I bring the crashed OSDs back up, they crash again within a few minutes.

As I'm a bit afraid for the data in this PG, I think we want to recreate the PG with empty data and discard the old disks. I understand I will get data corruption on several RBDs in this case, but we will try to solve that and rebuild the affected VMs. Does this make sense, and what are the best next steps?
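
One way we could work out which RBD images are affected (assuming the pool is simply named 'rbd') would be to match each image's block_name_prefix against the object names in the crash log, e.g. the rbd_data.900a62ae8944a object below:

$ for img in $(rbd ls -p rbd); do
    rbd info rbd/$img | grep -q 'block_name_prefix: rbd_data.900a62ae8944a' && echo "$img"
  done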

Regards,

Mart





-34> 2016-03-31 03:07:56.932800 7f8e43829700 3 osd.55 122203 handle_osd_map epochs [122203,122203], i have 122203, src has [120245,122203]
-33> 2016-03-31 03:07:56.932837 7f8e43829700 1 -- [2a00:c6c0:0:122::105]:6822/11703 <== osd.45 [2a00:c6c0:0:122::103]:6800/1852 7 ==== pg_info(1 pgs e122202:3.117) v4 ==== 919+0+0 (3389909573 0 0) 0x528bc00 con 0x1200a840
-32> 2016-03-31 03:07:56.932855 7f8e43829700 5 -- op tracker -- seq: 22, time: 2016-03-31 03:07:56.932770, event: header_read, op: pg_info(1 pgs e122202:3.117)
-31> 2016-03-31 03:07:56.932869 7f8e43829700 5 -- op tracker -- seq: 22, time: 2016-03-31 03:07:56.932771, event: throttled, op: pg_info(1 pgs e122202:3.117)
-30> 2016-03-31 03:07:56.932878 7f8e43829700 5 -- op tracker -- seq: 22, time: 2016-03-31 03:07:56.932822, event: all_read, op: pg_info(1 pgs e122202:3.117)
-29> 2016-03-31 03:07:56.932886 7f8e43829700 5 -- op tracker -- seq: 22, time: 2016-03-31 03:07:56.932851, event: dispatched, op: pg_info(1 pgs e122202:3.117)
-28> 2016-03-31 03:07:56.932895 7f8e43829700 5 -- op tracker -- seq: 22, time: 2016-03-31 03:07:56.932895, event: waiting_for_osdmap, op: pg_info(1 pgs e122202:3.117)
-27> 2016-03-31 03:07:56.932912 7f8e43829700 5 -- op tracker -- seq: 22, time: 2016-03-31 03:07:56.932912, event: started, op: pg_info(1 pgs e122202:3.117)
-26> 2016-03-31 03:07:56.932947 7f8e43829700 5 -- op tracker -- seq: 22, time: 2016-03-31 03:07:56.932947, event: done, op: pg_info(1 pgs e122202:3.117)
-25> 2016-03-31 03:07:56.933022 7f8e3c01a700 1 -- [2a00:c6c0:0:122::105]:6822/11703 --> [2a00:c6c0:0:122::103]:6800/1852 -- osd_map(122203..122203 src has 121489..122203) v3 -- ?+0 0x11c7fd40 con 0x1200a840
-24> 2016-03-31 03:07:56.933041 7f8e3c01a700 1 -- [2a00:c6c0:0:122::105]:6822/11703 --> [2a00:c6c0:0:122::103]:6800/1852 -- pg_info(1 pgs e122203:3.117) v4 -- ?+0 0x528bde0 con 0x1200a840
-23> 2016-03-31 03:07:56.933111 7f8e3c01a700 1 -- [2a00:c6c0:0:122::105]:6822/11703 --> [2a00:c6c0:0:122::105]:6810/3568 -- osd_map(122203..122203 src has 121489..122203) v3 -- ?+0 0x12200d00 con 0x1209d4a0
-22> 2016-03-31 03:07:56.933125 7f8e3c01a700 1 -- [2a00:c6c0:0:122::105]:6822/11703 --> [2a00:c6c0:0:122::105]:6810/3568 -- pg_info(1 pgs e122203:3.117) v4 -- ?+0 0x5288960 con 0x1209d4a0
-21> 2016-03-31 03:07:56.933154 7f8e3c01a700 1 -- [2a00:c6c0:0:122::105]:6822/11703 --> [2a00:c6c0:0:122::108]:6816/1002847 -- pg_info(1 pgs e122203:3.117) v4 -- ?+0 0x5288d20 con 0x101a19c0
-20> 2016-03-31 03:07:56.933212 7f8e3c01a700 5 osd.55 pg_epoch: 122203 pg[3.117( v 122193'1898519 (108032'1895437,122193'1898519] local-les=122202 n=2789 ec=23736 les/c 122202/122047 122062/122201/122201) [72,54,45]/[55] r=0 lpr=122201 pi=122046-122200/51 bft=45,54,72 crt=122133'1898514 lcod 0'0 mlcod 0'0 active+undersized+degraded+remapped] on activate: bft=45,54,72 from 0//0//-1
-19> 2016-03-31 03:07:56.933232 7f8e3c01a700 5 osd.55 pg_epoch: 122203 pg[3.117( v 122193'1898519 (108032'1895437,122193'1898519] local-les=122202 n=2789 ec=23736 les/c 122202/122047 122062/122201/122201) [72,54,45]/[55] r=0 lpr=122201 pi=122046-122200/51 bft=45,54,72 crt=122133'1898514 lcod 0'0 mlcod 0'0 active+undersized+degraded+remapped] target shard 45 from 0//0//-1
-18> 2016-03-31 03:07:56.933244 7f8e3c01a700 5 osd.55 pg_epoch: 122203 pg[3.117( v 122193'1898519 (108032'1895437,122193'1898519] local-les=122202 n=2789 ec=23736 les/c 122202/122047 122062/122201/122201) [72,54,45]/[55] r=0 lpr=122201 pi=122046-122200/51 bft=45,54,72 crt=122133'1898514 lcod 0'0 mlcod 0'0 active+undersized+degraded+remapped] target shard 54 from 0//0//-1
-17> 2016-03-31 03:07:56.933255 7f8e3c01a700 5 osd.55 pg_epoch: 122203 pg[3.117( v 122193'1898519 (108032'1895437,122193'1898519] local-les=122202 n=2789 ec=23736 les/c 122202/122047 122062/122201/122201) [72,54,45]/[55] r=0 lpr=122201 pi=122046-122200/51 bft=45,54,72 crt=122133'1898514 lcod 0'0 mlcod 0'0 active+undersized+degraded+remapped] target shard 72 from 0//0//-1
-16> 2016-03-31 03:07:56.933283 7f8e3680f700 5 -- op tracker -- seq: 20, time: 2016-03-31 03:07:56.933283, event: reached_pg, op: osd_op(client.776466.1:190178605 rbd_data.900a62ae8944a.0000000000000829 [set-alloc-hint object_size 4194304 write_size 4194304,write 8192~8192] 3.b1492517 RETRY=1 snapc 8b3=[8b3] ondisk+retry+write e122203)
-15> 2016-03-31 03:07:56.933507 7f8e3680f700 5 -- op tracker -- seq: 20, time: 2016-03-31 03:07:56.933507, event: started, op: osd_op(client.776466.1:190178605 rbd_data.900a62ae8944a.0000000000000829 [set-alloc-hint object_size 4194304 write_size 4194304,write 8192~8192] 3.b1492517 RETRY=1 snapc 8b3=[8b3] ondisk+retry+write e122203)
-14> 2016-03-31 03:07:56.933648 7f8e3680f700 5 -- op tracker -- seq: 20, time: 2016-03-31 03:07:56.933648, event: waiting for subops from 45,54,72, op: osd_op(client.776466.1:190178605 rbd_data.900a62ae8944a.0000000000000829 [set-alloc-hint object_size 4194304 write_size 4194304,write 8192~8192] 3.b1492517 RETRY=1 snapc 8b3=[8b3] ondisk+retry+write e122203)
-13> 2016-03-31 03:07:56.933682 7f8e3680f700 1 -- [2a00:c6c0:0:122::105]:6822/11703 --> [2a00:c6c0:0:122::103]:6800/1852 -- osd_repop(client.776466.1:190178605 3.117 b1492517/rbd_data.900a62ae8944a.0000000000000829/head//3 v 122203'1898521) v1 -- ?+46 0x11e96400 con 0x1200a840
-12> 2016-03-31 03:07:56.933712 7f8e3680f700 1 -- [2a00:c6c0:0:122::105]:6822/11703 --> [2a00:c6c0:0:122::105]:6810/3568 -- osd_repop(client.776466.1:190178605 3.117 b1492517/rbd_data.900a62ae8944a.0000000000000829/head//3 v 122203'1898521) v1 -- ?+46 0x11e96a00 con 0x1209d4a0
-11> 2016-03-31 03:07:56.933735 7f8e3680f700 1 -- [2a00:c6c0:0:122::105]:6822/11703 --> [2a00:c6c0:0:122::108]:6816/1002847 -- osd_repop(client.776466.1:190178605 3.117 b1492517/rbd_data.900a62ae8944a.0000000000000829/head//3 v 122203'1898521) v1 -- ?+46 0x11e97600 con 0x101a19c0
-10> 2016-03-31 03:07:56.935173 7f8e30ef5700 1 -- [2a00:c6c0:0:122::105]:6822/11703 <== osd.72 [2a00:c6c0:0:122::108]:6816/1002847 9 ==== osd_repop_reply(client.776466.1:190178605 3.117 ondisk, result = 0) v1 ==== 83+0+0 (405786713 0 0) 0x11e66d00 con 0x101a19c0
-9> 2016-03-31 03:07:56.935212 7f8e30ef5700 5 -- op tracker -- seq: 23, time: 2016-03-31 03:07:56.935087, event: header_read, op: osd_repop_reply(client.776466.1:190178605 3.117 ondisk, result = 0)
-8> 2016-03-31 03:07:56.935224 7f8e30ef5700 5 -- op tracker -- seq: 23, time: 2016-03-31 03:07:56.935090, event: throttled, op: osd_repop_reply(client.776466.1:190178605 3.117 ondisk, result = 0)
-7> 2016-03-31 03:07:56.935234 7f8e30ef5700 5 -- op tracker -- seq: 23, time: 2016-03-31 03:07:56.935162, event: all_read, op: osd_repop_reply(client.776466.1:190178605 3.117 ondisk, result = 0)
-6> 2016-03-31 03:07:56.935245 7f8e30ef5700 5 -- op tracker -- seq: 23, time: 0.000000, event: dispatched, op: osd_repop_reply(client.776466.1:190178605 3.117 ondisk, result = 0)
-5> 2016-03-31 03:07:56.936129 7f8e2dfc6700 1 -- [2a00:c6c0:0:122::105]:6822/11703 <== osd.45 [2a00:c6c0:0:122::103]:6800/1852 8 ==== osd_repop_reply(client.776466.1:190178605 3.117 ondisk, result = 0) v1 ==== 83+0+0 (3967999676 0 0) 0x11c7fd40 con 0x1200a840
-4> 2016-03-31 03:07:56.936150 7f8e2dfc6700 5 -- op tracker -- seq: 24, time: 2016-03-31 03:07:56.936086, event: header_read, op: osd_repop_reply(client.776466.1:190178605 3.117 ondisk, result = 0)
-3> 2016-03-31 03:07:56.936159 7f8e2dfc6700 5 -- op tracker -- seq: 24, time: 2016-03-31 03:07:56.936087, event: throttled, op: osd_repop_reply(client.776466.1:190178605 3.117 ondisk, result = 0)
-2> 2016-03-31 03:07:56.936166 7f8e2dfc6700 5 -- op tracker -- seq: 24, time: 2016-03-31 03:07:56.936124, event: all_read, op: osd_repop_reply(client.776466.1:190178605 3.117 ondisk, result = 0)
-1> 2016-03-31 03:07:56.936172 7f8e2dfc6700 5 -- op tracker -- seq: 24, time: 0.000000, event: dispatched, op: osd_repop_reply(client.776466.1:190178605 3.117 ondisk, result = 0)
0> 2016-03-31 03:07:56.940165 7f8e3680f700 -1 osd/SnapMapper.cc: In function 'void SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' thread 7f8e3680f700 time 2016-03-31 03:07:56.933983
osd/SnapMapper.cc: 228: FAILED assert(r == -2)

 ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xba8b8b]
 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, std::less<snapid_t>, std::allocator<snapid_t> > const&, MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x61e) [0x72137e]
 3: (PG::update_snap_map(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, ObjectStore::Transaction&)+0x402) [0x7d25c2]
 4: (PG::append_log(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, ObjectStore::Transaction&, bool)+0x4e8) [0x7d2c68]
 5: (ReplicatedPG::log_operation(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, boost::optional<pg_hit_set_history_t>&, eversion_t const&, eversion_t const&, bool, ObjectStore::Transaction*)+0xba) [0x899eca]
 6: (ReplicatedBackend::submit_transaction(hobject_t const&, eversion_t const&, PGBackend::PGTransaction*, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, boost::optional<pg_hit_set_history_t>&, Context*, Context*, Context*, unsigned long, osd_reqid_t, std::tr1::shared_ptr<OpRequest>)+0x77c) [0x9f06cc]
 7: (ReplicatedPG::issue_repop(ReplicatedPG::RepGather*)+0x7aa) [0x8391aa]
 8: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0xbdd) [0x88792d]
 9: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0x4559) [0x88cee9]
 10: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x66a) [0x82702a]
 11: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3bd) [0x6961dd]
 12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x338) [0x696708]
 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x875) [0xb98555]
 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xb9a670]
 15: (()+0x8182) [0x7f8e57f6c182]
 16: (clone()+0x6d) [0x7f8e564d747d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.55.log
--- end dump of recent events ---
2016-03-31 03:07:56.960104 7f8e3680f700 -1 *** Caught signal (Aborted) **
 in thread 7f8e3680f700

 ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
 1: /usr/bin/ceph-osd() [0xaaff6a]
 2: (()+0x10340) [0x7f8e57f74340]
 3: (gsignal()+0x39) [0x7f8e56413cc9]
 4: (abort()+0x148) [0x7f8e564170d8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f8e56d1e535]
 6: (()+0x5e6d6) [0x7f8e56d1c6d6]
 7: (()+0x5e703) [0x7f8e56d1c703]
 8: (()+0x5e922) [0x7f8e56d1c922]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xba8d78]
 10: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, std::less<snapid_t>, std::allocator<snapid_t> > const&, MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x61e) [0x72137e]
 11: (PG::update_snap_map(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, ObjectStore::Transaction&)+0x402) [0x7d25c2]
 12: (PG::append_log(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, ObjectStore::Transaction&, bool)+0x4e8) [0x7d2c68]
 13: (ReplicatedPG::log_operation(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, boost::optional<pg_hit_set_history_t>&, eversion_t const&, eversion_t const&, bool, ObjectStore::Transaction*)+0xba) [0x899eca]
 14: (ReplicatedBackend::submit_transaction(hobject_t const&, eversion_t const&, PGBackend::PGTransaction*, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, boost::optional<pg_hit_set_history_t>&, Context*, Context*, Context*, unsigned long, osd_reqid_t, std::tr1::shared_ptr<OpRequest>)+0x77c) [0x9f06cc]
 15: (ReplicatedPG::issue_repop(ReplicatedPG::RepGather*)+0x7aa) [0x8391aa]
 16: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0xbdd) [0x88792d]
 17: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0x4559) [0x88cee9]
 18: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x66a) [0x82702a]
 19: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3bd) [0x6961dd]
 20: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x338) [0x696708]
 21: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x875) [0xb98555]
 22: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xb9a670]
 23: (()+0x8182) [0x7f8e57f6c182]
 24: (clone()+0x6d) [0x7f8e564d747d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
-7> 2016-03-31 03:07:56.945486 7f8e43829700 1 -- [2a00:c6c0:0:122::105]:6822/11703 <== osd.54 [2a00:c6c0:0:122::105]:6810/3568 7 ==== osd_map(122203..122203 src has 121489..122203) v3 ==== 222+0+0 (2966331141 0 0) 0x12200d00 con 0x1209d4a0
-6> 2016-03-31 03:07:56.945514 7f8e43829700 3 osd.55 122203 handle_osd_map epochs [122203,122203], i have 122203, src has [121489,122203]
-5> 2016-03-31 03:07:56.945517 7f8e2d6bd700 1 -- [2a00:c6c0:0:122::105]:6822/11703 <== osd.54 [2a00:c6c0:0:122::105]:6810/3568 8 ==== osd_repop_reply(client.776466.1:190178605 3.117 ondisk, result = 0) v1 ==== 83+0+0 (4008969226 0 0) 0x11e661c0 con 0x1209d4a0
-4> 2016-03-31 03:07:56.945538 7f8e2d6bd700 5 -- op tracker -- seq: 25, time: 2016-03-31 03:07:56.945488, event: header_read, op: osd_repop_reply(client.776466.1:190178605 3.117 ondisk, result = 0)
-3> 2016-03-31 03:07:56.945545 7f8e2d6bd700 5 -- op tracker -- seq: 25, time: 2016-03-31 03:07:56.945489, event: throttled, op: osd_repop_reply(client.776466.1:190178605 3.117 ondisk, result = 0)
-2> 2016-03-31 03:07:56.945549 7f8e2d6bd700 5 -- op tracker -- seq: 25, time: 2016-03-31 03:07:56.945512, event: all_read, op: osd_repop_reply(client.776466.1:190178605 3.117 ondisk, result = 0)
-1> 2016-03-31 03:07:56.945552 7f8e2d6bd700 5 -- op tracker -- seq: 25, time: 0.000000, event: dispatched, op: osd_repop_reply(client.776466.1:190178605 3.117 ondisk, result = 0)
0> 2016-03-31 03:07:56.960104 7f8e3680f700 -1 *** Caught signal (Aborted) **
 in thread 7f8e3680f700

 ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
 1: /usr/bin/ceph-osd() [0xaaff6a]
 2: (()+0x10340) [0x7f8e57f74340]
 3: (gsignal()+0x39) [0x7f8e56413cc9]
 4: (abort()+0x148) [0x7f8e564170d8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f8e56d1e535]
 6: (()+0x5e6d6) [0x7f8e56d1c6d6]
 7: (()+0x5e703) [0x7f8e56d1c703]
 8: (()+0x5e922) [0x7f8e56d1c922]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xba8d78]
 10: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, std::less<snapid_t>, std::allocator<snapid_t> > const&, MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x61e) [0x72137e]
 11: (PG::update_snap_map(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, ObjectStore::Transaction&)+0x402) [0x7d25c2]
 12: (PG::append_log(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, ObjectStore::Transaction&, bool)+0x4e8) [0x7d2c68]
 13: (ReplicatedPG::log_operation(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, boost::optional<pg_hit_set_history_t>&, eversion_t const&, eversion_t const&, bool, ObjectStore::Transaction*)+0xba) [0x899eca]
 14: (ReplicatedBackend::submit_transaction(hobject_t const&, eversion_t const&, PGBackend::PGTransaction*, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, boost::optional<pg_hit_set_history_t>&, Context*, Context*, Context*, unsigned long, osd_reqid_t, std::tr1::shared_ptr<OpRequest>)+0x77c) [0x9f06cc]
 15: (ReplicatedPG::issue_repop(ReplicatedPG::RepGather*)+0x7aa) [0x8391aa]
 16: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0xbdd) [0x88792d]
 17: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0x4559) [0x88cee9]
 18: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x66a) [0x82702a]
 19: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3bd) [0x6961dd]
 20: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x338) [0x696708]
 21: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x875) [0xb98555]
 22: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xb9a670]
 23: (()+0x8182) [0x7f8e57f6c182]
 24: (clone()+0x6d) [0x7f8e564d747d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.55.log
--- end dump of recent events ---

On 03/30/2016 11:36 PM, Mart van Santen wrote:
Hi there,

With the help of a lot of people we were able to repair the PG and
restored service. We will get back on this later with a full report for
future reference.

Regards,

Mart


On 03/30/2016 08:30 PM, Wido den Hollander wrote:
Hi,

I have an issue with a Ceph cluster which I can't resolve.

Due to OSD failure a PG is incomplete, but I can't query the PG to see what I
can do to fix it.

      health HEALTH_WARN
             1 pgs incomplete
             1 pgs stuck inactive
             1 pgs stuck unclean
             98 requests are blocked > 32 sec

$ ceph pg 3.117 query

That will hang for ever.

$ ceph pg dump_stuck

pg_stat state   up      up_primary      acting  acting_primary
3.117   incomplete      [68,55,74]      68      [68,55,74]      68

The primary OSD for this PG is osd.68. If I stop that OSD the PG query works, but it says that bringing osd.68 back online will probably help.

The 98 requests which are blocked are also on osd.68 and they all say:
- initiated
- reached_pg
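
For reference, this kind of per-request event history can be pulled from the OSD admin socket, e.g. something along the lines of:

$ ceph daemon osd.68 dump_ops_in_flight
$ ceph daemon osd.68 dump_historic_ops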

The cluster is running Hammer 0.94.5 in this case.

From what I know an OSD had a failing disk and was restarted a couple of times
while the disk gave errors. This caused the PG to become incomplete.

I've set debug osd to 20, but I can't really tell what is going wrong on osd.68
which causes it to stall this long.
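
The debug level can be raised on the running OSD without restarting it, e.g. something like:

$ ceph tell osd.68 injectargs '--debug-osd 20 --debug-ms 1'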

Any idea what to do here to get this PG up and running again?

Wido






--
Goncalo Borges
Research Computing
ARC Centre of Excellence for Particle Physics at the Terascale
School of Physics A28 | University of Sydney, NSW  2006
T: +61 2 93511937

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
