Hi Yann,

That looks related to http://tracker.ceph.com/issues/10536, which appears to be 
resolved. Could you create a new issue with a link to 10536? More logs and a 
ceph report would also be useful to figure out why it resurfaced.
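
For the record, the report can be captured like this (assuming access to an 
admin keyring on a node that can reach the monitors; the output filename is 
just an example):

```shell
# Dump the full cluster state as JSON, suitable for attaching
# to the tracker issue.
ceph report > ceph-report.json
```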

Thanks!


On 04/03/2015 00:04, Yann Dupont wrote:
> 
> On 03/03/2015 22:03, Italo Santos wrote:
>>
>> I realised that when the first OSD went down, the cluster was performing a 
>> deep-scrub, and I found the below trace in the logs of osd.8. Can anyone 
>> help me understand why osd.8, and other OSDs, unexpectedly go down?
>>
> 
> I'm afraid I saw this too this afternoon on my test cluster, just after 
> upgrading from 0.87 to 0.93. After an initially successful migration, some 
> OSDs started to go down. All presented similar stack traces, with the magic 
> word "scrub" in them:
> 
> ceph version 0.93 (bebf8e9a830d998eeaab55f86bb256d4360dd3c4)
>  1: /usr/bin/ceph-osd() [0xbeb3dc]
>  2: (()+0xf0a0) [0x7f8f3ca130a0]
>  3: (gsignal()+0x35) [0x7f8f3b37d165]
>  4: (abort()+0x180) [0x7f8f3b3803e0]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f8f3bbd389d]
>  6: (()+0x63996) [0x7f8f3bbd1996]
>  7: (()+0x639c3) [0x7f8f3bbd19c3]
>  8: (()+0x63bee) [0x7f8f3bbd1bee]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> const*)+0x220) [0xcd74f0]
>  10: (ReplicatedPG::issue_repop(ReplicatedPG::RepGather*, utime_t)+0x1fc) 
> [0x97259c]
>  11: (ReplicatedPG::simple_repop_submit(ReplicatedPG::RepGather*)+0x7a) 
> [0x97344a]
>  12: (ReplicatedPG::_scrub(ScrubMap&, std::map<hobject_t, std::pair<unsigned 
> int, unsigned int>, std::less<hobject_t>, std::allocator<std::pair<hobject_t 
> const, std::pa
> ir<unsigned int, unsigned int> > > > const&)+0x2e4d) [0x9a5ded]
>  13: (PG::scrub_compare_maps()+0x658) [0x916378]
>  14: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x202) [0x917ee2]
>  15: (PG::scrub(ThreadPool::TPHandle&)+0x3a3) [0x919f83]
>  16: (OSD::ScrubWQ::_process(PG*, ThreadPool::TPHandle&)+0x13) [0x7eff93]
>  17: (ThreadPool::worker(ThreadPool::WorkThread*)+0x629) [0xcc8c49]
>  18: (ThreadPool::WorkThread::entry()+0x10) [0xccac40]
>  19: (()+0x6b50) [0x7f8f3ca0ab50]
>  20: (clone()+0x6d) [0x7f8f3b42695d]
> 
> As a temporary measure, noscrub and nodeep-scrub are now set for this 
> cluster, and all is working fine right now.
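> 
> For reference, that workaround amounts to setting the standard cluster-wide 
> flags (assuming admin access; these are the stock ceph CLI commands, not 
> anything version-specific):
> 
> ```shell
> # Disable both regular and deep scrubbing cluster-wide
> # until the underlying bug is understood.
> ceph osd set noscrub
> ceph osd set nodeep-scrub
> 
> # Once a fix is in place, re-enable scrubbing:
> ceph osd unset noscrub
> ceph osd unset nodeep-scrub
> ```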
> 
> So there is probably something wrong here. Need to investigate further.
> 
> Cheers,
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Loïc Dachary, Artisan Logiciel Libre


