Hi,
This is one we've seen before: issue #326
http://tracker.newdream.net/issues/326
Was that the first (and only?) osd to fail?
What kind of workload were you subjecting the cluster to? Just the file
system? RBD? Anything unusual?
Also, can you confirm what version of the code you were running? The osd
log at /var/log/ceph/osd.*.log should have a version number and sha1 id,
something like
ceph version 0.22~rc (3cd9d853cd58c79dc12427be8488e57970abda04)
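If it's easier, a quick sketch like this (just an example, assuming the logs are in the default /var/log/ceph location and contain a "ceph version ..." line as above) will pull that line out of each osd log:

    import glob

    # Print the first "ceph version ..." line found in each osd log.
    for path in glob.glob("/var/log/ceph/osd.*.log"):
        with open(path) as f:
            for line in f:
                if "ceph version" in line:
                    print(path + ": " + line.strip())
                    break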
Thanks!
sage
On Mon, 6 Sep 2010, Leander Yu wrote:
> Hi all,
> I have set up a 10 osd + 2 mds + 3 mon ceph cluster. It ran OK at the
> beginning, but after one day some of the osds crashed with the
> following assert failures.
> I am using the unstable trunk. ceph.conf is attached.
>
> -------------- osd 3 -----------------
> osd/PG.h: In function 'void PG::IndexedLog::index(PG::Log::Entry&)':
> osd/PG.h:429: FAILED assert(caller_ops.count(e.reqid) == 0)
> 1: (OSD::_process_pg_info(unsigned int, int, PG::Info&, PG::Log&,
> PG::Missing&, std::map<int, MOSDPGInfo*, std::less<int>,
> std::allocator<std::pair<int const, MOSDPGInfo*> > >*, int&)+0xb06)
> [0x4cf426]
> 2: (OSD::handle_pg_log(MOSDPGLog*)+0xa9) [0x4cf999]
> 3: (OSD::_dispatch(Message*)+0x3ed) [0x4e7dfd]
> 4: (OSD::ms_dispatch(Message*)+0x39) [0x4e86c9]
> 5: (SimpleMessenger::dispatch_entry()+0x789) [0x46b5f9]
> 6: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x45849c]
> 7: (Thread::_entry_func(void*)+0xa) [0x46c0ca]
> 8: (()+0x6a3a) [0x7f69fd39ea3a]
> 9: (clone()+0x6d) [0x7f69fc5bc77d]
>
> -------------- osd 7 --------------------
> osd/ReplicatedPG.cc: In function 'void ReplicatedPG::sub_op_pull(MOSDSubOp*)':
> osd/ReplicatedPG.cc:3021: FAILED assert(r == 0)
> 1: (OSD::dequeue_op(PG*)+0x344) [0x4e6fd4]
> 2: (ThreadPool::worker()+0x28f) [0x5b5a9f]
> 3: (ThreadPool::WorkThread::entry()+0xd) [0x4f0acd]
> 4: (Thread::_entry_func(void*)+0xa) [0x46c0ca]
> 5: (()+0x6a3a) [0x7efff4f12a3a]
> 6: (clone()+0x6d) [0x7efff413077d]
>
> Please let me know if you need more information. I am still keeping the
> environment so I can collect more data for debugging.
>
> Thanks.
>