Stefan, I'm looking at your logs and coredump now.

On 06/21/2012 11:43 PM, Stefan Priebe wrote:
Does anybody have an idea? This is right now a showstopper to me.

On 21.06.2012 at 14:55, Stefan Priebe - Profihost AG <s.pri...@profihost.ag> wrote:

Hello list,

I'm able to reproducibly crash OSD daemons.

How I can reproduce:

Kernel: 3.5.0-rc3
Ceph: 0.47.3
FS: btrfs
Journal: 2GB tmpfs per OSD
OSD: 3x servers with 4x Intel SSD OSDs each
10GbE network
rbd_cache_max_age: 2.0
rbd_cache_size: 33554432

Disk is set to writeback.
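
For reference, those cache settings correspond to a ceph.conf fragment roughly like this (the [client] section placement and the explicit rbd_cache = true line are my assumptions; the two values are the ones above):

[client]
        rbd_cache = true            ; assumed: caching explicitly enabled
        rbd_cache_size = 33554432   ; 32 MB, as above
        rbd_cache_max_age = 2.0     ; seconds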

Start a KVM VM via PXE with the disk attached in writeback mode.
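
For reference, the disk is attached roughly like this (pool and image names are placeholders here; the relevant part is cache=writeback on the rbd drive):

qemu-system-x86_64 ... \
    -drive file=rbd:rbd/vm-disk-1,cache=writeback,if=virtio

With if=virtio the disk shows up as /dev/vda inside the guest, which is what the fio command below targets.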

Then run the randwrite stress test below more than two times. In my case it is mostly OSD 22 that crashes.

# fio --filename=/dev/vda1 --direct=1 --rw=randwrite --bs=4k --size=200G \
      --numjobs=50 --runtime=90 --group_reporting --name=file1; \
  fio --filename=/dev/vda1 --direct=1 --rw=randwrite --bs=4k --size=200G \
      --numjobs=50 --runtime=90 --group_reporting --name=file1; \
  fio --filename=/dev/vda1 --direct=1 --rw=randwrite --bs=4k --size=200G \
      --numjobs=50 --runtime=90 --group_reporting --name=file1; \
  halt

Strangely, exactly THIS OSD also has the most log entries:
64K     ceph-osd.20.log
64K     ceph-osd.21.log
1,3M    ceph-osd.22.log
64K     ceph-osd.23.log

But all OSDs are set to debug osd = 20.
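
For completeness, that debug level sits in the [osd] section of ceph.conf, i.e. a fragment like:

[osd]
        debug osd = 20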

dmesg shows:
ceph-osd[5381]: segfault at 3f592c000 ip 00007fa281d8eb23 sp 00007fa27702d260 
error 4 in libtcmalloc.so.0.0.0[7fa281d6a000+3d000]

I uploaded the following files:
priebe_fio_randwrite_ceph-osd.21.log.bz2 =>  OSD which was OK and didn't crash
priebe_fio_randwrite_ceph-osd.22.log.bz2 =>  Log from the crashed OSD
priebe_fio_randwrite_core.ssdstor001.27204.bz2 =>  Core dump
priebe_fio_randwrite_ceph-osd.bz2 =>  osd binary
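
In case it is useful, a backtrace can be pulled from that core against the uploaded binary with something like this (file names as uploaded, minus the .bz2):

$ bunzip2 priebe_fio_randwrite_core.ssdstor001.27204.bz2 priebe_fio_randwrite_ceph-osd.bz2
$ gdb ./priebe_fio_randwrite_ceph-osd priebe_fio_randwrite_core.ssdstor001.27204
(gdb) bt
(gdb) thread apply all bt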

Stefan
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html