Hi,

we are currently evaluating ceph. We want to use ceph as a filesystem to store kvm virtual machine block images, as well as an S3 object store. Our test setup consists of 4 servers, each with a 1.2TB raid0 built from 4*300GB SCSI drives. At the moment we have 4 osds and 1 mds running, but we want at least 2-3 mds.
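Roughly, this is the layout we have in mind; a minimal ceph.conf sketch in the mkcephfs style, written from memory, so the hostnames, data paths and the monitor address are just placeholders:

[mon]
        mon data = /data/mon$id
[mon0]
        host = sc01
        mon addr = 192.168.1.1:6789

; we would like to run 2-3 of these
[mds.sc01]
        host = sc01
[mds.sc02]
        host = sc02

[osd]
        osd data = /data/osd$id
        osd journal = /data/osd$id/journal

; current setup: one osd on the raid0 of each server
[osd0]
        host = sc01
[osd1]
        host = sc02
[osd2]
        host = sc03
[osd3]
        host = sc04

For the per-disk experiment under issue 1 below, each server would instead get four [osdN] sections of its own, with osd data pointing to a separate mount per SCSI disk.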
We use Fedora 12 x86_64 and have hit a couple of issues so far:

1. We were not able to run 4 individual osds per server. With this configuration we had hoped to get the best CPU utilization (one osd per CPU core, one osd per physical disk). We always got a kernel oops when we tried to stop an osd:

################################################################
Feb 26 15:20:42 sc02 kernel: Oops: 0000 [#3] SMP
Feb 26 15:20:42 sc02 kernel: last sysfs file: /sys/module/btrfs/initstate
Feb 26 15:20:42 sc02 kernel: CPU 0
Feb 26 15:20:42 sc02 kernel: Modules linked in: btrfs zlib_deflate libcrc32c ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 e1000 tg3 hpwdt iTCO_wdt iTCO_vendor_support shpchp e752x_edac edac_core cciss floppy [last unloaded: scsi_wait_scan]
Feb 26 15:20:42 sc02 kernel: Pid: 1427, comm: cosd Tainted: G D 2.6.31.12-174.2.22.fc12.x86_64 #1 ProLiant DL380 G4
Feb 26 15:20:42 sc02 kernel: RIP: 0010:[<ffffffff8119f89f>] [<ffffffff8119f89f>] jbd2_journal_start+0x43/0xe1
Feb 26 15:20:42 sc02 kernel: RSP: 0018:ffff88015400daa8 EFLAGS: 00010286
Feb 26 15:20:42 sc02 kernel: RAX: 0000000000051766 RBX: ffff8801504d2000 RCX: 0000000000000400
Feb 26 15:20:42 sc02 kernel: RDX: 0000000000000401 RSI: 0000000000000001 RDI: ffff8801528cd000
Feb 26 15:20:42 sc02 kernel: RBP: ffff88015400dac8 R08: 0000000000000000 R09: ffff88015400dc30
Feb 26 15:20:42 sc02 kernel: R10: ffffffff81441330 R11: ffff88015400de00 R12: ffff8801528cd000
Feb 26 15:20:42 sc02 kernel: R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000d19
Feb 26 15:20:42 sc02 kernel: FS: 00007f425824a710(0000) GS:ffff880028028000(0000) knlGS:0000000000000000
Feb 26 15:20:42 sc02 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 26 15:20:42 sc02 kernel: CR2: 0000000000051766 CR3: 0000000152c50000 CR4: 00000000000006f0
Feb 26 15:20:42 sc02 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 26 15:20:42 sc02 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 26 15:20:42 sc02 kernel: Process cosd (pid: 1427, threadinfo ffff88015400c000, task ffff880154025e00)
Feb 26 15:20:42 sc02 kernel: Stack:
Feb 26 15:20:42 sc02 kernel: ffff880152ef5800 ffff880150463e40 ffff88015305a9c0 ffff88015400dc28
Feb 26 15:20:42 sc02 kernel: <0> ffff88015400dad8 ffffffff81180d65 ffff88015400dae8 ffffffff8116ad83
Feb 26 15:20:42 sc02 kernel: <0> ffff88015400db88 ffffffff811710ce ffff88015400db28 ffffffff81201809
Feb 26 15:20:42 sc02 kernel: Call Trace:
Feb 26 15:20:42 sc02 kernel: [<ffffffff81180d65>] ext4_journal_start_sb+0x54/0x7e
Feb 26 15:20:42 sc02 kernel: [<ffffffff8116ad83>] ext4_journal_start+0x15/0x17
Feb 26 15:20:42 sc02 kernel: [<ffffffff811710ce>] ext4_da_write_begin+0x105/0x20a
Feb 26 15:20:42 sc02 kernel: [<ffffffff81201809>] ? __up_read+0x76/0x81
Feb 26 15:20:42 sc02 kernel: [<ffffffff81194fe4>] ? ext4_xattr_get+0x1e6/0x255
Feb 26 15:20:42 sc02 kernel: [<ffffffff810c22c1>] generic_file_buffered_write+0x125/0x303
Feb 26 15:20:42 sc02 kernel: [<ffffffff8110dee3>] ? file_update_time+0xb8/0xed
Feb 26 15:20:42 sc02 kernel: [<ffffffff810c28ac>] __generic_file_aio_write_nolock+0x251/0x286
Feb 26 15:20:42 sc02 kernel: [<ffffffff811c800f>] ? selinux_capable+0xe0/0x10c
Feb 26 15:20:42 sc02 kernel: [<ffffffff811c422e>] ? inode_has_perm+0x71/0x87
Feb 26 15:20:42 sc02 kernel: [<ffffffff810c2b7e>] generic_file_aio_write+0x6a/0xca
Feb 26 15:20:42 sc02 kernel: [<ffffffff8116894f>] ext4_file_write+0x98/0x11d
Feb 26 15:20:42 sc02 kernel: [<ffffffff810fc6ca>] do_sync_write+0xe8/0x125
Feb 26 15:20:42 sc02 kernel: [<ffffffff81067b37>] ? autoremove_wake_function+0x0/0x39
Feb 26 15:20:42 sc02 kernel: [<ffffffff811c466c>] ? selinux_file_permission+0x58/0x5d
Feb 26 15:20:42 sc02 kernel: [<ffffffff811bcdbd>] ? security_file_permission+0x16/0x18
Feb 26 15:20:42 sc02 kernel: [<ffffffff810fcca6>] vfs_write+0xae/0x10b
Feb 26 15:20:42 sc02 kernel: [<ffffffff810fcdc3>] sys_write+0x4a/0x6e
Feb 26 15:20:42 sc02 kernel: [<ffffffff81011d32>] system_call_fastpath+0x16/0x1b
Feb 26 15:20:42 sc02 kernel: Code: c6 00 00 48 85 ff 49 89 fc 41 89 f5 48 8b 80 90 06 00 00 48 c7 c3 e2 ff ff ff 0f 84 9d 00 00 00 48 85 c0 48 89 c3 74 14 48 8b 00 <48> 39 38 74 04 0f 0b eb fe ff 43 0c e9 81 00 00 00 48 8b 3d a9
Feb 26 15:20:42 sc02 kernel: RIP [<ffffffff8119f89f>] jbd2_journal_start+0x43/0xe1
Feb 26 15:20:42 sc02 kernel: RSP <ffff88015400daa8>
Feb 26 15:20:42 sc02 kernel: CR2: 0000000000051766
Feb 26 15:20:42 sc02 kernel: ---[ end trace 58cde2a32eccd54c ]---
################################################################

2. With the current configuration (the 4 disks of each server combined into a raid0, one osd per server) we sometimes see similar crashes when we try to stop ceph.

3. Modifying the crushmap does not work at all. Decompiling the map succeeds, but recompiling the resulting text file fails with a parse error (the full round trip we are attempting, including the setcrushmap step, is sketched below my sign-off):

[r...@sc01 ~]# ceph osd getcrushmap -o /tmp/crush
10.03.04 14:25:53.944335 mon <- [osd,getcrushmap]
10.03.04 14:25:53.944921 mon0 -> 'got crush map from osdmap epoch 33' (0)
10.03.04 14:25:53.945578 wrote 349 byte payload to /tmp/crush
[r...@sc01 ~]# crushtool -d /tmp/crush -o /tmp/crush.txt
[r...@sc01 ~]# crushtool -c /tmp/crush.txt -o /tmp/crush.new
/tmp/crush.txt:52 error: parse error at ''

Apart from these issues, we think ceph is very promising and we would like to contribute to bringing this filesystem to production quality, at least with extensive testing.

Thanks a lot

--
Stefan Majer
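PS: For completeness, the full round trip we are trying to perform is roughly the following; the last step is quoted from the wiki as far as I remember it, so please correct me if the syntax is off:

# dump the current map and decompile it to text
ceph osd getcrushmap -o /tmp/crush
crushtool -d /tmp/crush -o /tmp/crush.txt

# edit /tmp/crush.txt, then recompile it -- this is the step that fails
# for us with "/tmp/crush.txt:52 error: parse error at ''"
crushtool -c /tmp/crush.txt -o /tmp/crush.new

# inject the recompiled map back into the cluster
ceph osd setcrushmap -i /tmp/crush.new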