Re: [ceph-users] osd become unusable, blocked by xfsaild (?) and load > 5000
Hi Tom,

> We have been seeing this same behavior on a cluster that has been perfectly
> happy until we upgraded to the ubuntu vivid 3.19 kernel. We are in the

I can't recall when we gave 3.19 a shot, but now that you say it... the
cluster was happy for >9 months with 3.16. Did you try 4.2, or do you think
the regression introduced somewhere between 3.16 and 3.19 is still in 4.2?

Thx!

  Benedikt
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] osd become unusable, blocked by xfsaild (?) and load > 5000
Hi Tom,

2015-12-08 10:34 GMT+01:00 Tom Christensen:
> We didn't go forward to 4.2 as its a large production cluster, and we just
> needed the problem fixed. We'll probably test out 4.2 in the next couple

Unfortunately we don't have the luxury of a test cluster, and to add to
that, we couldn't simulate the load, although it does not seem to be
load-related. Did you try running with nodeep-scrub as a short-term
workaround?

I'll give ~30% of the nodes 4.2 and see how it goes.

> In our experience it takes about 2 weeks to start happening

We're well below that: somewhere between 1 and 4 days. And yes, once one
node goes south, it affects the rest of the cluster.

Thx!

  Benedikt
[ceph-users] osd become unusable, blocked by xfsaild (?) and load > 5000
Hello Cephers,

lately, our ceph cluster started to show some weird behavior: the osd boxes
show a load of 5000-15000 before the osds get marked down. Usually the box
stays fully usable; even "apt-get dist-upgrade" runs smoothly and you can
read and write to any disk. The only things you can't do are strace the osd
processes, sync, or reboot.

We only find some logs about xfsaild (the XFS AIL daemon, AIL = Active Item
List) in the form of hung_task warnings:

Dec 7 15:36:32 ceph1-store204 kernel: [152066.016108] [] ? kthread_create_on_node+0x1c0/0x1c0
Dec 7 15:36:32 ceph1-store204 kernel: [152066.016112] INFO: task xfsaild/dm-1:1445 blocked for more than 120 seconds.
Dec 7 15:36:32 ceph1-store204 kernel: [152066.016329] Tainted: G C 3.19.0-39-generic #44~14.04.1-Ubuntu
Dec 7 15:36:32 ceph1-store204 kernel: [152066.016558] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 7 15:36:32 ceph1-store204 kernel: [152066.016802] xfsaild/dm-1 D 8807faa03af8 0 1445 2 0x
Dec 7 15:36:32 ceph1-store204 kernel: [152066.016805] 8807faa03af8 8808098989d0 00013e80 8807faa03fd8
Dec 7 15:36:32 ceph1-store204 kernel: [152066.016808] 00013e80 88080bb775c0 8808098989d0 88011381b2a8
Dec 7 15:36:32 ceph1-store204 kernel: [152066.016812] 8807faa03c50 7fff 8807faa03c48 8808098989d0
Dec 7 15:36:32 ceph1-store204 kernel: [152066.016815] Call Trace:
Dec 7 15:36:32 ceph1-store204 kernel: [152066.016819] [] schedule+0x29/0x70
Dec 7 15:36:32 ceph1-store204 kernel: [152066.016823] [] schedule_timeout+0x20c/0x280
Dec 7 15:36:32 ceph1-store204 kernel: [152066.016826] [] ? sched_clock_cpu+0x85/0xc0
Dec 7 15:36:32 ceph1-store204 kernel: [152066.016830] [] ? try_to_wake_up+0x1f1/0x340
Dec 7 15:36:32 ceph1-store204 kernel: [152066.016834] [] wait_for_completion+0xa4/0x170
Dec 7 15:36:32 ceph1-store204 kernel: [152066.016836] [] ? wake_up_state+0x20/0x20
Dec 7 15:36:32 ceph1-store204 kernel: [152066.016840] [] flush_work+0xed/0x1c0
Dec 7 15:36:32 ceph1-store204 kernel: [152066.016846] [] ? destroy_worker+0x90/0x90
Dec 7 15:36:32 ceph1-store204 kernel: [152066.016870] [] xlog_cil_force_lsn+0x7e/0x1f0 [xfs]
Dec 7 15:36:32 ceph1-store204 kernel: [152066.016873] [] ? lock_timer_base.isra.36+0x2b/0x50
Dec 7 15:36:32 ceph1-store204 kernel: [152066.016878] [] ? try_to_del_timer_sync+0x4f/0x70
Dec 7 15:36:32 ceph1-store204 kernel: [152066.016901] [] _xfs_log_force+0x60/0x270 [xfs]
Dec 7 15:36:32 ceph1-store204 kernel: [152066.016904] [] ? internal_add_timer+0x80/0x80
Dec 7 15:36:32 ceph1-store204 kernel: [152066.016926] [] xfs_log_force+0x2a/0x90 [xfs]
Dec 7 15:36:32 ceph1-store204 kernel: [152066.016948] [] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
Dec 7 15:36:32 ceph1-store204 kernel: [152066.016970] [] xfsaild+0x140/0x5a0 [xfs]
Dec 7 15:36:32 ceph1-store204 kernel: [152066.016992] [] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
Dec 7 15:36:32 ceph1-store204 kernel: [152066.016996] [] kthread+0xd2/0xf0
Dec 7 15:36:32 ceph1-store204 kernel: [152066.017000] [] ? kthread_create_on_node+0x1c0/0x1c0
Dec 7 15:36:32 ceph1-store204 kernel: [152066.017005] [] ret_from_fork+0x58/0x90
Dec 7 15:36:32 ceph1-store204 kernel: [152066.017009] [] ? kthread_create_on_node+0x1c0/0x1c0
Dec 7 15:36:32 ceph1-store204 kernel: [152066.017013] INFO: task xfsaild/dm-6:1616 blocked for more than 120 seconds.

kswapd is also reported as hung, but we don't have swap on the osds.

It looks like either all ceph-osd threads are reporting in as willing to
work, or it's the XFS maintenance process itself, as described in [1,2].

Usually, if we aren't fast enough setting no{out,scrub,deep-scrub}, this has
an avalanche effect where we end up ipmi-power-cycling half of the cluster
because all the osd nodes are busy doing nothing (according to iostat or
top, except for the load).

Is this a known bug for kernel 3.19.0-39 (ubuntu 14.04 with the vivid
kernel)?
Do the xfs tweaks described here
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg25295.html
(I know this is for a pull request modifying the write paths) look decent or
worth a try?

Currently we're running with "back to defaults" and less load (a desperate
try with the filestore settings; it didn't change anything).

ceph.conf osd section:

[osd]
filestore max sync interval = 15
filestore min sync interval = 1
osd max backfills = 1
osd recovery op priority = 1

as a baffled try to get it to survive more than a day at a stretch.

Maybe kernel 4.2 is worth a try?

Thx for any input

  Benedikt

[1] https://www.reddit.com/r/linux/comments/18kvdb/xfsaild_is_creating_tons_of_system_threads_and/
[2] http://serverfault.com/questions/497049/the-xfs-filesystem-is-broken-in-rhel-centos-6-x-what-can-i-do-about-it
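For anyone hitting the same avalanche: the short-term damage control
mentioned above (setting no{out,scrub,deep-scrub}) can be sketched like
this. The commands are echoed as a dry run so the sketch is readable
without a live cluster; drop the "echo" to actually run them on a mon node.

```shell
# Set the flags that stop the avalanche: noout so down OSDs aren't
# rebalanced away, noscrub/nodeep-scrub to stop scrubbing load.
# Echoed as a dry run; remove "echo" on a real cluster.
set_flags() {
    for flag in noout noscrub nodeep-scrub; do
        echo "ceph osd set $flag"
    done
}
set_flags
# revert later with: ceph osd unset <flag>
```

Remember to unset the flags once the cluster has settled, since noout also
suppresses legitimate rebalancing.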
Re: [ceph-users] osd become unusable, blocked by xfsaild (?) and load > 5000
Hi Jan,

we initially had to bump it once we had more than 12 osds per box, but
we'll change it to the values you provided.

Thx!

  Benedikt

2015-12-08 8:15 GMT+01:00 Jan Schermer <j...@schermer.cz>:
> What is the setting of sysctl kernel.pid_max?
> You really need to have this:
> kernel.pid_max = 4194304
> (I think it also sets this as well: kernel.threads-max = 4194304)
>
> I think you are running out of process IDs.
>
> Jan
>
>> On 08 Dec 2015, at 08:10, Benedikt Fraunhofer <fraunho...@traced.net> wrote:
>>
>> Hello Cephers,
>>
>> lately, our ceph-cluster started to show some weird behavior:
>>
>> the osd boxes show a load of 5000-15000 before the osds get marked down.
>> Usually the box is fully usable, even "apt-get dist-upgrade" runs smoothly,
>> you can read and write to any disk, only things you can't do are strace the
>> osd processes, sync or reboot.
>>
>> [...]
Re: [ceph-users] after loss of journal, osd fails to start with failed assert OSDMapRef OSDService::get_map(epoch_t) ret != null
Hi Jan,

2015-12-08 8:12 GMT+01:00 Jan Schermer:
> Journal doesn't just "vanish", though, so you should investigate further...

We tried putting journals in files to overcome the changes in ceph-deploy
where you can't have the journals unencrypted, only the disks themselves
(and/or you can't have the journals on an msdos/fdisk-partitioned disk,
just gpt, but the debian installer can't handle gpt). (This worked when we
started but was changed later.) After a crash [1], this file just wasn't
there any longer.

> This log is from the new empty journal, right?

Yep. We're slowly migrating away from the journal-as-file deployment; I
just thought that it should be able to start up with an empty journal
without dying with an assertion failure.

Thx in advance

  Benedikt

[1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-December/006593.html
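For reference, the usual journal-replacement sequence can be sketched as
below. Assumptions: osd.328 (taken from the log in the original post) and
an upstart-based Ubuntu box; the commands are printed as a dry run via
run(). Note that --flush-journal only works while the old journal is still
readable: once an unflushed journal is lost, the filestore may be missing
transactions, which is why --mkjournal alone can end in asserts like the
one above, and recreating the whole OSD is the safe route.

```shell
# Dry-run sketch of swapping in a fresh journal for one OSD.
# osd id 328 and the upstart service name are assumptions from the log.
run() { echo "+ $*"; }
run stop ceph-osd id=328
run ceph-osd -i 328 --flush-journal   # only possible before the journal is lost
run ceph-osd -i 328 --mkjournal       # create a fresh journal at the configured path
run start ceph-osd id=328
```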
Re: [ceph-users] osd become unusable, blocked by xfsaild (?) and load > 5000
Hi Jan,

we had 65k for pid_max, which made kernel.threads-max = 1030520, or
kernel.threads-max = 256832 (it looks like it depends on the number of
cpus?).

Currently we've:

root@ceph1-store209:~# sysctl -a | grep -e thread -e pid
kernel.cad_pid = 1
kernel.core_uses_pid = 0
kernel.ns_last_pid = 60298
kernel.pid_max = 65535
kernel.threads-max = 256832
vm.nr_pdflush_threads = 0
root@ceph1-store209:~# ps axH | wc -l
17548

We'll see how it behaves once puppet has come by and adjusted it.

Thx!

  Benedikt
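The 17548 threads already on that box are uncomfortably close to
pid_max=65535 once a few OSDs start recovery. A minimal sketch of
persisting the limits Jan suggested; the sysctl.d filename is an
assumption, and the file contents are printed instead of written so the
sketch runs without root.

```shell
# Sketch: raise the PID/thread ceilings suggested in this thread.
# Filename under /etc/sysctl.d/ is an assumption; any *.conf there is
# read at boot.
pid_max=4194304
cat <<EOF
# /etc/sysctl.d/90-ceph-pid-max.conf (assumed filename)
kernel.pid_max = ${pid_max}
kernel.threads-max = ${pid_max}
EOF
echo "sysctl --system   # reload at runtime"
```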
[ceph-users] osd dies on pg repair with FAILED assert(!out->snaps.empty())
Hello Cephers!

Trying to repair an inconsistent PG results in the osd dying with an
assertion failure:

 0> 2015-12-01 07:22:13.398006 7f76d6594700 -1 osd/SnapMapper.cc: In function 'int SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)' thread 7f76d6594700 time 2015-12-01 07:22:13.394900
osd/SnapMapper.cc: 153: FAILED assert(!out->snaps.empty())

 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc60eb]
 2: (SnapMapper::get_snaps(hobject_t const&, SnapMapper::object_snaps*)+0x40c) [0x72aecc]
 3: (SnapMapper::get_snaps(hobject_t const&, std::set*)+0xa2) [0x72b062]
 4: (PG::_scan_snaps(ScrubMap&)+0x454) [0x7f2f84]
 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, unsigned int, ThreadPool::TPHandle&)+0x218) [0x7f3ba8]
 6: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x480) [0x7f9da0]
 7: (PG::scrub(ThreadPool::TPHandle&)+0x2ee) [0x7fb48e]
 8: (OSD::ScrubWQ::_process(PG*, ThreadPool::TPHandle&)+0x19) [0x6cdbf9]
 9: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb6b4e]
 10: (ThreadPool::WorkThread::entry()+0x10) [0xbb7bf0]
 11: (()+0x8182) [0x7f76fe072182]
 12: (clone()+0x6d) [0x7f76fc5dd47d]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.
--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 1
  max_new 1000
  log_file /var/log/ceph/ceph-osd.339.log
--- end dump of recent events ---

2015-12-01 07:22:13.476525 7f76d6594700 -1 *** Caught signal (Aborted) **
 in thread 7f76d6594700

 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: /usr/bin/ceph-osd() [0xacd7ba]
 2: (()+0x10340) [0x7f76fe07a340]
 3: (gsignal()+0x39) [0x7f76fc519cc9]
 4: (abort()+0x148) [0x7f76fc51d0d8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f76fce24535]
 6: (()+0x5e6d6) [0x7f76fce226d6]
 7: (()+0x5e703) [0x7f76fce22703]
 8: (()+0x5e922) [0x7f76fce22922]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xbc62d8]
 10: (SnapMapper::get_snaps(hobject_t const&, SnapMapper::object_snaps*)+0x40c) [0x72aecc]
 11: (SnapMapper::get_snaps(hobject_t const&, std::set *)+0xa2) [0x72b062]
 12: (PG::_scan_snaps(ScrubMap&)+0x454) [0x7f2f84]
 13: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, unsigned int, ThreadPool::TPHandle&)+0x218) [0x7f3ba8]
 14: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x480) [0x7f9da0]
 15: (PG::scrub(ThreadPool::TPHandle&)+0x2ee) [0x7fb48e]
 16: (OSD::ScrubWQ::_process(PG*, ThreadPool::TPHandle&)+0x19) [0x6cdbf9]
 17: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb6b4e]
 18: (ThreadPool::WorkThread::entry()+0x10) [0xbb7bf0]
 19: (()+0x8182) [0x7f76fe072182]
 20: (clone()+0x6d) [0x7f76fc5dd47d]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.

--- begin dump of recent events ---
   -4> 2015-12-01 07:22:13.403280 7f76e4db1700 1 -- 10.9.246.104:6887/8548 <== osd.109 10.9.245.204:0/3407 13 osd_ping(ping e320057 stamp 2015-12-01 07:22:13.399779) v2 47+0+0 (1340520147 0 0) 0x22456800 con 0x22340b00
   -3> 2015-12-01 07:22:13.403313 7f76e4db1700 1 -- 10.9.246.104:6887/8548 --> 10.9.245.204:0/3407 -- osd_ping(ping_reply e320057 stamp 2015-12-01 07:22:13.399779) v2 -- ?+0 0x23e3be00 con 0x22340b00
   -2> 2015-12-01 07:22:13.403365 7f76e35ae700 1 -- 10.9.246.104:6883/8548 <== osd.109 10.9.245.204:0/3407 13 osd_ping(ping e320057 stamp 2015-12-01 07:22:13.399779) v2 47+0+0 (1340520147 0 0) 0x22457600 con 0x22570d60
   -1> 2015-12-01 07:22:13.403405 7f76e35ae700 1 -- 10.9.246.104:6883/8548 --> 10.9.245.204:0/3407 -- osd_ping(ping_reply e320057 stamp 2015-12-01 07:22:13.399779) v2 -- ?+0 0x23e3fe00 con 0x22570d60
    0> 2015-12-01 07:22:13.476525 7f76d6594700 -1 *** Caught signal (Aborted) **
 in thread 7f76d6594700

 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: /usr/bin/ceph-osd() [0xacd7ba]
 2: (()+0x10340) [0x7f76fe07a340]
 3: (gsignal()+0x39) [0x7f76fc519cc9]
 4: (abort()+0x148) [0x7f76fc51d0d8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f76fce24535]
 6: (()+0x5e6d6)
[ceph-users] after loss of journal, osd fails to start with failed assert OSDMapRef OSDService::get_map(epoch_t) ret != null
Hello List,

after a crash of a box, the journal vanished. Creating a new one with
--mkjournal results in the osd being unable to start. Does anyone want to
dissect this any further, or should I just trash the osd and recreate it?

Thx in advance

  Benedikt

2015-12-01 07:46:31.505255 7fadb7f1e900 0 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43), process ceph-osd, pid 5486
2015-12-01 07:46:31.628585 7fadb7f1e900 0 filestore(/var/lib/ceph/osd/ceph-328) backend xfs (magic 0x58465342)
2015-12-01 07:46:31.662972 7fadb7f1e900 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-328) detect_features: FIEMAP ioctl is supported and appears to work
2015-12-01 07:46:31.662984 7fadb7f1e900 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-328) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-12-01 07:46:31.674999 7fadb7f1e900 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-328) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-12-01 07:46:31.675071 7fadb7f1e900 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-328) detect_feature: extsize is supported and kernel 3.19.0-33-generic >= 3.5
2015-12-01 07:46:31.806490 7fadb7f1e900 0 filestore(/var/lib/ceph/osd/ceph-328) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-12-01 07:46:35.598698 7fadb7f1e900 1 journal _open /var/lib/ceph/osd/ceph-328/journal fd 19: 9663676416 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-12-01 07:46:35.600956 7fadb7f1e900 1 journal _open /var/lib/ceph/osd/ceph-328/journal fd 19: 9663676416 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-12-01 07:46:35.619860 7fadb7f1e900 0 cls/hello/cls_hello.cc:271: loading cls_hello
2015-12-01 07:46:35.682532 7fadb7f1e900 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7fadb7f1e900 time 2015-12-01 07:46:35.681204
osd/OSD.h: 716: FAILED assert(ret)

 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc60eb]
 2: (OSDService::get_map(unsigned int)+0x3f) [0x70ad5f]
 3: (OSD::init()+0x6ad) [0x6c5e0d]
 4: (main()+0x2860) [0x6527e0]
 5: (__libc_start_main()+0xf5) [0x7fadb505bec5]
 6: /usr/bin/ceph-osd() [0x66b887]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.

--- begin dump of recent events ---
   -62> 2015-12-01 07:46:31.503728 7fadb7f1e900 5 asok(0x5402000) register_command perfcounters_dump hook 0x53a2050
   -61> 2015-12-01 07:46:31.503759 7fadb7f1e900 5 asok(0x5402000) register_command 1 hook 0x53a2050
   -60> 2015-12-01 07:46:31.503764 7fadb7f1e900 5 asok(0x5402000) register_command perf dump hook 0x53a2050
   -59> 2015-12-01 07:46:31.503768 7fadb7f1e900 5 asok(0x5402000) register_command perfcounters_schema hook 0x53a2050
   -58> 2015-12-01 07:46:31.503772 7fadb7f1e900 5 asok(0x5402000) register_command 2 hook 0x53a2050
   -57> 2015-12-01 07:46:31.503775 7fadb7f1e900 5 asok(0x5402000) register_command perf schema hook 0x53a2050
   -56> 2015-12-01 07:46:31.503786 7fadb7f1e900 5 asok(0x5402000) register_command perf reset hook 0x53a2050
   -55> 2015-12-01 07:46:31.503790 7fadb7f1e900 5 asok(0x5402000) register_command config show hook 0x53a2050
   -54> 2015-12-01 07:46:31.503792 7fadb7f1e900 5 asok(0x5402000) register_command config set hook 0x53a2050
   -53> 2015-12-01 07:46:31.503797 7fadb7f1e900 5 asok(0x5402000) register_command config get hook 0x53a2050
   -52> 2015-12-01 07:46:31.503799 7fadb7f1e900 5 asok(0x5402000) register_command config diff hook 0x53a2050
   -51> 2015-12-01 07:46:31.503802 7fadb7f1e900 5 asok(0x5402000) register_command log flush hook 0x53a2050
   -50> 2015-12-01 07:46:31.503804 7fadb7f1e900 5 asok(0x5402000) register_command log dump hook 0x53a2050
   -49> 2015-12-01 07:46:31.503807 7fadb7f1e900 5 asok(0x5402000) register_command log reopen hook 0x53a2050
   -48> 2015-12-01 07:46:31.505255 7fadb7f1e900 0 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43), process ceph-osd, pid 5486
   -47> 2015-12-01 07:46:31.619430 7fadb7f1e900 1 -- 10.9.246.104:0/0 learned my addr 10.9.246.104:0/0
   -46> 2015-12-01 07:46:31.619439 7fadb7f1e900 1 accepter.accepter.bind my_inst.addr is 10.9.246.104:6821/5486 need_addr=0
   -45> 2015-12-01 07:46:31.619457 7fadb7f1e900 1 accepter.accepter.bind my_inst.addr is 0.0.0.0:6824/5486 need_addr=1
   -44> 2015-12-01 07:46:31.619473 7fadb7f1e900 1 accepter.accepter.bind my_inst.addr is 0.0.0.0:6825/5486 need_addr=1
   -43> 2015-12-01 07:46:31.619492 7fadb7f1e900 1 -- 10.9.246.104:0/0 learned my addr 10.9.246.104:0/0
   -42> 2015-12-01 07:46:31.619496 7fadb7f1e900 1 accepter.accepter.bind my_inst.addr is 10.9.246.104:6827/5486 need_addr=0
   -41> 2015-12-01 07:46:31.620890 7fadb7f1e900 5 asok(0x5402000) init /var/run/ceph/ceph-osd.328.asok
   -40> 2015-12-01 07:46:31.620901 7fadb7f1e900 5 asok(0x5402000) bind_and_listen
Re: [ceph-users] osd become unusable, blocked by xfsaild (?) and load > 5000
Hi Jan,

> Doesn't look near the limit currently (but I suppose you rebooted it in the
> meantime?).

The box these numbers came from has an uptime of 13 days, so it's one of
the boxes that did survive yesterday's half-cluster-wide reboot.

> Did iostat say anything about the drives? (btw dm-1 and dm-6 are what? Is
> that your data drives?) - were they overloaded really?

No, they didn't have any load or iops; basically the whole box had nothing
to do. If I understand the load correctly, it just counts threads that are
ready and willing to work but - in this case - don't get any data to work
with.

Thx

  Benedikt

2015-12-08 8:44 GMT+01:00 Jan Schermer <j...@schermer.cz>:
>
> Jan
>
>> On 08 Dec 2015, at 08:41, Benedikt Fraunhofer <fraunho...@traced.net> wrote:
>>
>> Hi Jan,
>>
>> we had 65k for pid_max, which made
>> kernel.threads-max = 1030520.
>> or
>> kernel.threads-max = 256832
>> (looks like it depends on the number of cpus?)
>>
>> [...]
Re: [ceph-users] How to improve single thread sequential reads?
Hi Nick,

did you do anything fancy to get to ~90MB/s in the first place? I'm stuck
at ~30MB/s reading cold data. Single-threaded writes are quite speedy,
around 600MB/s. radosgw for cold data is around 90MB/s, which is imho
limited by the speed of a single disk. Data already present in the osd os
buffers arrives at around 400-700MB/s, so I don't think the network is the
culprit. (20-node cluster, 12x4TB 7.2k disks, 2 ssds for journals for 6
osds each, lacp 2x10g bonds)

rados bench single-threaded performs equally badly, but with its default
multithreaded settings it generates wonderful numbers, usually only
limited by linerate and/or interrupts/s.

I just gave kernel 4.0 with its rbd blk-mq feature a shot, hoping to get
to your wonderful numbers, but it's staying below 30 MB/s.

I was thinking about using a software raid0 like you did, but that's imho
really ugly. When I knew I needed something speedy, I usually just started
dd-ing the file to /dev/null and waited about three minutes before
starting the actual job; some sort of hand-made read-ahead for dummies.

Thx in advance

  Benedikt

2015-08-17 13:29 GMT+02:00 Nick Fisk <n...@fisk.me.uk>:

  Thanks for the replies guys.

  The client is set to 4MB, I haven't played with the OSD side yet as I
  wasn't sure if it would make much difference, but I will give it a go.
  If the client is already passing a 4MB request down through to the OSD,
  will it be able to readahead any further? The next 4MB object in theory
  will be on another OSD, and so I'm not sure if reading ahead any further
  on the OSD side would help.

  How I see the problem is that the RBD client will only read 1 OSD at a
  time, as the RBD readahead can't be set any higher than
  max_hw_sectors_kb, which is the object size of the RBD. Please correct
  me if I'm wrong on this.

  If you could set the RBD readahead to much higher than the object size,
  then this would probably give the desired effect, where the buffer could
  be populated by reading from several OSDs in advance to give much higher
  performance. That, or wait for striping to appear in the kernel client.

  I've also found that BareOS (a fork of Bacula) seems to have a direct
  RADOS feature that supports radosstriper. I might try this and see how
  it performs as well.

  -----Original Message-----
  From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Somnath Roy
  Sent: 17 August 2015 03:36
  To: Alex Gorbachev <a...@iss-integration.com>; Nick Fisk <n...@fisk.me.uk>
  Cc: ceph-users@lists.ceph.com
  Subject: Re: [ceph-users] How to improve single thread sequential reads?

  Have you tried setting read_ahead_kb to a bigger number for both
  client/OSD side if you are using krbd? In case of librbd, try the
  different config options for rbd cache.

  Thanks & Regards
  Somnath

  -----Original Message-----
  From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Alex Gorbachev
  Sent: Sunday, August 16, 2015 7:07 PM
  To: Nick Fisk
  Cc: ceph-users@lists.ceph.com
  Subject: Re: [ceph-users] How to improve single thread sequential reads?

  Hi Nick,

  On Thu, Aug 13, 2015 at 4:37 PM, Nick Fisk <n...@fisk.me.uk> wrote:
  > -----Original Message-----
  > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Nick Fisk
  > Sent: 13 August 2015 18:04
  > To: ceph-users@lists.ceph.com
  > Subject: [ceph-users] How to improve single thread sequential reads?
  >
  > Hi,
  >
  > I'm trying to use a RBD to act as a staging area for some data before
  > pushing it down to some LTO6 tapes. As I cannot use striping with the
  > kernel client, I tend to be maxing out at around 80MB/s reads testing
  > with DD. Has anyone got any clever suggestions for giving this a bit
  > of a boost? I think I need to get it up to around 200MB/s to make sure
  > there is always a steady flow of data to the tape drive.
  >
  > I've just tried the testing kernel with the blk-mq fixes in it for
  > full-size IOs; this, combined with bumping readahead up to 4MB, is now
  > getting me on average 150MB/s to 200MB/s, so this might suffice. On a
  > personal interest, I would still like to know if anyone has ideas on
  > how to really push much higher bandwidth through a RBD.

  Some settings in our ceph.conf that may help:

  osd_op_threads = 20
  osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k
  filestore_queue_max_ops = 9
  filestore_flusher = false
  filestore_max_sync_interval = 10
  filestore_sync_flush = false

  Regards,
  Alex

  > Rbd-fuse seems to top out at 12MB/s, so there goes that option. I'm
  > thinking mapping multiple RBDs and then combining them into a mdadm
  > RAID0 stripe might work, but seems a bit messy. Any suggestions?
  >
  > Thanks,
  > Nick
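The two knobs discussed in this thread, per-device readahead for a mapped
RBD and the "hand-made read-ahead for dummies" prefetch dd, can be sketched
as below. The device name /dev/rbd0 and the file path are assumptions, and
the commands are echoed as a dry run since they need root and a mapped RBD.

```shell
# Dry-run sketch: bump block-layer readahead for a krbd device and warm
# the page cache before the real (tape) job starts.
dev=rbd0
ra_kb=4096                                    # 4096 KiB = 4 MiB readahead
echo "echo $ra_kb > /sys/block/$dev/queue/read_ahead_kb"
echo "dd if=/mnt/staging/bigfile of=/dev/null bs=4M   # warm the page cache"
```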
Re: [ceph-users] How to calculate file size when mount a block device from rbd image
Hi Mika,

2014-10-20 11:16 GMT+02:00 Vickie CH <mika.leaf...@gmail.com>:
> 2. Use dd command to create a 1.2T file.
> #dd if=/dev/zero of=/mnt/ceph-mount/test12T bs=1M count=12288000

I think you're off by one zero:

  12288000 / 1024 / 1024 ~= 11

means you're instructing it to create an 11TB file on a 1.5T volume.

Cheers

  Benedikt
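The arithmetic behind "off by one zero", sketched: dd with bs=1M writes
count MiB in total, so the count above asks for roughly 11.7 TiB.

```shell
# dd bs=1M count=N writes N MiB; check what the quoted count really asks for.
count=12288000
echo "requested TiB: $((count / 1024 / 1024))"      # integer TiB, ~11
# a ~1.2 TiB file would need roughly this count (12/10 of a TiB in MiB):
echo "wanted count: $((12 * 1024 * 1024 / 10))"
```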
[ceph-users] 2 pgs stuck in active+clean+inconsistent
Hello List,

the other day when I looked at our ceph cluster it showed:

  health HEALTH_ERR 135 pgs inconsistent; 1 pgs recovering; recovery 76/4633296 objects degraded (0.002%); 169 scrub errors; clock skew detected on mon.mon2-nb8

I did a

  ceph pg dump | grep -i incons | cut -f 1 | while read a; do
    ceph pg repair $a
  done

to get rid of most of these, but 2 remained; over night it scrubbed (I
think) and raised it to 3:

2014-06-06 03:23:53.462918 mon.0 [INF] pgmap v2623164: 10640 pgs: 10638 active+clean, 2 active+clean+inconsistent; 5657 GB data, 17068 GB used, 332 TB / 349 TB avail
2014-06-06 03:22:06.209085 osd.90 [INF] 27.58 scrub ok
2014-06-06 03:22:17.251617 osd.32 [ERR] 2.126 shard 12: soid ec653126/rb.0.11d90.238e1f29.083e/head//2 digest 1668941108 != known digest 3542109454
2014-06-06 03:22:17.251929 osd.32 [ERR] 2.126 deep-scrub 0 missing, 1 inconsistent objects
2014-06-06 03:22:17.251994 osd.32 [ERR] 2.126 deep-scrub 1 errors
2014-06-06 03:23:54.471206 mon.0 [INF] pgmap v2623165: 10640 pgs: 10637 active+clean, 2 active+clean+inconsistent, 1 active+clean+scrubbing; 5657 GB data, 17068 GB used, 332 TB / 349 TB avail

The osd hosts have the same uptime, and unfortunately logrotate deleted
the logs from before this initially showed up. I only found a post about
mismatched sizes and how to fix those with --truncate, not digests. The
host holding osd.32 is happy in its dmesg, and smart looks fine to me for
this disk.

The current state of the cluster is:

  health HEALTH_ERR 2 pgs inconsistent; 2 scrub errors; clock skew detected on mon.mon1-nb8, mon.mon2-nb8

and it logs nothing in ceph -w when I issue:

  ceph pg repair 2.c1
  instructing pg 2.c1 on osd.51 to repair
  ceph pg repair 2.68
  instructing pg 2.68 on osd.69 to repair

Could you help me troubleshoot that?

Thx

  Benedikt
Re: [ceph-users] 2 pgs stuck in active+clean+inconsistent
2014-06-06 9:18 GMT+02:00 Benedikt Fraunhofer <given.to.lists.ceph-users.ceph.com.toasta@traced.net>:
> Hello List,
>
> and it logs nothing in ceph -w when i issue
> ceph pg repair 2.c1
> instructing pg 2.c1 on osd.51 to repair
> ceph pg repair 2.68
> instructing pg 2.68 on osd.69 to repair

Rebooting the hosts holding those osds made them cooperative: they
accepted the command, and the warning went away. I guess a restart of the
osd daemons would have been enough; I was just too lazy to figure out how
to stop one specific osd, and there were some updates pending :)

This is ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74).

  Benedikt
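For reference, restarting just one OSD daemon instead of rebooting the
host can be sketched as below. osd id 51 is taken from the message above;
which command form applies depends on the init system of a Firefly-era
install (upstart on Ubuntu 14.04, sysvinit elsewhere), which is an
assumption here, so both variants are echoed as a dry run.

```shell
# Dry-run sketch: restart a single OSD daemon by id.
id=51
echo "restart ceph-osd id=$id"            # Ubuntu 14.04 / upstart
echo "/etc/init.d/ceph restart osd.$id"   # sysvinit-style installs
```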
Re: [ceph-users] radosgw multipart-uploaded downloads fail
Hi Yehuda,

2014-04-04 0:31 GMT+02:00 Yehuda Sadeh <yeh...@inktank.com>:

sorry for the delay. We ran into another problem and this took up all the
time.

> Are you running the version off the master branch, or did you just
> cherry-pick the patch? I can't seem to reproduce the problem.

I just patched that line in and gave it a try. Besides my attempts to get
the civetweb thing cooperating, my tree should be at the revision that was
tagged with v0.78. The file vanished after restarting radosgw, so I took
it as a race condition or as the result of a wrong upload. Happy to hear
you could sort that one out as well.

I've another problem, this time caused by rather large-ish files: the
final part returned 416 (InvalidRange):

/g969eed92-047e-41c9-a49b-234671afae18_d44abd -> s3://7aecc33d-d3c7-4538-bb59-c0717c06aad9/969eed92-047e-41c9-a49b-234671afae18_d44abd [part 67372 of 67373, 15MB]
 15728640 of 15728640   100% in 0s   21.64 MB/s  done
/g969eed92-047e-41c9-a49b-234671afae18_d44abd -> s3://7aecc33d-d3c7-4538-bb59-c0717c06aad9/969eed92-047e-41c9-a49b-234671afae18_d44abd [part 67373 of 67373, 4MB]
 5044856 of 5044856   100% in 0s   20.04 MB/s  done
ERROR: S3 error: 416 (InvalidRange):

The file was around 986 gigabytes in size and was split into 67372 15MB
parts and one remaining 4MB part. I'm currently trying to find a faster
box with enough free space to reproduce that and capture logs.

Thx

  Benedikt
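A quick sanity check of the sizes quoted above: 67372 full 15 MiB parts
plus the one 5044856-byte tail part do land at roughly 986 GiB, so the
part accounting in the upload itself is consistent.

```shell
# 67372 parts of 15 MiB each, plus one 5044856-byte tail part.
full_parts=67372
part_bytes=$((15 * 1024 * 1024))
tail_bytes=5044856
total=$((full_parts * part_bytes + tail_bytes))
echo "total GiB: $((total / 1024 / 1024 / 1024))"   # ~986
```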
Re: [ceph-users] radosgw multipart-uploaded downloads fail
Hi Yehuda,

I tried your patch and it feels fine, except you might need some special handling for those already-corrupt uploads, as trying to delete them gets radosgw into an endless loop and high CPU usage. The log shows the same pair of lines repeating, with the offsets stuck at 33554432:

  2014-04-02 11:03:15.045627 7fbf157d2700 0 RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432 part_ofs=33554432 rule-part_size=0
  2014-04-02 11:03:15.045628 7fbf157d2700 20 RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
  [... the two lines above repeat indefinitely with only the timestamp advancing ...]

Thx

Benedikt
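[Editor's note] A toy model (not the actual RGW code, just an illustration of the failure mode in the log above) of why a corrupt manifest rule with part_size == 0 can pin an iterator at a fixed offset: if the next part boundary is computed from the rule's part size, a zero part size degenerates the boundary to the current offset and the iteration never advances.

```python
# Hypothetical simplification of a manifest iterator's advance step.
# With a healthy rule the offset moves forward by one stripe; with a
# corrupt rule (part_size == 0) the next part boundary collapses onto
# the current offset and the loop spins forever at the same ofs.

def advance(ofs, stripe_size, part_size):
    """Return the next offset for this toy manifest model."""
    if part_size == 0:
        next_boundary = ofs          # degenerate boundary: no progress
    else:
        next_boundary = ((ofs // part_size) + 1) * part_size
    return min(ofs + stripe_size, next_boundary)

# healthy rule: 4 MiB parts, 4 MiB stripes -> offset advances
assert advance(33554432, 4 << 20, 4 << 20) == 33554432 + (4 << 20)

# corrupt rule: part_size == 0 -> offset stuck at 33554432, as in the log
assert advance(33554432, 4 << 20, 0) == 33554432
```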
[ceph-users] radosgw multipart-uploaded downloads fail
Hello everyone,

I can't download anything that's been uploaded as a multipart upload. I'm on 0.78 (f6c746c314d7b87b8419b6e584c94bfe4511dbd4) on a non-ec pool.

The upload is acknowledged as being ok:

  2014-03-31 14:56:56.722727 7f4080ff9700 2 req 8:0.023285:s3:POST /file1:complete_multipart:http status=200
  2014-03-31 14:56:56.722736 7f4080ff9700 1 == req done req=0x7f407800b180 http_status=200 ==

but on download, some error is translated to a 404:

  2014-03-31 14:56:57.885610 7f408cef8700 20 rados-get_obj_iterate_cb oid=default.71025.1__multipart_file1.2/F9cdVNlUTdfYq7M9cTuQjYwbsS41Y5j.16 obj-ofs=61865984 read_ofs=0 len=4194304
  2014-03-31 14:56:57.885635 7f408cef8700 20 rados-aio_operate r=0 bl.length=0
  2014-03-31 14:56:57.885639 7f408cef8700 20 RGWObjManifest::operator++(): rule-part_size=4194304 rules.size()=2
  2014-03-31 14:56:57.885640 7f408cef8700 20 RGWObjManifest::operator++(): stripe_ofs=66060288 part_ofs=61865984 rule-part_size=4194304
  2014-03-31 14:56:57.885642 7f408cef8700 0 RGWObjManifest::operator++(): result: ofs=66060288 stripe_ofs=66060288 part_ofs=66060288 rule-part_size=4194304
  2014-03-31 14:56:57.885737 7f40f27fc700 20 get_obj_aio_completion_cb: io completion ofs=49283072 len=4194304
  2014-03-31 14:56:57.885746 7f408cef8700 20 rados-get_obj_iterate_cb oid=default.71025.1__multipart_file1.2/F9cdVNlUTdfYq7M9cTuQjYwbsS41Y5j.17 obj-ofs=66060288 read_ofs=0 len=1048576
  2014-03-31 14:56:57.885770 7f408cef8700 20 rados-aio_operate r=0 bl.length=0
  2014-03-31 14:56:57.885772 7f408cef8700 20 RGWObjManifest::operator++(): rule-part_size=4194304 rules.size()=2
  2014-03-31 14:56:57.885773 7f408cef8700 20 RGWObjManifest::operator++(): stripe_ofs=70254592 part_ofs=66060288 rule-part_size=4194304
  2014-03-31 14:56:57.885774 7f408cef8700 0 RGWObjManifest::operator++(): result: ofs=67108864 stripe_ofs=67108864 part_ofs=70254592 rule-part_size=4194304
  2014-03-31 14:56:57.885786 7f408cef8700 10 get_obj_iterate() r=-2, canceling all io
  2014-03-31 14:56:57.885787 7f408cef8700 20 get_obj_data::cancel_all_io()
  2014-03-31 14:56:57.885822 7f408cef8700 2 req 9:0.081262:s3:GET /file1:get_obj:http status=404
  2014-03-31 14:56:57.885837 7f408cef8700 1 == req done req=0x7f4078011510 http_status=404 ==
  2014-03-31 14:56:57.885839 7f408cef8700 20 process_request() returned -2
  2014-03-31 14:56:57.885903 7f40f27fc700 20 get_obj_aio_completion_cb: io completion ofs=53477376 len=4194304
  2014-03-31 14:56:57.885958 7f40f27fc700 20 get_obj_aio_completion_cb: io completion ofs=57671680 len=4194304
  2014-03-31 14:56:57.886566 7f40f27fc700 20 get_obj_aio_completion_cb: io completion ofs=66060288 len=1048576
  2014-03-31 14:56:57.886636 7f40f27fc700 20 get_obj_aio_completion_cb: io completion ofs=61865984 len=4194304

Everything works fine if I disable the multipart upload; here's the script I used to test it:

---
#! /bin/bash
set -x

FN=file1
SIZE=64

S3CMD=s3cmd
S3CMD=/home/bf/s3cmd-1.5.0-beta1/s3cmd

MULTIPART=--disable-multipart
MULTIPART=

BUCK=s3://$(uuidgen)/

dd if=/dev/zero of=$FN bs=4M count=$(($SIZE/4))
$S3CMD mb $BUCK
wait
$S3CMD put $MULTIPART $FN $BUCK
sleep 1
$S3CMD get --force ${BUCK}${FN} file1.1
---

Is this s3cmd's fault or is radosgw doing something wrong?

Thx in advance

Benedikt
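[Editor's note] One detail worth pulling out of the log above (a quick sanity check, not RGW code): the failing rados read starts at obj-ofs=66060288 with len=1048576, which ends exactly at the 64 MiB object size the test script uploads (dd bs=4M count=16). So the requested range itself is valid; the -2 (ENOENT) that gets mapped to the 404 apparently comes from a stripe object the manifest names but rados cannot find.

```python
# Check that the last read in the failing GET covers exactly the
# object's tail, using the values reported in the radosgw log.
OBJECT_SIZE = 64 * 2**20          # SIZE=64 (MiB) in the repro script
last_read_ofs = 66060288          # obj-ofs of the final rados read
last_read_len = 1048576           # len of the final rados read

assert last_read_ofs + last_read_len == OBJECT_SIZE
print("last read ends at byte", last_read_ofs + last_read_len)  # 67108864
```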
Re: [ceph-users] radosgw multipart-uploaded downloads fail
Hi Yehuda,

2014-04-01 15:49 GMT+02:00 Yehuda Sadeh yeh...@inktank.com:

> It could be the gateway's fault, might be related to the new manifest
> that went in before 0.78. I'll need more logs though, can you
> reproduce with 'debug ms = 1', and 'debug rgw = 20', and provide a
> log for all the upload and for the download?

That's what I thought too; at least it talks about the manifest in the logs. I've attached a log with debug rgw = 20 but without ms = 1; maybe that's already of help. If you need ms = 1, I can generate that tomorrow.

Thx

Benedikt

radosgw11.log.gz Description: GNU Zip compressed data