Re: [ceph-users] osd become unusable, blocked by xfsaild (?) and load > 5000

2015-12-08 Thread Benedikt Fraunhofer
Hi Tom,

> We have been seeing this same behavior on a cluster that has been perfectly
> happy until we upgraded to the ubuntu vivid 3.19 kernel.  We are in the

I can't recall exactly when we gave 3.19 a shot, but now that you mention it... the
cluster was happy for >9 months on 3.16.
Did you try 4.2, or do you think the regression introduced somewhere
between 3.16 and 3.19 is still present in 4.2?

Thx!
   Benedikt
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd become unusable, blocked by xfsaild (?) and load > 5000

2015-12-08 Thread Benedikt Fraunhofer
Hi Tom,

2015-12-08 10:34 GMT+01:00 Tom Christensen :

> We didn't go forward to 4.2 as its a large production cluster, and we just
> needed the problem fixed.  We'll probably test out 4.2 in the next couple

Unfortunately we don't have the luxury of a test cluster,
and on top of that we couldn't simulate the load, although it does not
seem to be load-related.
Did you try running with nodeep-scrub as a short-term workaround?

I'll give ~30% of the nodes 4.2 and see how it goes.

> In our experience it takes about 2 weeks to start happening

We're well below that: somewhere between 1 and 4 days.
And yes, once one box goes south, it affects the rest of the cluster.

Thx!

 Benedikt
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] osd become unusable, blocked by xfsaild (?) and load > 5000

2015-12-07 Thread Benedikt Fraunhofer
Hello Cephers,

lately, our ceph-cluster started to show some weird behavior:

the osd boxes show a load of 5000-15000 before the osds get marked down.
Usually the box is still fully usable: even "apt-get dist-upgrade" runs smoothly
and you can read and write to any disk; the only things you can't do are strace
the osd processes, sync, or reboot.

The only log entries we find are hung_task warnings for "xfsaild", the XFS
AIL (Active Item List) daemon:

Dec  7 15:36:32 ceph1-store204 kernel: [152066.016108]
[] ? kthread_create_on_node+0x1c0/0x1c0
Dec  7 15:36:32 ceph1-store204 kernel: [152066.016112] INFO: task
xfsaild/dm-1:1445 blocked for more than 120 seconds.
Dec  7 15:36:32 ceph1-store204 kernel: [152066.016329]   Tainted:
G C 3.19.0-39-generic #44~14.04.1-Ubuntu
Dec  7 15:36:32 ceph1-store204 kernel: [152066.016558] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec  7 15:36:32 ceph1-store204 kernel: [152066.016802] xfsaild/dm-1
D 8807faa03af8 0  1445  2 0x
Dec  7 15:36:32 ceph1-store204 kernel: [152066.016805]
8807faa03af8 8808098989d0 00013e80 8807faa03fd8
Dec  7 15:36:32 ceph1-store204 kernel: [152066.016808]
00013e80 88080bb775c0 8808098989d0 88011381b2a8
Dec  7 15:36:32 ceph1-store204 kernel: [152066.016812]
8807faa03c50 7fff 8807faa03c48 8808098989d0
Dec  7 15:36:32 ceph1-store204 kernel: [152066.016815] Call Trace:
Dec  7 15:36:32 ceph1-store204 kernel: [152066.016819]
[] schedule+0x29/0x70
Dec  7 15:36:32 ceph1-store204 kernel: [152066.016823]
[] schedule_timeout+0x20c/0x280
Dec  7 15:36:32 ceph1-store204 kernel: [152066.016826]
[] ? sched_clock_cpu+0x85/0xc0
Dec  7 15:36:32 ceph1-store204 kernel: [152066.016830]
[] ? try_to_wake_up+0x1f1/0x340
Dec  7 15:36:32 ceph1-store204 kernel: [152066.016834]
[] wait_for_completion+0xa4/0x170
Dec  7 15:36:32 ceph1-store204 kernel: [152066.016836]
[] ? wake_up_state+0x20/0x20
Dec  7 15:36:32 ceph1-store204 kernel: [152066.016840]
[] flush_work+0xed/0x1c0
Dec  7 15:36:32 ceph1-store204 kernel: [152066.016846]
[] ? destroy_worker+0x90/0x90
Dec  7 15:36:32 ceph1-store204 kernel: [152066.016870]
[] xlog_cil_force_lsn+0x7e/0x1f0 [xfs]
Dec  7 15:36:32 ceph1-store204 kernel: [152066.016873]
[] ? lock_timer_base.isra.36+0x2b/0x50
Dec  7 15:36:32 ceph1-store204 kernel: [152066.016878]
[] ? try_to_del_timer_sync+0x4f/0x70
Dec  7 15:36:32 ceph1-store204 kernel: [152066.016901]
[] _xfs_log_force+0x60/0x270 [xfs]
Dec  7 15:36:32 ceph1-store204 kernel: [152066.016904]
[] ? internal_add_timer+0x80/0x80
Dec  7 15:36:32 ceph1-store204 kernel: [152066.016926]
[] xfs_log_force+0x2a/0x90 [xfs]
Dec  7 15:36:32 ceph1-store204 kernel: [152066.016948]
[] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
Dec  7 15:36:32 ceph1-store204 kernel: [152066.016970]
[] xfsaild+0x140/0x5a0 [xfs]
Dec  7 15:36:32 ceph1-store204 kernel: [152066.016992]
[] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
Dec  7 15:36:32 ceph1-store204 kernel: [152066.016996]
[] kthread+0xd2/0xf0
Dec  7 15:36:32 ceph1-store204 kernel: [152066.017000]
[] ? kthread_create_on_node+0x1c0/0x1c0
Dec  7 15:36:32 ceph1-store204 kernel: [152066.017005]
[] ret_from_fork+0x58/0x90
Dec  7 15:36:32 ceph1-store204 kernel: [152066.017009]
[] ? kthread_create_on_node+0x1c0/0x1c0
Dec  7 15:36:32 ceph1-store204 kernel: [152066.017013] INFO: task
xfsaild/dm-6:1616 blocked for more than 120 seconds.

kswapd is also reported as hung, even though we don't have swap on the osds
(kswapd also handles page-cache reclaim, not just swap).

It looks like either all ceph-osd threads are reporting in as runnable,
or it's the xfs maintenance process itself, as described in [1,2].

Usually, if we aren't fast enough setting no{out,scrub,deep-scrub}, this has an
avalanche effect where we end up ipmi-power-cycling half of the cluster
because all the osd nodes are busy doing nothing (according to iostat or top,
except for the load).
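
In case it helps anyone else, this is roughly what we set when it starts
(plain ceph CLI; the exact set of flags you need may differ):

  # keep the cluster from marking osds out and rebalancing while boxes hang
  ceph osd set noout
  # stop (deep-)scrubbing, which seems to be involved in triggering the hangs
  ceph osd set noscrub
  ceph osd set nodeep-scrub

  # and once things have settled again:
  ceph osd unset noout
  ceph osd unset noscrub
  ceph osd unset nodeep-scrub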

Is this a known bug in kernel 3.19.0-39 (ubuntu 14.04 with the vivid kernel)?
Do the xfs tweaks described here
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg25295.html
(I know this is for a pull request modifying the write paths)
look decent or worth a try?

Currently we're running with "back to defaults" and less load
(a desperate try with the filestore settings; it didn't change anything).
ceph.conf [osd] section:

[osd]
  filestore max sync interval = 15
  filestore min sync interval = 1
  osd max backfills = 1
  osd recovery op priority = 1


as a last-ditch attempt to get it to survive more than a day at a stretch.

Maybe kernel 4.2 is worth a try?

Thx for any input
 Benedikt


[1] 
https://www.reddit.com/r/linux/comments/18kvdb/xfsaild_is_creating_tons_of_system_threads_and/
[2] 
http://serverfault.com/questions/497049/the-xfs-filesystem-is-broken-in-rhel-centos-6-x-what-can-i-do-about-it
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd become unusable, blocked by xfsaild (?) and load > 5000

2015-12-07 Thread Benedikt Fraunhofer
Hi Jan,

we initially had to bump it once we had more than 12 osds
per box, but we'll change it to the values you provided.
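
Roughly what we'll push out via puppet: a plain sysctl drop-in (the file name is
just an example) plus a one-off apply:

  # /etc/sysctl.d/90-ceph-pid-max.conf  (example path)
  kernel.pid_max = 4194304

  # apply without rebooting
  sysctl -p /etc/sysctl.d/90-ceph-pid-max.conf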

Thx!

 Benedikt

2015-12-08 8:15 GMT+01:00 Jan Schermer <j...@schermer.cz>:
> What is the setting of sysctl kernel.pid_max?
> You relly need to have this:
> kernel.pid_max = 4194304
> (I think it also sets this as well: kernel.threads-max = 4194304)
>
> I think you are running out of process IDs.
>
> Jan
>
>> On 08 Dec 2015, at 08:10, Benedikt Fraunhofer <fraunho...@traced.net> wrote:
>>
>> Hello Cephers,
>>
>> lately, our ceph-cluster started to show some weird behavior:
>>
>> the osd boxes show a load of 5000-15000 before the osds get marked down.
>> Usually the box is fully usable, even "apt-get dist-upgrade" runs smoothly,
>> you can read and write to any disk, only things you can't do are strace the 
>> osd
>> processes, sync or reboot.
>>
>> we only find some logs about the "xfsaild = XFS Access Item List Daemon"
>> as hung_task warnings.
>>
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016108]
>> [] ? kthread_create_on_node+0x1c0/0x1c0
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016112] INFO: task
>> xfsaild/dm-1:1445 blocked for more than 120 seconds.
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016329]   Tainted:
>> G C 3.19.0-39-generic #44~14.04.1-Ubuntu
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016558] "echo 0 >
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016802] xfsaild/dm-1
>> D 8807faa03af8 0  1445  2 0x
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016805]
>> 8807faa03af8 8808098989d0 00013e80 8807faa03fd8
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016808]
>> 00013e80 88080bb775c0 8808098989d0 88011381b2a8
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016812]
>> 8807faa03c50 7fff 8807faa03c48 8808098989d0
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016815] Call Trace:
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016819]
>> [] schedule+0x29/0x70
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016823]
>> [] schedule_timeout+0x20c/0x280
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016826]
>> [] ? sched_clock_cpu+0x85/0xc0
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016830]
>> [] ? try_to_wake_up+0x1f1/0x340
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016834]
>> [] wait_for_completion+0xa4/0x170
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016836]
>> [] ? wake_up_state+0x20/0x20
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016840]
>> [] flush_work+0xed/0x1c0
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016846]
>> [] ? destroy_worker+0x90/0x90
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016870]
>> [] xlog_cil_force_lsn+0x7e/0x1f0 [xfs]
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016873]
>> [] ? lock_timer_base.isra.36+0x2b/0x50
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016878]
>> [] ? try_to_del_timer_sync+0x4f/0x70
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016901]
>> [] _xfs_log_force+0x60/0x270 [xfs]
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016904]
>> [] ? internal_add_timer+0x80/0x80
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016926]
>> [] xfs_log_force+0x2a/0x90 [xfs]
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016948]
>> [] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016970]
>> [] xfsaild+0x140/0x5a0 [xfs]
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016992]
>> [] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016996]
>> [] kthread+0xd2/0xf0
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.017000]
>> [] ? kthread_create_on_node+0x1c0/0x1c0
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.017005]
>> [] ret_from_fork+0x58/0x90
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.017009]
>> [] ? kthread_create_on_node+0x1c0/0x1c0
>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.017013] INFO: task
>> xfsaild/dm-6:1616 blocked for more than 120 seconds.
>>
>> kswapd is also reported as hung, but we don't have swap on the osds.
>>
>> It looks like either all ceph-osd-threads are reporting in as willing to 
>> work,
>> or it's the xfs-maintenance-process itself like described in [1,2]
>>
>> Usually i

Re: [ceph-users] after loss of journal, osd fails to start with failed assert OSDMapRef OSDService::get_map(epoch_t) ret != null

2015-12-07 Thread Benedikt Fraunhofer
Hi Jan,

2015-12-08 8:12 GMT+01:00 Jan Schermer :

> Journal doesn't just "vanish", though, so you should investigate further...

We tried putting the journals in files to work around changes in ceph-deploy
where you can't have the journals unencrypted but only the disks themselves
(and/or you can't have the journals on an msdos/fdisk-partitioned disk, only GPT,
but the Debian installer can't handle GPT).
(This worked when we started but was changed later.)

After a crash [1] this file just wasn't there any longer.

> This log is from the new empty journal, right?

Yep.

We're slowly migrating away from the journal-as-file deployment;
I just thought that it should be able to start up with an empty
journal without dying with an assertion-failure.
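
For anyone in the same spot, the rough procedure we use to move a journal from
a file to a partition, one osd at a time (for a healthy osd; the osd id and
device are examples, upstart syntax as on ubuntu 14.04):

  stop ceph-osd id=328
  # write out anything still sitting in the old journal
  ceph-osd -i 328 --flush-journal
  # repoint the journal at the new partition and initialize it
  rm /var/lib/ceph/osd/ceph-328/journal
  ln -s /dev/sdb1 /var/lib/ceph/osd/ceph-328/journal
  ceph-osd -i 328 --mkjournal
  start ceph-osd id=328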

Thx in advance

  Benedikt

[1] 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-December/006593.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd become unusable, blocked by xfsaild (?) and load > 5000

2015-12-07 Thread Benedikt Fraunhofer
Hi Jan,

we had 65k for pid_max; kernel.threads-max came out as 1030520
on some boxes and 256832 on others
(the default seems to depend on the amount of RAM rather than the number of cpus?)

currently we've

root@ceph1-store209:~# sysctl -a | grep -e thread -e pid
kernel.cad_pid = 1
kernel.core_uses_pid = 0
kernel.ns_last_pid = 60298
kernel.pid_max = 65535
kernel.threads-max = 256832
vm.nr_pdflush_threads = 0
root@ceph1-store209:~# ps axH |wc -l
17548

we'll see how it behaves once puppet has come by and adjusted it.

Thx!

 Benedikt
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] osd dies on pg repair with FAILED assert(!out->snaps.empty())

2015-12-07 Thread Benedikt Fraunhofer
Hello Cephers!

trying to repair an inconsistent PG results in the osd dying with an
assertion failure:

 0> 2015-12-01 07:22:13.398006 7f76d6594700 -1 osd/SnapMapper.cc:
In function 'int SnapMapper::get_snaps(const hobject_t&
, SnapMapper::object_snaps*)' thread 7f76d6594700 time 2015-12-01
07:22:13.394900
osd/SnapMapper.cc: 153: FAILED assert(!out->snaps.empty())

 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x8b) [0xbc60eb]
 2: (SnapMapper::get_snaps(hobject_t const&,
SnapMapper::object_snaps*)+0x40c) [0x72aecc]
 3: (SnapMapper::get_snaps(hobject_t const&, std::set*)+0xa2) [0x72
b062]
 4: (PG::_scan_snaps(ScrubMap&)+0x454) [0x7f2f84]
 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
unsigned int, ThreadPool::TPHandle&)+0x218) [0x7f3ba8]
 6: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x480) [0x7f9da0]
 7: (PG::scrub(ThreadPool::TPHandle&)+0x2ee) [0x7fb48e]
 8: (OSD::ScrubWQ::_process(PG*, ThreadPool::TPHandle&)+0x19) [0x6cdbf9]
 9: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb6b4e]
 10: (ThreadPool::WorkThread::entry()+0x10) [0xbb7bf0]
 11: (()+0x8182) [0x7f76fe072182]
 12: (clone()+0x6d) [0x7f76fc5dd47d]
 NOTE: a copy of the executable, or `objdump -rdS ` is
needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 1
  max_new 1000
  log_file /var/log/ceph/ceph-osd.339.log
--- end dump of recent events ---
2015-12-01 07:22:13.476525 7f76d6594700 -1 *** Caught signal (Aborted) **
 in thread 7f76d6594700

ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: /usr/bin/ceph-osd() [0xacd7ba]
 2: (()+0x10340) [0x7f76fe07a340]
 3: (gsignal()+0x39) [0x7f76fc519cc9]
 4: (abort()+0x148) [0x7f76fc51d0d8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f76fce24535]
 6: (()+0x5e6d6) [0x7f76fce226d6]
 7: (()+0x5e703) [0x7f76fce22703]
 8: (()+0x5e922) [0x7f76fce22922]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x278) [0xbc62d8]
 10: (SnapMapper::get_snaps(hobject_t const&,
SnapMapper::object_snaps*)+0x40c) [0x72aecc]
 11: (SnapMapper::get_snaps(hobject_t const&, std::set*)+0xa2) [0x72b062]
 12: (PG::_scan_snaps(ScrubMap&)+0x454) [0x7f2f84]
 13: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
unsigned int, ThreadPool::TPHandle&)+0x218) [0x7f3ba8]
 14: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x480) [0x7f9da0]
 15: (PG::scrub(ThreadPool::TPHandle&)+0x2ee) [0x7fb48e]
 16: (OSD::ScrubWQ::_process(PG*, ThreadPool::TPHandle&)+0x19) [0x6cdbf9]
 17: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb6b4e]
 18: (ThreadPool::WorkThread::entry()+0x10) [0xbb7bf0]
 19: (()+0x8182) [0x7f76fe072182]
 20: (clone()+0x6d) [0x7f76fc5dd47d]
 NOTE: a copy of the executable, or `objdump -rdS ` is
needed to interpret this.

--- begin dump of recent events ---
-4> 2015-12-01 07:22:13.403280 7f76e4db1700  1 --
10.9.246.104:6887/8548 <== osd.109 10.9.245.204:0/3407 13 
osd_ping(ping e320057 stamp 2015-12-01 07:22:13.399779) v2  47+0+0
(1340520147 0 0) 0x22456800 con 0x22340b00
-3> 2015-12-01 07:22:13.403313 7f76e4db1700  1 --
10.9.246.104:6887/8548 --> 10.9.245.204:0/3407 -- osd_ping(ping_reply
e320057 stamp 2015-12-01 07:22:13.399779) v2 -- ?+0 0x23e3be00 con
0x22340b00
-2> 2015-12-01 07:22:13.403365 7f76e35ae700  1 --
10.9.246.104:6883/8548 <== osd.109 10.9.245.204:0/3407 13 
osd_ping(ping e320057 stamp 2015-12-01 07:22:13.399779) v2  47+0+0
(1340520147 0 0) 0x22457600 con 0x22570d60
-1> 2015-12-01 07:22:13.403405 7f76e35ae700  1 --
10.9.246.104:6883/8548 --> 10.9.245.204:0/3407 -- osd_ping(ping_reply
e320057 stamp 2015-12-01 07:22:13.399779) v2 -- ?+0 0x23e3fe00 con
0x22570d60
 0> 2015-12-01 07:22:13.476525 7f76d6594700 -1 *** Caught signal
(Aborted) **
 in thread 7f76d6594700
 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: /usr/bin/ceph-osd() [0xacd7ba]
 2: (()+0x10340) [0x7f76fe07a340]
 3: (gsignal()+0x39) [0x7f76fc519cc9]
 4: (abort()+0x148) [0x7f76fc51d0d8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f76fce24535]
 6: (()+0x5e6d6) 

[ceph-users] after loss of journal, osd fails to start with failed assert OSDMapRef OSDService::get_map(epoch_t) ret != null

2015-12-07 Thread Benedikt Fraunhofer
Hello List,

after a box crashed, the journal vanished. Creating a new one
with --mkjournal results in the osd being unable to start.
Does anyone want to dissect this any further, or should I just trash
the osd and recreate it?
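
(The recreation attempt was essentially just the following, run with the osd
stopped; shown for completeness, the osd id matches the log below:)

  ceph-osd -i 328 --mkjournal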

Thx in advance
  Benedikt

2015-12-01 07:46:31.505255 7fadb7f1e900  0 ceph version 0.94.5
(9764da52395923e0b32908d83a9f7304401fee43), process ceph-osd, pid 5486
2015-12-01 07:46:31.628585 7fadb7f1e900  0
filestore(/var/lib/ceph/osd/ceph-328) backend xfs (magic 0x58465342)
2015-12-01 07:46:31.662972 7fadb7f1e900  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-328) detect_features:
FIEMAP ioctl is supported and appears to work
2015-12-01 07:46:31.662984 7fadb7f1e900  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-328) detect_features:
FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-12-01 07:46:31.674999 7fadb7f1e900  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-328) detect_features:
syncfs(2) syscall fully supported (by glibc and kernel)
2015-12-01 07:46:31.675071 7fadb7f1e900  0
xfsfilestorebackend(/var/lib/ceph/osd/ceph-328) detect_feature:
extsize is supported and kernel 3.19.0-33-generic >= 3.5
2015-12-01 07:46:31.806490 7fadb7f1e900  0
filestore(/var/lib/ceph/osd/ceph-328) mount: enabling WRITEAHEAD
journal mode: checkpoint is not enabled
2015-12-01 07:46:35.598698 7fadb7f1e900  1 journal _open
/var/lib/ceph/osd/ceph-328/journal fd 19: 9663676416 bytes, block size
4096 bytes, directio = 1, aio = 1
2015-12-01 07:46:35.600956 7fadb7f1e900  1 journal _open
/var/lib/ceph/osd/ceph-328/journal fd 19: 9663676416 bytes, block size
4096 bytes, directio = 1, aio = 1
2015-12-01 07:46:35.619860 7fadb7f1e900  0 
cls/hello/cls_hello.cc:271: loading cls_hello
2015-12-01 07:46:35.682532 7fadb7f1e900 -1 osd/OSD.h: In function
'OSDMapRef OSDService::get_map(epoch_t)' thread 7fadb7f1e900 time
2015-12-01 07:46:35.681204
osd/OSD.h: 716: FAILED assert(ret)

 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x8b) [0xbc60eb]
 2: (OSDService::get_map(unsigned int)+0x3f) [0x70ad5f]
 3: (OSD::init()+0x6ad) [0x6c5e0d]
 4: (main()+0x2860) [0x6527e0]
 5: (__libc_start_main()+0xf5) [0x7fadb505bec5]
 6: /usr/bin/ceph-osd() [0x66b887]
 NOTE: a copy of the executable, or `objdump -rdS ` is
needed to interpret this.

--- begin dump of recent events ---
   -62> 2015-12-01 07:46:31.503728 7fadb7f1e900  5 asok(0x5402000)
register_command perfcounters_dump hook 0x53a2050
   -61> 2015-12-01 07:46:31.503759 7fadb7f1e900  5 asok(0x5402000)
register_command 1 hook 0x53a2050
   -60> 2015-12-01 07:46:31.503764 7fadb7f1e900  5 asok(0x5402000)
register_command perf dump hook 0x53a2050
   -59> 2015-12-01 07:46:31.503768 7fadb7f1e900  5 asok(0x5402000)
register_command perfcounters_schema hook 0x53a2050
   -58> 2015-12-01 07:46:31.503772 7fadb7f1e900  5 asok(0x5402000)
register_command 2 hook 0x53a2050
   -57> 2015-12-01 07:46:31.503775 7fadb7f1e900  5 asok(0x5402000)
register_command perf schema hook 0x53a2050
   -56> 2015-12-01 07:46:31.503786 7fadb7f1e900  5 asok(0x5402000)
register_command perf reset hook 0x53a2050
   -55> 2015-12-01 07:46:31.503790 7fadb7f1e900  5 asok(0x5402000)
register_command config show hook 0x53a2050
   -54> 2015-12-01 07:46:31.503792 7fadb7f1e900  5 asok(0x5402000)
register_command config set hook 0x53a2050
   -53> 2015-12-01 07:46:31.503797 7fadb7f1e900  5 asok(0x5402000)
register_command config get hook 0x53a2050
   -52> 2015-12-01 07:46:31.503799 7fadb7f1e900  5 asok(0x5402000)
register_command config diff hook 0x53a2050
   -51> 2015-12-01 07:46:31.503802 7fadb7f1e900  5 asok(0x5402000)
register_command log flush hook 0x53a2050
   -50> 2015-12-01 07:46:31.503804 7fadb7f1e900  5 asok(0x5402000)
register_command log dump hook 0x53a2050
   -49> 2015-12-01 07:46:31.503807 7fadb7f1e900  5 asok(0x5402000)
register_command log reopen hook 0x53a2050
   -48> 2015-12-01 07:46:31.505255 7fadb7f1e900  0 ceph version 0.94.5
(9764da52395923e0b32908d83a9f7304401fee43), process ceph-osd, pid 5486
   -47> 2015-12-01 07:46:31.619430 7fadb7f1e900  1 -- 10.9.246.104:0/0
learned my addr 10.9.246.104:0/0
   -46> 2015-12-01 07:46:31.619439 7fadb7f1e900  1
accepter.accepter.bind my_inst.addr is 10.9.246.104:6821/5486
need_addr=0
   -45> 2015-12-01 07:46:31.619457 7fadb7f1e900  1
accepter.accepter.bind my_inst.addr is 0.0.0.0:6824/5486 need_addr=1
   -44> 2015-12-01 07:46:31.619473 7fadb7f1e900  1
accepter.accepter.bind my_inst.addr is 0.0.0.0:6825/5486 need_addr=1
   -43> 2015-12-01 07:46:31.619492 7fadb7f1e900  1 -- 10.9.246.104:0/0
learned my addr 10.9.246.104:0/0
   -42> 2015-12-01 07:46:31.619496 7fadb7f1e900  1
accepter.accepter.bind my_inst.addr is 10.9.246.104:6827/5486
need_addr=0
   -41> 2015-12-01 07:46:31.620890 7fadb7f1e900  5 asok(0x5402000)
init /var/run/ceph/ceph-osd.328.asok
   -40> 2015-12-01 07:46:31.620901 7fadb7f1e900  5 asok(0x5402000)
bind_and_listen 

Re: [ceph-users] osd become unusable, blocked by xfsaild (?) and load > 5000

2015-12-07 Thread Benedikt Fraunhofer
Hi Jan,

> Doesn't look near the limit currently (but I suppose you rebooted it in the 
> meantime?).

the box these numbers came from has an uptime of 13 days,
so it's one of the boxes that survived yesterday's half-cluster-wide reboot.

> Did iostat say anything about the drives? (btw dm-1 and dm-6 are what? Is 
> that your data drives?) - were they overloaded really?

No, they didn't have any load or iops.
Basically the whole box had nothing to do.

If I understand the load average correctly, it also counts threads stuck in
uninterruptible sleep, so in this case it's mostly blocked threads waiting for
data to work with, not threads actually doing anything.
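
A quick way to see where the number comes from on such a box; just plain ps,
counting threads in uninterruptible (D) state, which the load average includes:

  ps -eLo stat,comm | awk '$1 ~ /^D/ {print $2}' | sort | uniq -c | sort -rn | head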

Thx

 Benedikt


2015-12-08 8:44 GMT+01:00 Jan Schermer <j...@schermer.cz>:
>
> Jan
>
>
>> On 08 Dec 2015, at 08:41, Benedikt Fraunhofer <fraunho...@traced.net> wrote:
>>
>> Hi Jan,
>>
>> we had 65k for pid_max, which made
>> kernel.threads-max = 1030520.
>> or
>> kernel.threads-max = 256832
>> (looks like it depends on the number of cpus?)
>>
>> currently we've
>>
>> root@ceph1-store209:~# sysctl -a | grep -e thread -e pid
>> kernel.cad_pid = 1
>> kernel.core_uses_pid = 0
>> kernel.ns_last_pid = 60298
>> kernel.pid_max = 65535
>> kernel.threads-max = 256832
>> vm.nr_pdflush_threads = 0
>> root@ceph1-store209:~# ps axH |wc -l
>> 17548
>>
>> we'll see how it behaves once puppet has come by and adjusted it.
>>
>> Thx!
>>
>> Benedikt
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to improve single thread sequential reads?

2015-08-18 Thread Benedikt Fraunhofer
Hi Nick,

did you do anything fancy to get to ~90MB/s in the first place?
I'm stuck at ~30MB/s reading cold data. Single-threaded writes are
quite speedy, around 600MB/s.

radosgw for cold data gets around 90MB/s, which is imho limited by
the speed of a single disk.

Data already present in the osd OS buffers arrives at around
400-700MB/s, so I don't think the network is the culprit.

(20 node cluster, 12x4TB 7.2k disks, 2 ssds for journals for 6 osds
each, lacp 2x10g bonds)

rados bench single-threaded performs equally badly, but with its default
multithreaded settings it generates wonderful numbers, usually only
limited by line rate and/or interrupts/s.

I just gave kernel 4.0 with its rbd blk-mq feature a shot, hoping to
get to your wonderful numbers, but it stays below 30 MB/s.

I was thinking about using a software raid0 like you did, but that's
imho really ugly.
When I knew I needed something speedy, I usually just started dd-ing
the file to /dev/null and waited about three minutes before
starting the actual job; a sort of hand-made read-ahead for dummies.
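
A slightly less manual version of the same trick (device name and sizes are
examples; read_ahead_kb only applies to krbd-mapped devices):

  # bump readahead on the mapped rbd device to 4 MB
  echo 4096 > /sys/block/rbd0/queue/read_ahead_kb

  # or pre-warm the file into the page cache before the real job starts
  dd if=/path/to/file of=/dev/null bs=4M &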

Thx in advance
  Benedikt


2015-08-17 13:29 GMT+02:00 Nick Fisk n...@fisk.me.uk:
 Thanks for the replies guys.

 The client is set to 4MB, I haven't played with the OSD side yet as I wasn't
 sure if it would make much difference, but I will give it a go. If the
 client is already passing a 4MB request down through to the OSD, will it be
 able to readahead any further? The next 4MB object in theory will be on
 another OSD and so I'm not sure if reading ahead any further on the OSD side
 would help.

 How I see the problem is that the RBD client will only read 1 OSD at a time
 as the RBD readahead can't be set any higher than max_hw_sectors_kb, which
 is the object size of the RBD. Please correct me if I'm wrong on this.

 If you could set the RBD readahead to much higher than the object size, then
 this would probably give the desired effect where the buffer could be
 populated by reading from several OSD's in advance to give much higher
 performance. That or wait for striping to appear in the Kernel client.

 I've also found that BareOS (fork of Bacula) seems to has a direct RADOS
 feature that supports radosstriper. I might try this and see how it performs
 as well.


 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
 Somnath Roy
 Sent: 17 August 2015 03:36
 To: Alex Gorbachev a...@iss-integration.com; Nick Fisk n...@fisk.me.uk
 Cc: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] How to improve single thread sequential reads?

 Have you tried setting read_ahead_kb to bigger number for both client/OSD
 side if you are using krbd ?
 In case of librbd, try the different config options for rbd cache..

 Thanks & Regards
 Somnath

 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
 Alex Gorbachev
 Sent: Sunday, August 16, 2015 7:07 PM
 To: Nick Fisk
 Cc: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] How to improve single thread sequential reads?

 Hi Nick,

 On Thu, Aug 13, 2015 at 4:37 PM, Nick Fisk n...@fisk.me.uk wrote:
  -Original Message-
  From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
  Of Nick Fisk
  Sent: 13 August 2015 18:04
  To: ceph-users@lists.ceph.com
  Subject: [ceph-users] How to improve single thread sequential reads?
 
  Hi,
 
  I'm trying to use a RBD to act as a staging area for some data before
  pushing
  it down to some LTO6 tapes. As I cannot use striping with the kernel
  client I
  tend to be maxing out at around 80MB/s reads testing with DD. Has
  anyone got any clever suggestions of giving this a bit of a boost, I
  think I need
  to get it
  up to around 200MB/s to make sure there is always a steady flow of
  data to the tape drive.
 
  I've just tried the testing kernel with the blk-mq fixes in it for
  full size IO's, this combined with bumping readahead up to 4MB, is now
  getting me on average 150MB/s to 200MB/s so this might suffice.
 
  On a personal interest, I would still like to know if anyone has ideas
  on how to really push much higher bandwidth through a RBD.

 Some settings in our ceph.conf that may help:

 osd_op_threads = 20
 osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k
 filestore_queue_max_ops = 9
 filestore_flusher = false
 filestore_max_sync_interval = 10
 filestore_sync_flush = false

 Regards,
 Alex

 
 
  Rbd-fuse seems to top out at 12MB/s, so there goes that option.
 
  I'm thinking mapping multiple RBD's and then combining them into a
  mdadm
  RAID0 stripe might work, but seems a bit messy.
 
  Any suggestions?
 
  Thanks,
  Nick
 
 
 
 
 
 
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 

Re: [ceph-users] How to calculate file size when mount a block device from rbd image

2014-10-20 Thread Benedikt Fraunhofer
Hi Mika,

2014-10-20 11:16 GMT+02:00 Vickie CH mika.leaf...@gmail.com:

 2.Use dd command to create a 1.2T file.
#dd if=/dev/zero of=/mnt/ceph-mount/test12T bs=1M count=12288000

I think you're off by one zero:

 12288000 / 1024 / 1024 ~= 11.7

which means you're instructing it to create an ~11.7TB file on a 1.5T volume.
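
For a 1.2T file with bs=1M, the count you probably wanted (rough arithmetic,
binary units):

  # one zero less: 1228800 MiB = 1200 GiB, i.e. roughly 1.2T
  dd if=/dev/zero of=/mnt/ceph-mount/test12T bs=1M count=1228800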

Cheers

  Benedikt
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] 2 pgs stuck in active+clean+inconsistent

2014-06-06 Thread Benedikt Fraunhofer
Hello List,

the other day when i looked at our ceph cluster it showed:

 health HEALTH_ERR 135 pgs inconsistent; 1 pgs recovering;
recovery 76/4633296 objects degraded (0.002%); 169 scrub errors; clock
skew detected on mon.mon2-nb8

I did a

 ceph pg dump | grep -i incons | cut -f 1 | while read a; do ceph pg
repair $a; done

to get rid of most of these, but 2 remained; overnight it scrubbed (I
think) and raised it to 3:

2014-06-06 03:23:53.462918 mon.0 [INF] pgmap v2623164: 10640 pgs:
10638 active+clean, 2 active+clean+inconsistent; 5657 GB data, 17068
GB used, 332 TB / 349 TB avail
2014-06-06 03:22:06.209085 osd.90 [INF] 27.58 scrub ok
2014-06-06 03:22:17.251617 osd.32 [ERR] 2.126 shard 12: soid
ec653126/rb.0.11d90.238e1f29.083e/head//2 digest 1668941108 !=
known digest 3542109454
2014-06-06 03:22:17.251929 osd.32 [ERR] 2.126 deep-scrub 0 missing, 1
inconsistent objects
2014-06-06 03:22:17.251994 osd.32 [ERR] 2.126 deep-scrub 1 errors
2014-06-06 03:23:54.471206 mon.0 [INF] pgmap v2623165: 10640 pgs:
10637 active+clean, 2 active+clean+inconsistent, 1
active+clean+scrubbing; 5657 GB data, 17068 GB used, 332 TB / 349 TB
avail

the osd hosts have the same uptime, and unfortunately logrotate
deleted the logs from when this initially showed up.

I only found a post about mismatched sizes and how to fix that with
--truncate, not digests.

The host holding osd.32 has a clean dmesg, and SMART looks fine to
me for this disk.
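
One way to at least see which copy differs (a rough sketch; paths depend on
your osd layout, and the object name pattern is taken from the scrub error
above):

  # find the acting set for the pg
  ceph pg map 2.126

  # on each osd host in the acting set, locate the object file under the
  # pg directory and compare checksums across the replicas
  find /var/lib/ceph/osd/ceph-*/current/2.126_head/ \
       -name '*rb.0.11d90.238e1f29*' -exec md5sum {} \;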

the current state of the cluster is

 health HEALTH_ERR 2 pgs inconsistent; 2 scrub errors; clock skew
detected on mon.mon1-nb8, mon.mon2-nb8

and nothing shows up in ceph -w when I issue

ceph pg repair 2.c1
 instructing pg 2.c1 on osd.51 to repair
ceph pg repair 2.68
 instructing pg 2.68 on osd.69 to repair

Could you help me troubleshoot that?

Thx
  Benedikt
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 2 pgs stuck in active+clean+inconsistent

2014-06-06 Thread Benedikt Fraunhofer
2014-06-06 9:18 GMT+02:00 Benedikt Fraunhofer
given.to.lists.ceph-users.ceph.com.toasta@traced.net:
Hello List,

 and it logs nothing in ceph -w when i issue

 ceph pg repair 2.c1
  instructing pg 2.c1 on osd.51 to repair
 ceph pg repair 2.68
  instructing pg 2.68 on osd.69 to repair

Rebooting the hosts holding those osds made them cooperative:
they accepted the command and the warnings went away.

I guess a restart of the osd daemons would have been enough; I just was
too lazy to figure out how to stop one specific osd, and there were
some updates pending :)
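
For the record, stopping and starting a single osd is just the following
(upstart syntax as used on ubuntu; substitute the osd id):

  stop ceph-osd id=51
  start ceph-osd id=51

  # or, on sysvinit-based installs:
  /etc/init.d/ceph restart osd.51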

This is ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)

 Benedikt
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw multipart-uploaded downloads fail

2014-04-04 Thread Benedikt Fraunhofer
2014-04-04 0:31 GMT+02:00 Yehuda Sadeh yeh...@inktank.com:
Hi Yehuda,

sorry for the delay. We ran into another problem and this took up all the time.

 Are you running the version off the master branch, or did you just
 cherry-pick the patch? I can't seem to reproduce the problem.

I just patched that line in and gave it a try. Besides my attempts to
get the civetweb thing cooperating, my tree should be at the revision
that was tagged with v0.78.

The file vanished after restarting radosgw, so I took it as a
race condition or the result of a broken upload. Happy to hear you
could sort that one out as well.

I have another problem, this time caused by rather large files:
the final part returned 416 (InvalidRange):

/g969eed92-047e-41c9-a49b-234671afae18_d44abd -
s3://7aecc33d-d3c7-4538-bb59-c0717c06aad9/969eed92-047e-41c9-a49b-234671afae18_d44abd
 [part 67372 of 67373, 15MB]
 15728640 of 15728640   100% in0s21.64 MB/s  done
/g969eed92-047e-41c9-a49b-234671afae18_d44abd -
s3://7aecc33d-d3c7-4538-bb59-c0717c06aad9/969eed92-047e-41c9-a49b-234671afae18_d44abd
 [part 67373 of 67373, 4MB]
 5044856 of 5044856   100% in0s20.04 MB/s  done
ERROR: S3 error: 416 (InvalidRange):

The file was around 986 gigabytes in size and was split into 67372
15MB parts plus one remaining 4MB part.

I'm currently trying to find a faster box with enough free space to
reproduce that and capture logs.
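
In case the sheer number of parts is the issue (S3 proper caps a multipart
upload at 10000 parts; I'm not sure what radosgw enforces), a larger chunk
size keeps the count down, e.g. with s3cmd:

  # ~986 GB / 10000 parts needs >= ~99 MB per part; 128 MB leaves headroom
  s3cmd put --multipart-chunk-size-mb=128 bigfile s3://bucket/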

Thx
 Benedikt
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw multipart-uploaded downloads fail

2014-04-02 Thread Benedikt Fraunhofer
Hi Yehuda,

I tried your patch and it feels fine,
except that you might need some special handling for those already-corrupt uploads,
as trying to delete them gets radosgw into an endless loop with high cpu usage:

2014-04-02 11:03:15.045627 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045628 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045629 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045631 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045632 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045634 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045634 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045636 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045637 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045639 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045639 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045641 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045642 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045644 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045644 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045646 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045647 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045649 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045649 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045651 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045652 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045654 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045654 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045656 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045657 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045659 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045660 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045661 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045662 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045664 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045665 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045667 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045667 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045669 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045670 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045672 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1


Thx

 Benedikt
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw multipart-uploaded downloads fail

2014-04-01 Thread Benedikt Fraunhofer
Hello everyone,

I can't download anything that has been uploaded as a multipart upload.
I'm on 0.78 (f6c746c314d7b87b8419b6e584c94bfe4511dbd4)
on a non-EC pool.

The upload is acknowledged as being ok:

 2014-03-31 14:56:56.722727 7f4080ff9700  2 req 8:0.023285:s3:POST
/file1:complete_multipart:http status=200
 2014-03-31 14:56:56.722736 7f4080ff9700  1 == req done
req=0x7f407800b180 http_status=200 ==


On download, however, some error gets translated into a 404:

2014-03-31 14:56:57.885610 7f408cef8700 20 rados-get_obj_iterate_cb
oid=default.71025.1__multipart_file1.2/F9cdVNlUTdfYq7M9cTuQjYwbsS41Y5j.16
obj-ofs=61865984 read_ofs=0 len=4194304
2014-03-31 14:56:57.885635 7f408cef8700 20 rados-aio_operate r=0 bl.length=0
2014-03-31 14:56:57.885639 7f408cef8700 20
RGWObjManifest::operator++(): rule-part_size=4194304 rules.size()=2
2014-03-31 14:56:57.885640 7f408cef8700 20
RGWObjManifest::operator++(): stripe_ofs=66060288 part_ofs=61865984
rule-part_size=4194304
2014-03-31 14:56:57.885642 7f408cef8700  0
RGWObjManifest::operator++(): result: ofs=66060288 stripe_ofs=66060288
part_ofs=66060288 rule-part_size=4194304
2014-03-31 14:56:57.885737 7f40f27fc700 20 get_obj_aio_completion_cb:
io completion ofs=49283072 len=4194304
2014-03-31 14:56:57.885746 7f408cef8700 20 rados-get_obj_iterate_cb
oid=default.71025.1__multipart_file1.2/F9cdVNlUTdfYq7M9cTuQjYwbsS41Y5j.17
obj-ofs=66060288 read_ofs=0 len=1048576
2014-03-31 14:56:57.885770 7f408cef8700 20 rados-aio_operate r=0 bl.length=0
2014-03-31 14:56:57.885772 7f408cef8700 20
RGWObjManifest::operator++(): rule-part_size=4194304 rules.size()=2
2014-03-31 14:56:57.885773 7f408cef8700 20
RGWObjManifest::operator++(): stripe_ofs=70254592 part_ofs=66060288
rule-part_size=4194304
2014-03-31 14:56:57.885774 7f408cef8700  0
RGWObjManifest::operator++(): result: ofs=67108864 stripe_ofs=67108864
part_ofs=70254592 rule-part_size=4194304
2014-03-31 14:56:57.885786 7f408cef8700 10 get_obj_iterate() r=-2,
canceling all io
2014-03-31 14:56:57.885787 7f408cef8700 20 get_obj_data::cancel_all_io()
2014-03-31 14:56:57.885822 7f408cef8700  2 req 9:0.081262:s3:GET
/file1:get_obj:http status=404
2014-03-31 14:56:57.885837 7f408cef8700  1 == req done
req=0x7f4078011510 http_status=404 ==
2014-03-31 14:56:57.885839 7f408cef8700 20 process_request() returned -2
2014-03-31 14:56:57.885903 7f40f27fc700 20 get_obj_aio_completion_cb:
io completion ofs=53477376 len=4194304
2014-03-31 14:56:57.885958 7f40f27fc700 20 get_obj_aio_completion_cb:
io completion ofs=57671680 len=4194304
2014-03-31 14:56:57.886566 7f40f27fc700 20 get_obj_aio_completion_cb:
io completion ofs=66060288 len=1048576
2014-03-31 14:56:57.886636 7f40f27fc700 20 get_obj_aio_completion_cb:
io completion ofs=61865984 len=4194304

Everything works fine if I disable the multipart upload.

here's the script I used to test it:
---
#! /bin/bash

set -x

FN=file1
SIZE=64
S3CMD=s3cmd
S3CMD=/home/bf/s3cmd-1.5.0-beta1/s3cmd
# leave MULTIPART empty to test multipart uploads; set it to
# --disable-multipart to compare against a plain upload
MULTIPART=--disable-multipart
MULTIPART=

BUCK=s3://$(uuidgen)/
dd if=/dev/zero of=$FN bs=4M count=$(($SIZE/4)) &
$S3CMD  mb $BUCK
wait
$S3CMD  put $MULTIPART $FN $BUCK
sleep 1
$S3CMD  get --force ${BUCK}${FN} file1.1
---

Is this s3cmd's fault or is radosgw doing something wrong?

Thx in advance

 Benedikt
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw multipart-uploaded downloads fail

2014-04-01 Thread Benedikt Fraunhofer
Hi Yehuda,

 2014-04-01 15:49 GMT+02:00 Yehuda Sadeh yeh...@inktank.com:

 It could be the gateway's fault, might be related to the new manifest
 that went in before 0.78. I'll need more logs though, can you
 reproduce with 'debug ms = 1', and 'debug rgw = 20', and provide a log
 for all the upload and for the download?

That's what I thought too; at least it talks about the manifest in the logs.
I've attached a log with debug rgw = 20 but without ms = 1; maybe that
already helps.
If you need the ms = 1 log, I can generate it tomorrow.
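
For completeness, this is roughly where those knobs go (the section name is an
example; match whatever your rgw instance is called), followed by a radosgw
restart:

  [client.radosgw.gateway]
    debug rgw = 20
    debug ms = 1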

Thx

 Benedikt


radosgw11.log.gz
Description: GNU Zip compressed data
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com