Would this behavior go away if I added more OSDs or PGs, or is there anything 
else I can do besides changing the filesystem on the OSDs? Is this a known performance issue?

Thanks
--
Dan

On December 22, 2015 4:53:24 PM Wade Holler <[email protected]> wrote:

With XFS on the -327 kernel, the hung kernel tasks resulted in log verification 
failures and completely locked up the hosts.
With BTRFS, we could work around the task timeouts by setting 
kernel.hung_task_timeout_secs = 960
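For reference, here is a minimal sketch of how that setting is typically made persistent, assuming a host such as CentOS 7 that reads drop-in files from /etc/sysctl.d/. The file name 90-hung-task.conf is hypothetical. The setting only makes the kernel's hung-task detector wait longer before complaining. It does not shorten the underlying I/O stalls, so Ceph ops can still block.

```
# /etc/sysctl.d/90-hung-task.conf   (hypothetical file name)
# Seconds a task may stay in uninterruptible sleep (D state) before the
# kernel's hung-task detector logs a warning. 960 is the value used in
# this thread; the usual kernel default is 120.
kernel.hung_task_timeout_secs = 960
```

As root, load it with `sysctl --system`, or apply it immediately without a file using `sysctl -w kernel.hung_task_timeout_secs=960`.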

The host would eventually become responsive again; however, that doesn't really 
matter, since the Ceph ops are blocked for so long that it all goes to hell anyway.
The only setups I found stable under high load were EXT4, or the -229 kernel with BTRFS or EXT4.

Bad story, sorry to have to tell it.

-Wade


On Tue, Dec 22, 2015 at 9:44 AM Dan Nica 
<[email protected]> wrote:
That is strange. Maybe there is a sysctl option to tweak on the OSDs? This will be 
nasty if it makes it into our production!

--
Dan

From: Wade Holler [mailto:[email protected]]
Sent: Tuesday, December 22, 2015 4:36 PM
To: Dan Nica 
<[email protected]>; 
[email protected]
Subject: Re: [ceph-users] requests are blocked

I had major host stability problems under load with -327. Repeatable test 
cases under high load with XFS or BTRFS resulted in hung kernel tasks and, 
of course, the sympathetic behavior you mention.
"Requests are blocked" usually means that the op tracker in Ceph hasn't received 
a timely response from the OSD. I'm sure someone more seasoned can provide a 
better explanation.
-Wade
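To make the idea concrete, here is a toy model in Python of the check Wade describes. Each in-flight op records when it started, and any op older than a threshold counts as blocked. This is not Ceph's implementation: the `Op` class, `blocked_ops`, and `health_summary` are hypothetical names, and the 32-second threshold is borrowed from the `ceph status` output quoted in this thread purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Op:
    """A hypothetical in-flight request as a tracker might record it."""
    op_id: int
    started_at: float  # seconds, e.g. a time.monotonic() timestamp


def blocked_ops(ops, now, threshold_sec=32.0):
    """Return the ops that have been in flight longer than threshold_sec."""
    return [op for op in ops if now - op.started_at > threshold_sec]


def health_summary(ops, now, threshold_sec=32.0):
    """Build a status line loosely modeled on the ceph status warning."""
    n = len(blocked_ops(ops, now, threshold_sec))
    if n == 0:
        return "HEALTH_OK"
    # :g drops the trailing ".0", so 32.0 prints as "32"
    return f"HEALTH_WARN {n} requests are blocked > {threshold_sec:g} sec"


if __name__ == "__main__":
    # Ops started at t=0, 50 and 90; at t=100 their ages are 100, 50 and 10 s.
    ops = [Op(1, 0.0), Op(2, 50.0), Op(3, 90.0)]
    print(health_summary(ops, now=100.0))
```

Run as a script, this prints `HEALTH_WARN 2 requests are blocked > 32 sec`, because two of the three ops are older than 32 seconds. In a real cluster, the op that stays blocked usually points at a slow or stalled OSD, such as the hung kernel tasks described above.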

On Tue, Dec 22, 2015 at 9:24 AM Dan Nica 
<[email protected]> wrote:
Hi,

I am trying to run a benchmark on an RBD image, and from time to time I get the 
following in ceph status:

    cluster 046b0180-dc3f-4846-924f-41d9729d48c8
     health HEALTH_WARN
            2 requests are blocked > 32 sec
     monmap e1: 3 mons at {alder=10.6.250.249:6789/0,ash=10.6.250.248:6789/0,aspen=10.6.250.247:6789/0}
            election epoch 18, quorum 0,1,2 aspen,ash,alder
     osdmap e114: 6 osds: 6 up, 6 in
            flags sortbitwise
      pgmap v3816: 192 pgs, 1 pools, 23062 MB data, 5814 objects
            46406 MB used, 44624 GB / 44670 GB avail
                 192 active+clean
  client io 6083 B/s rd, 18884 kB/s wr, 75 op/s


What does "requests are blocked" mean? And why does performance drop to almost 0?
I am running Infernalis on CentOS 7, kernel 3.10.0-327.3.1.el7.x86_64.

Thanks
--
Dan
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com