What are your iptables rules?
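A quick way to grab them on each node:

sudo iptables -S
sudo iptables -L -n -v

By default the mons listen on 6789/tcp and the OSDs use 6800-7300/tcp. Since you have a
separate cluster_network, it is also worth confirming the OSD hosts can reach each other
on that network, for example:

nc -zv <peer-cluster-ip> 6800   # <peer-cluster-ip> is a placeholder for another OSD host's 192.168.0.x address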
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
On Thu, Sep 17, 2015 at 1:01 AM, Stefan Eriksson wrote:
> Hi, here is the info. I have added "ceph osd pool set rbd pg_num 128", but that
> locks up as well, it seems.
>
> Here are the details you're after:
>
> [cephcluster@ceph01-adm01 ceph-deploy]$ ceph osd pool ls detail
> pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash
> rjenkins pg_num 128 pgp_num 64 last_change 37 flags hashpspool stripe_width 0
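
pgp_num is still 64 there, by the way. After raising pg_num you normally also have to
raise the placement count, roughly:

ceph osd pool set rbd pgp_num 128

though that is probably not the root cause while the new PGs are still stuck in creating.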
>
> [cephcluster@ceph01-adm01 ceph-deploy]$ ceph pg dump_stuck
> ok
> pg_stat state up up_primary acting acting_primary
> 0.2d stale+undersized+degraded+peered [0] 0 [0] 0
> 0.2c stale+undersized+degraded+peered [0] 0 [0] 0
> 0.2b stale+undersized+degraded+peered [0] 0 [0] 0
> 0.2a stale+undersized+degraded+peered [0] 0 [0] 0
> 0.29 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.28 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.27 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.26 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.25 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.24 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.23 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.22 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.21 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.20 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.1f stale+undersized+degraded+peered [0] 0 [0] 0
> 0.1e stale+undersized+degraded+peered [0] 0 [0] 0
> 0.1d stale+undersized+degraded+peered [0] 0 [0] 0
> 0.1c stale+undersized+degraded+peered [0] 0 [0] 0
> 0.1b stale+undersized+degraded+peered [0] 0 [0] 0
> 0.1a stale+undersized+degraded+peered [0] 0 [0] 0
> 0.19 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.18 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.17 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.16 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.15 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.14 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.13 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.12 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.11 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.10 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.f stale+undersized+degraded+peered [0] 0 [0] 0
> 0.e stale+undersized+degraded+peered [0] 0 [0] 0
> 0.d stale+undersized+degraded+peered [0] 0 [0] 0
> 0.c stale+undersized+degraded+peered [0] 0 [0] 0
> 0.b stale+undersized+degraded+peered [0] 0 [0] 0
> 0.a stale+undersized+degraded+peered [0] 0 [0] 0
> 0.9 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.8 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.7 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.6 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.5 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.4 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.3 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.2 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.1 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.0 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.7f creating [0,2,1] 0 [0,2,1] 0
> 0.7e creating [2,0,1] 2 [2,0,1] 2
> 0.7d creating [0,2,1] 0 [0,2,1] 0
> 0.7c creating [1,0,2] 1 [1,0,2] 1
> 0.7b creating [0,2,1] 0 [0,2,1] 0
> 0.7a creating [0,2,1] 0 [0,2,1] 0
> 0.79 creating [1,0,2] 1 [1,0,2] 1
> 0.78 creating [1,0,2] 1 [1,0,2] 1
> 0.77 creating [1,0,2] 1 [1,0,2] 1
> 0.76 creating [1,2,0] 1 [1,2,0] 1
> 0.75 creating [1,2,0] 1 [1,2,0] 1
> 0.74 creating [1,2,0] 1 [1,2,0] 1
> 0.73 creating [1,2,0] 1 [1,2,0] 1
> 0.72 creating [0,2,1] 0 [0,2,1] 0
> 0.71 creating [0,2,1] 0 [0,2,1] 0
> 0.70 creating [2,0,1] 2 [2,0,1] 2
> 0.6f creating [2,1,0] 2 [2,1,0] 2
> 0.6e creating [0,1,2] 0 [0,1,2] 0
> 0.6d creating [1,2,0] 1 [1,2,0] 1
> 0.6c creating [2,0,1] 2 [2,0,1] 2
> 0.6b creating [1,2,0] 1 [1,2,0] 1
> 0.6a creating [2,1,0] 2 [2,1,0] 2
> 0.69 creating [2,0,1] 2 [2,0,1] 2
> 0.68 creating [0,1,2] 0 [0,1,2] 0
> 0.67 creating [0,1,2] 0 [0,1,2] 0
> 0.66 creating [0,1,2] 0 [0,1,2] 0
> 0.65 creating [1,0,2] 1 [1,0,2] 1
> 0.64 creating [2,0,1] 2 [2,0,1] 2
> 0.63 creating [1,2,0] 1 [1,2,0] 1
> 0.62 creating [2,1,0] 2 [2,1,0] 2
> 0.61 creating [1,2,0] 1 [1,2,0] 1
> 0.60 creating [1,0,2] 1 [1,0,2] 1
> 0.5f creating [2,0,1] 2 [2,0,1] 2
> 0.5e creating [1,0,2] 1 [1,0,2] 1
> 0.5d creating [1,0,2] 1 [1,0,2] 1
> 0.5c creating [1,2,0] 1 [1,2,0] 1
> 0.5b creating [1,2,0] 1 [1,2,0] 1
> 0.5a creating [1,0,2] 1 [1,0,2] 1
> 0.59 creating [0,2,1] 0 [0,2,1] 0
> 0.58 creating [2,0,1] 2 [2,0,1] 2
> 0.57 creating [0,1,2] 0 [0,1,2] 0
> 0.56 creating [2,1,0] 2 [2,1,0] 2
> 0.55 creating [0,2,1] 0 [0,2,1] 0
> 0.54 creating [0,2,1] 0 [0,2,1] 0
> 0.53 creating [1,2,0] 1 [1,2,0] 1
> 0.52 creating [1,2,0] 1 [1,2,0] 1
> 0.51 creating [1,2,0] 1 [1,2,0] 1
> 0.50 creating [0,2,1] 0 [0,2,1] 0
> 0.4f creating [0,2,1] 0 [0,2,1] 0
> 0.4e creating [0,1,2] 0 [0,1,2] 0
> 0.4d creating [2,1,0] 2 [2,1,0] 2
> 0.4c creating [1,2,0] 1 [1,2,0] 1
> 0.4b creating [0,1,2] 0 [0,1,2] 0
> 0.4a creating [2,1,0] 2 [2,1,0] 2
> 0.49 creating [0,1,2] 0 [0,1,2] 0
> 0.48 creating [1,2,0] 1 [1,2,0] 1
> 0.47 creating [0,2,1] 0 [0,2,1] 0
> 0.46 creating [0,2,1] 0 [0,2,1] 0
> 0.45 creating [2,0,1] 2 [2,0,1] 2
> 0.44 creating [1,2,0] 1 [1,2,0] 1
> 0.43 creating [1,0,2] 1 [1,0,2] 1
> 0.42 creating [1,0,2] 1 [1,0,2] 1
> 0.41 creating [1,2,0] 1 [1,2,0] 1
> 0.40 creating [0,1,2] 0 [0,1,2] 0
> 0.3f stale+undersized+degraded+peered [0] 0 [0] 0
> 0.3e stale+undersized+degraded+peered [0] 0 [0] 0
> 0.3d stale+undersized+degraded+peered [0] 0 [0] 0
> 0.3c stale+undersized+degraded+peered [0] 0 [0] 0
> 0.3b stale+undersized+degraded+peered [0] 0 [0] 0
> 0.3a stale+undersized+degraded+peered [0] 0 [0] 0
> 0.39 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.38 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.37 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.36 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.35 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.34 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.33 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.32 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.31 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.30 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.2f stale+undersized+degraded+peered [0] 0 [0] 0
> 0.2e stale+undersized+degraded+peered [0] 0 [0] 0
>
> Let me know if you need anything else. I tried to debug and I see a lot of the
> following in my ceph-osd.log:
>
>
> 2015-09-17 08:57:08.723289 7f2b31eda700 20 heartbeat_map reset_timeout
> 'FileStore::op_tp thread 0x7f2b31eda700' grace 60 suicide 0
> 2015-09-17 08:57:09.302722 7f2b1f549700 20 heartbeat_map reset_timeout
> 'OSD::recovery_tp thread 0x7f2b1f549700' grace 60 suicide 0
> 2015-09-17 08:57:09.508026 7f2b3aa73700 20 heartbeat_map is_healthy = healthy
> 2015-09-17 08:57:09.722922 7f2b33ede700 20
> filestore(/var/lib/ceph/osd/ceph-0) sync_entry woke after 5.000183
> 2015-09-17 08:57:09.722956 7f2b33ede700 20
> filestore(/var/lib/ceph/osd/ceph-0) sync_entry waiting for max_interval
> 5.000000
> 2015-09-17 08:57:09.722987 7f2b326db700 20 heartbeat_map reset_timeout
> 'FileStore::op_tp thread 0x7f2b326db700' grace 60 suicide 0
> 2015-09-17 08:57:09.723080 7f2b31eda700 20 heartbeat_map reset_timeout
> 'FileStore::op_tp thread 0x7f2b31eda700' grace 60 suicide 0
> 2015-09-17 08:57:09.793157 7f2b1fd4a700 20 heartbeat_map reset_timeout
> 'OSD::osd_op_tp thread 0x7f2b1fd4a700' grace 15 suicide 150
> 2015-09-17 08:57:09.793169 7f2b1fd4a700 20 heartbeat_map reset_timeout
> 'OSD::osd_op_tp thread 0x7f2b1fd4a700' grace 4 suicide 0
> 2015-09-17 08:57:09.801912 7f2b2154d700 20 heartbeat_map reset_timeout
> 'OSD::osd_op_tp thread 0x7f2b2154d700' grace 15 suicide 150
> 2015-09-17 08:57:09.801925 7f2b2154d700 20 heartbeat_map reset_timeout
> 'OSD::osd_op_tp thread 0x7f2b2154d700' grace 4 suicide 0
> 2015-09-17 08:57:09.828221 7f2b1e547700 20 heartbeat_map reset_timeout
> 'OSD::command_tp thread 0x7f2b1e547700' grace 60 suicide 0
> 2015-09-17 08:57:09.954625 7f2b24553700 20 heartbeat_map reset_timeout
> 'OSD::osd_op_tp thread 0x7f2b24553700' grace 15 suicide 150
> 2015-09-17 08:57:09.954646 7f2b24553700 20 heartbeat_map reset_timeout
> 'OSD::osd_op_tp thread 0x7f2b24553700' grace 4 suicide 0
> 2015-09-17 08:57:09.989839 7f2b21d4e700 20 heartbeat_map reset_timeout
> 'OSD::osd_op_tp thread 0x7f2b21d4e700' grace 15 suicide 150
> 2015-09-17 08:57:09.989852 7f2b21d4e700 20 heartbeat_map reset_timeout
> 'OSD::osd_op_tp thread 0x7f2b21d4e700' grace 4 suicide 0
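
Those heartbeat_map reset_timeout lines are routine debug-20 chatter, so they don't tell
us much by themselves. A rough filter like this (assuming the default log path) usually
surfaces the more interesting lines:

grep -iE 'error|fail|fault|wrongly marked|slow request' /var/log/ceph/ceph-osd.0.log | tail -n 50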
>
>
>
>> On 17 Sep 2015, at 08:11, Goncalo Borges wrote:
>>
>> Hello Stefan...
>>
>> Those 64 PGs belong to the default rbd pool that gets created automatically. Can you
>> please give us the output of:
>>
>> # ceph osd pool ls detail
>> # ceph pg dump_stuck
>>
>> The degraded / stale status means that the PGs cannot be replicated
>> according to your policies.
>>
>> My guess is that you simply have too few OSDs for the number of replicas you
>> are requesting.
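
For what it's worth, with 3 OSDs up/in and size 3 / min_size 2 on the pool, the replica
count itself looks OK. A quick way to double-check the policy against the OSDs that are
actually in:

ceph osd dump | grep 'replicated size'
ceph osd tree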
>>
>> Cheers
>> G.
>>
>>
>>
>> On 09/17/2015 02:59 AM, Stefan Eriksson wrote:
>>> I have a completely new cluster for testing. It's three servers, all of which are
>>> monitors and OSD hosts, and each has one disk.
>>> The issue is that ceph status shows: 64 stale+undersized+degraded+peered
>>>
>>> health:
>>>
>>> health HEALTH_WARN
>>> clock skew detected on mon.ceph01-osd03
>>> 64 pgs degraded
>>> 64 pgs stale
>>> 64 pgs stuck degraded
>>> 64 pgs stuck inactive
>>> 64 pgs stuck stale
>>> 64 pgs stuck unclean
>>> 64 pgs stuck undersized
>>> 64 pgs undersized
>>> too few PGs per OSD (21 < min 30)
>>> Monitor clock skew detected
>>> monmap e1: 3 mons at
>>> {ceph01-osd01=192.1.41.51:6789/0,ceph01-osd02=192.1.41.52:6789/0,ceph01-osd03=192.1.41.53:6789/0}
>>> election epoch 82, quorum 0,1,2
>>> ceph01-osd01,ceph01-osd02,ceph01-osd03
>>> osdmap e36: 3 osds: 3 up, 3 in
>>> pgmap v85: 64 pgs, 1 pools, 0 bytes data, 0 objects
>>> 101352 kB used, 8365 GB / 8365 GB avail
>>> 64 stale+undersized+degraded+peered
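
If you want to see why an individual PG is stuck, querying one directly is often
informative, e.g.:

ceph health detail
ceph pg 0.0 query

(ceph pg <pgid> query may hang if the PG's primary OSD is unresponsive, which would
itself be a clue.) The clock skew warning is a separate issue; ntpq -p on each mon host
(assuming you run ntpd) will show whether the mons are actually in sync.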
>>>
>>>
>>> ceph osd tree shows:
>>> ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
>>> -1 8.15996 root default
>>> -2 2.71999 host ceph01-osd01
>>> 0 2.71999 osd.0 up 1.00000 1.00000
>>> -3 2.71999 host ceph01-osd02
>>> 1 2.71999 osd.1 up 1.00000 1.00000
>>> -4 2.71999 host ceph01-osd03
>>> 2 2.71999 osd.2 up 1.00000 1.00000
>>>
>>>
>>>
>>>
>>>
>>> Here is my crushmap:
>>>
>>> # begin crush map
>>> tunable choose_local_tries 0
>>> tunable choose_local_fallback_tries 0
>>> tunable choose_total_tries 50
>>> tunable chooseleaf_descend_once 1
>>> tunable straw_calc_version 1
>>>
>>> # devices
>>> device 0 osd.0
>>> device 1 osd.1
>>> device 2 osd.2
>>>
>>> # types
>>> type 0 osd
>>> type 1 host
>>> type 2 chassis
>>> type 3 rack
>>> type 4 row
>>> type 5 pdu
>>> type 6 pod
>>> type 7 room
>>> type 8 datacenter
>>> type 9 region
>>> type 10 root
>>>
>>> # buckets
>>> host ceph01-osd01 {
>>> id -2 # do not change unnecessarily
>>> # weight 2.720
>>> alg straw
>>> hash 0 # rjenkins1
>>> item osd.0 weight 2.720
>>> }
>>> host ceph01-osd02 {
>>> id -3 # do not change unnecessarily
>>> # weight 2.720
>>> alg straw
>>> hash 0 # rjenkins1
>>> item osd.1 weight 2.720
>>> }
>>> host ceph01-osd03 {
>>> id -4 # do not change unnecessarily
>>> # weight 2.720
>>> alg straw
>>> hash 0 # rjenkins1
>>> item osd.2 weight 2.720
>>> }
>>> root default {
>>> id -1 # do not change unnecessarily
>>> # weight 8.160
>>> alg straw
>>> hash 0 # rjenkins1
>>> item ceph01-osd01 weight 2.720
>>> item ceph01-osd02 weight 2.720
>>> item ceph01-osd03 weight 2.720
>>> }
>>>
>>> # rules
>>> rule replicated_ruleset {
>>> ruleset 0
>>> type replicated
>>> min_size 1
>>> max_size 10
>>> step take default
>>> step chooseleaf firstn 0 type host
>>> step emit
>>> }
>>>
>>> # end crush map
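
The map itself looks sane for a three-host setup. If you want to sanity-check the rule
offline, crushtool can simulate it (assuming the decompiled map above is saved as
crush.txt):

crushtool -c crush.txt -o crush.bin
crushtool -i crush.bin --test --rule 0 --num-rep 3 --show-mappings | head

Each output line should map to three distinct OSDs, one per host.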
>>>
>>> And the ceph.conf which is shared among all nodes:
>>>
>>> ceph.conf
>>> [global]
>>> fsid = b9043917-5f65-98d5-8624-ee12ff32a5ea
>>> public_network = 192.1.41.0/24
>>> cluster_network = 192.168.0.0/24
>>> mon_initial_members = ceph01-osd01, ceph01-osd02, ceph01-osd03
>>> mon_host = 192.1.41.51,192.1.41.52,192.1.41.53
>>> auth_cluster_required = cephx
>>> auth_service_required = cephx
>>> auth_client_required = cephx
>>> filestore_xattr_use_omap = true
>>> osd pool default pg num = 512
>>> osd pool default pgp num = 512
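
Note that "osd pool default pg num" / "osd pool default pgp num" only affect pools
created after the mons pick the setting up, which is presumably why the pre-existing rbd
pool still sat at 64. You can confirm what a running mon actually uses via its admin
socket (run on the mon host):

ceph daemon mon.ceph01-osd01 config show | grep osd_pool_default_pg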
>>>
>>> The logs don't say much; the only active log that adds anything is:
>>>
>>> mon.ceph01-osd01@0(leader).data_health(82) update_stats avail 88% total
>>> 9990 MB, used 1170 MB, avail 8819 MB
>>> mon.ceph01-osd02@1(peon).data_health(82) update_stats avail 88% total 9990
>>> MB, used 1171 MB, avail 8818 MB
>>> mon.ceph01-osd03@2(peon).data_health(82) update_stats avail 88% total 9990
>>> MB, used 1172 MB, avail 8817 MB
>>>
>>> Does anyone have thoughts on what might be wrong? Or is there other
>>> info I can provide to narrow down the cause?
>>>
>>> Thanks!
>>
>> --
>> Goncalo Borges
>> Research Computing
>> ARC Centre of Excellence for Particle Physics at the Terascale
>> School of Physics A28 | University of Sydney, NSW 2006
>> T: +61 2 93511937
>>
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com