Hi all,
I have a big problem and I really hope someone can help me!
We have been running a Ceph cluster for a year now. The version is 0.94.7 (Hammer).
Here is some info:
Our OSD tree is:
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 26.67998 root default
-2 3.64000 host ceph1
0 3.64000 osd.0 up 1.00000 1.00000
-3 3.50000 host ceph2
1 3.50000 osd.1 up 1.00000 1.00000
-4 3.64000 host ceph3
2 3.64000 osd.2 up 1.00000 1.00000
-5 15.89998 host ceph4
3 4.00000 osd.3 up 1.00000 1.00000
4 3.59999 osd.4 up 1.00000 1.00000
5 3.29999 osd.5 up 1.00000 1.00000
6 5.00000 osd.6 up 1.00000 1.00000
ceph df:
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
40972G 26821G 14151G 34.54
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
blocks 7 4490G 10.96 1237G 7037004
commits 8 473M 0 1237G 802353
fs 9 9666M 0.02 1237G 7863422
ceph osd df:
ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR
0 3.64000 1.00000 3724G 3128G 595G 84.01 2.43
1 3.50000 1.00000 3724G 3237G 487G 86.92 2.52
2 3.64000 1.00000 3724G 3180G 543G 85.41 2.47
3 4.00000 1.00000 7450G 1616G 5833G 21.70 0.63
4 3.59999 1.00000 7450G 1246G 6203G 16.74 0.48
5 3.29999 1.00000 7450G 1181G 6268G 15.86 0.46
6 5.00000 1.00000 7450G 560G 6889G 7.52 0.22
TOTAL 40972G 14151G 26820G 34.54
MIN/MAX VAR: 0.22/2.52 STDDEV: 36.53
Our current cluster state is:
health HEALTH_WARN
63 pgs backfill
8 pgs backfill_toofull
9 pgs backfilling
11 pgs degraded
1 pgs recovering
10 pgs recovery_wait
11 pgs stuck degraded
89 pgs stuck unclean
recovery 8237/52179437 objects degraded (0.016%)
recovery 9620295/52179437 objects misplaced (18.437%)
2 near full osd(s)
noout,noscrub,nodeep-scrub flag(s) set
monmap e8: 4 mons at
{ceph1=192.168.10.3:6789/0,ceph2=192.168.10.4:6789/0,ceph3=192.168.10.5:6789/0,ceph4=192.168.60.6:6789/0}
election epoch 400, quorum 0,1,2,3 ceph1,ceph2,ceph3,ceph4
osdmap e1774: 7 osds: 7 up, 7 in; 84 remapped pgs
flags noout,noscrub,nodeep-scrub
pgmap v7316159: 320 pgs, 3 pools, 4501 GB data, 15336 kobjects
14152 GB used, 26820 GB / 40972 GB avail
8237/52179437 objects degraded (0.016%)
9620295/52179437 objects misplaced (18.437%)
231 active+clean
61 active+remapped+wait_backfill
9 active+remapped+backfilling
6 active+recovery_wait+degraded+remapped
6 active+remapped+backfill_toofull
4 active+recovery_wait+degraded
2 active+remapped+wait_backfill+backfill_toofull
1 active+recovering+degraded
recovery io 11754 kB/s, 35 objects/s
client io 1748 kB/s rd, 249 kB/s wr, 44 op/s
My main problems are:
- As you can see from the OSD tree, we have three hosts with only one OSD each, and a fourth host with four OSDs. Ceph will not let me move data off the three single-OSD nodes, which are all nearly full. I tried setting the weights of the OSDs in the bigger node higher, but that simply does not work. So I added a new OSD yesterday, which, as you can now see, did not make things better. What do I have to do to empty those three nodes again and put more data on the node with the four HDDs?
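My guess at the cause (an assumption, since I haven't seen the actual CRUSH map): with the default CRUSH rule, pool size = 3, and the host as failure domain, every PG must place its three replicas on three *distinct* hosts. With only four hosts, each single-OSD host is forced to carry a replica for roughly two thirds of all PGs no matter how high ceph4 is weighted. A minimal sketch that simulates weighted selection of three distinct hosts (a simplified lottery, not real CRUSH) illustrates why raising ceph4's weight can't empty the small nodes:

```python
import random

# CRUSH weights taken from the osd tree above
# (assumption: host failure domain, replica size 3)
hosts = {"ceph1": 3.64, "ceph2": 3.50, "ceph3": 3.64, "ceph4": 15.90}

def pick_three(weights):
    """Pick 3 distinct hosts, weighted by CRUSH weight (simplified, not real CRUSH)."""
    remaining = dict(weights)
    chosen = []
    for _ in range(3):
        total = sum(remaining.values())
        r = random.uniform(0, total)
        for h, w in remaining.items():
            r -= w
            if r <= 0:
                chosen.append(h)
                del remaining[h]
                break
    return chosen

random.seed(1)
trials = 100_000
counts = {h: 0 for h in hosts}
for _ in range(trials):
    for h in pick_three(hosts):
        counts[h] += 1

# Fraction of PGs for which each host must store a replica
for h in hosts:
    print(f"{h}: {counts[h] / trials:.2f} replicas per PG")
```

Despite ceph4 holding well over half the total weight, each small host still ends up on the acting set of roughly two out of three PGs, which matches the ~85% utilisation on osd.0-2. If this is indeed the situation, only reducing the replica count, adding more hosts, or changing the CRUSH failure domain would actually relieve the single-OSD nodes.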
- I added the "ceph4" node later, which resulted in a strange IP change, as you can see in the mon list. The public network and the cluster network were swapped or not assigned correctly. See our ceph.conf:
[global]
fsid = xxx
mon_initial_members = ceph1
mon_host = 192.168.10.3, 192.168.10.4, 192.168.10.5, 192.168.10.11
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
public_network = 192.168.60.0/24
cluster_network = 192.168.10.0/24
osd pool default size = 3
osd pool default min size = 1
osd pool default pg num = 128
osd pool default pgp num = 128
osd recovery max active = 50
osd recovery threads = 3
mon_pg_warn_max_per_osd = 0
What can I do in this case? (It's not a big problem, since the network is 2x 10 GbE and everything works.)
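Judging only from the mon addresses (so please verify against your actual cabling before changing anything), the monitors for ceph1-3 sit on 192.168.10.0/24, which ceph.conf declares as the *cluster* network, while monitors should live on the *public* network. If 192.168.10.0/24 really is the client-facing side, a hedged guess at the intended configuration would simply swap the two definitions:

```
[global]
# Assumption: 192.168.10.0/24 is the client/monitor-facing network and
# 192.168.60.0/24 carries OSD replication traffic -- verify before applying!
public_network = 192.168.10.0/24
cluster_network = 192.168.60.0/24
```

Note that, as far as I know, changing these options does not move an already-running monitor: monitor addresses are recorded in the monmap, so getting ceph4's mon onto the right subnet would additionally require removing and re-adding that monitor (or editing the monmap).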
- One other thing: even if I only prepare an OSD, it is automatically added to the cluster before I get to activate it myself. Has anyone else seen this behavior?
I am now deleting some data in the cluster, which has already helped a bit:
health HEALTH_WARN
63 pgs backfill
8 pgs backfill_toofull
10 pgs backfilling
7 pgs degraded
3 pgs recovery_wait
7 pgs stuck degraded
82 pgs stuck unclean
recovery 6498/52085528 objects degraded (0.012%)
recovery 9507140/52085528 objects misplaced (18.253%)
2 near full osd(s)
noout,noscrub,nodeep-scrub flag(s) set
monmap e8: 4 mons at
{ceph1=192.168.10.3:6789/0,ceph2=192.168.10.4:6789/0,ceph3=192.168.10.5:6789/0,ceph4=192.168.60.6:6789/0}
election epoch 400, quorum 0,1,2,3 ceph1,ceph2,ceph3,ceph4
osdmap e1780: 7 osds: 7 up, 7 in; 81 remapped pgs
flags noout,noscrub,nodeep-scrub
pgmap v7317114: 320 pgs, 3 pools, 4499 GB data, 15333 kobjects
14100 GB used, 26872 GB / 40972 GB avail
6498/52085528 objects degraded (0.012%)
9507140/52085528 objects misplaced (18.253%)
238 active+clean
60 active+remapped+wait_backfill
7 active+remapped+backfilling
6 active+remapped+backfill_toofull
3 active+degraded+remapped+backfilling
2 active+remapped+wait_backfill+backfill_toofull
2 active+recovery_wait+degraded+remapped
1 active+degraded+remapped+wait_backfill
1 active+recovery_wait+degraded
recovery io 7844 kB/s, 27 objects/s
client io 343 kB/s rd, 1 op/s
If you need more information, just ask. I really need help!
Thank you for reading this far!
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com