Hi all. Indeed, there is a problem: I removed 1 TB of data, but the space on the cluster has not been freed. Is this expected behavior or a bug? And how long will it take for the space to be cleaned up?
Sat Sep 20 2014 at 8:19:24 AM, Mikaël Cluseau <[email protected]>:
> Hi all,
>
> I have weird behaviour on my firefly "test + convenience storage" cluster.
> It consists of 2 nodes with a light imbalance in available space:
>
> # id    weight  type name           up/down  reweight
> -1      14.58   root default
> -2      8.19        host store-1
> 1       2.73            osd.1       up       1
> 0       2.73            osd.0       up       1
> 5       2.73            osd.5       up       1
> -3      6.39        host store-2
> 2       2.73            osd.2       up       1
> 3       2.73            osd.3       up       1
> 4       0.93            osd.4       up       1
>
> I used to store ~8TB of rbd volumes, coming to a near-full state. There
> were some annoying "stuck misplaced" PGs so I began to remove 4.5TB of data;
> the weird thing is: the space hasn't been reclaimed on the OSDs, they
> stayed stuck around 84% usage. I tried to move PGs around and it happens
> that the space is correctly "reclaimed" if I take an OSD out, let it empty
> its XFS volume and then take it in again.
>
> I'm currently applying this to each OSD in turn, but I thought it could be
> worth telling about this. The current ceph df output is:
>
> GLOBAL:
>     SIZE       AVAIL     RAW USED     %RAW USED
>     12103G     5311G     6792G        56.12
> POOLS:
>     NAME            ID     USED       %USED     OBJECTS
>     data            0      0          0         0
>     metadata        1      0          0         0
>     rbd             2      444G       3.67      117333
>     [...]
>     archives-ec     14     3628G      29.98     928902
>     archives        15     37518M     0.30      273167
>
> Before "just moving data", AVAIL was around 3TB.
>
> I finished the process with the OSDs on store-1, which show the following
> space usage now:
>
> /dev/sdb1  2.8T  1.4T  1.4T  50%  /var/lib/ceph/osd/ceph-0
> /dev/sdc1  2.8T  1.3T  1.5T  46%  /var/lib/ceph/osd/ceph-1
> /dev/sdd1  2.8T  1.3T  1.5T  48%  /var/lib/ceph/osd/ceph-5
>
> I'm currently fixing OSD 2; OSD 3 will be the last one to be fixed. The df on
> store-2 shows the following:
>
> /dev/sdb1  2.8T  1.9T  855G  *70%*  /var/lib/ceph/osd/ceph-2
> /dev/sdc1  2.8T  2.4T  417G  *86%*  /var/lib/ceph/osd/ceph-3
> /dev/sdd1  932G  481G  451G  52%    /var/lib/ceph/osd/ceph-4
>
> OSD 2 was at 84% 3h ago, and OSD 3 was ~75%.
>
> During rbd rm (that took a bit more than 3 days), ceph log was showing
> things like that:
>
> 2014-09-03 16:17:38.831640 mon.0 192.168.1.71:6789/0 417194 : [INF] pgmap
> v14953987: 3196 pgs: 2882 active+clean, 314 active+remapped; 7647 GB data,
> 11067 GB used, 3828 GB / 14896 GB avail; 0 B/s rd, 6778 kB/s wr, 18 op/s;
> -5/5757286 objects degraded (-0.000%)
> [...]
> 2014-09-05 03:09:59.895507 mon.0 192.168.1.71:6789/0 513976 : [INF] pgmap
> v15050766: 3196 pgs: 2882 active+clean, 314 active+remapped; 6010 GB data,
> 11156 GB used, 3740 GB / 14896 GB avail; 0 B/s rd, 0 B/s wr, 8 op/s;
> -388631/5247320 objects degraded (-7.406%)
> [...]
> 2014-09-06 03:56:50.008109 mon.0 192.168.1.71:6789/0 580816 : [INF] pgmap
> v15117604: 3196 pgs: 2882 active+clean, 314 active+remapped; 4865 GB data,
> 11207 GB used, 3689 GB / 14896 GB avail; 0 B/s rd, 6117 kB/s wr, 22 op/s;
> -706519/3699415 objects degraded (-19.098%)
> 2014-09-06 03:56:44.476903 osd.0 192.168.1.71:6805/11793 729 : [WRN] 1
> slow requests, 1 included below; oldest blocked for > 30.058434 secs
> 2014-09-06 03:56:44.476909 osd.0 192.168.1.71:6805/11793 730 : [WRN] slow
> request 30.058434 seconds old, received at 2014-09-06 03:56:14.418429:
> osd_op(client.19843278.0:46081 rb.0.c7fd7f.238e1f29.00000000b3fa [delete]
> 15.b8fb7551 ack+ondisk+write e38950) v4 currently waiting for blocked object
> 2014-09-06 03:56:49.477785 osd.0 192.168.1.71:6805/11793 731 : [WRN] 2
> slow requests, 1 included below; oldest blocked for > 35.059315 secs
> [... stabilizes here:]
> 2014-09-06 22:13:48.771531 mon.0 192.168.1.71:6789/0 632527 : [INF] pgmap
> v15169313: 3196 pgs: 2882 active+clean, 314 active+remapped; 4139 GB data,
> 11215 GB used, 3681 GB / 14896 GB avail; 64 B/s rd, 64 B/s wr, 0 op/s;
> -883219/3420796 objects degraded (-25.819%)
> [...]
> 2014-09-07 03:09:48.491325 mon.0 192.168.1.71:6789/0 633880 : [INF] pgmap
> v15170666: 3196 pgs: 2882 active+clean, 314 active+remapped; 4139 GB data,
> 11215 GB used, 3681 GB / 14896 GB avail; 18727 B/s wr, 2 op/s;
> -883219/3420796 objects degraded (-25.819%)
>
> And now, during the data movement I described before:
>
> 2014-09-20 15:16:13.394694 mon.0 [INF] pgmap v15344707: 3196 pgs: 2132
> active+clean, 432 active+remapped+wait_backfill, 621 active+remapped, 11
> active+remapped+backfilling; 4139 GB data, 6831 GB used, 5271 GB / 12103 GB
> avail; 379097/3792969 objects degraded (9.995%)
>
> If some ceph developer wants me to do something or to provide some data,
> please say so quickly, I will probably process OSD 3 in ~16-20h.
> (of course, I'd prefer not to lose the data btw :-))
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
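For anyone who wants to follow the same symptom in their own logs, here is a rough Python sketch that pulls the "data" and "used" figures out of pgmap lines like the ones quoted above. The three sample lines are copied from the quoted log (unwrapped); the names `PGMAP_RE`, `parse_pgmap` and `SAMPLE_LINES` are made up for this illustration, and the used/data ratio is only a crude indicator, since the expected raw overhead depends on the replication and erasure-coding settings of each pool, which are not given in this thread.

```python
import re

# Hypothetical helper: matches the timestamp and the
# "<N> GB data, <N> GB used, <N> GB / <N> GB avail" part of a pgmap log line.
PGMAP_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} [\d:.]+) mon\.\d+ .*?"
    r"(\d+) GB data, (\d+) GB used, (\d+) GB / (\d+) GB avail"
)

# Lines copied from the quoted log, with the mail line wrapping removed.
SAMPLE_LINES = [
    "2014-09-03 16:17:38.831640 mon.0 192.168.1.71:6789/0 417194 : [INF] pgmap "
    "v14953987: 3196 pgs: 2882 active+clean, 314 active+remapped; 7647 GB data, "
    "11067 GB used, 3828 GB / 14896 GB avail; 0 B/s rd, 6778 kB/s wr, 18 op/s; "
    "-5/5757286 objects degraded (-0.000%)",
    "2014-09-06 22:13:48.771531 mon.0 192.168.1.71:6789/0 632527 : [INF] pgmap "
    "v15169313: 3196 pgs: 2882 active+clean, 314 active+remapped; 4139 GB data, "
    "11215 GB used, 3681 GB / 14896 GB avail; 64 B/s rd, 64 B/s wr, 0 op/s; "
    "-883219/3420796 objects degraded (-25.819%)",
    "2014-09-20 15:16:13.394694 mon.0 [INF] pgmap v15344707: 3196 pgs: 2132 "
    "active+clean, 432 active+remapped+wait_backfill, 621 active+remapped, 11 "
    "active+remapped+backfilling; 4139 GB data, 6831 GB used, 5271 GB / 12103 GB "
    "avail; 379097/3792969 objects degraded (9.995%)",
]


def parse_pgmap(line):
    """Return the space figures of one pgmap line as a dict, or None if no match."""
    m = PGMAP_RE.search(line)
    if m is None:
        return None
    data, used, avail, total = (int(g) for g in m.groups()[1:])
    return {
        "when": m.group(1),
        "data_gb": data,
        "used_gb": used,
        "avail_gb": avail,
        "total_gb": total,
        # Raw space consumed per GB of logical data (crude indicator only).
        "ratio": used / data,
    }


for line in SAMPLE_LINES:
    row = parse_pgmap(line)
    if row is not None:
        print(f"{row['when']}  data={row['data_gb']} GB  "
              f"used={row['used_gb']} GB  used/data={row['ratio']:.2f}")
```

On the quoted lines this shows "data" shrinking during the rbd rm while "used" stays roughly flat, and "used" only dropping once the OSDs were taken out and back in.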
