On Tue, Jun 18, 2013 at 08:13:39PM +0800, Da Chun wrote:
> Hi List,
> My ceph cluster has two osds on each node. One has 15g capacity, and the 
> other 10g.
> It's interesting that, after I took the 15g osd out of the cluster, the 
> cluster started to rebalance. Eventually the 10g osd on the same node 
> filled up and was taken off, and it failed to start again with the 
> following error in the osd log file:
> 2013-06-18 19:51:20.799756 7f6805ee07c0 -1 
> filestore(/var/lib/ceph/osd/ceph-1) Extended attributes don't appear to work. 
> Got error (28) No space left on device. If you are using ext3 or ext4, be 
> sure to mount the underlying file system with the 'user_xattr' option.
> 2013-06-18 19:51:20.800258 7f6805ee07c0 -1  ** ERROR: error 
> converting store /var/lib/ceph/osd/ceph-1: (95) Operation not supported
> 
> 
> 
> I guess the 10g osd was chosen by the cluster to be the container for the 
> extra objects.
> My questions here:
> 1. How are the extra objects spread in the cluster after an osd is taken out? 
> Only spread to one of the osds?
> 2. Is there no mechanism to prevent the osds from being filled too full and 
> taken off?
> 

As far as I understand it:

Each OSD has the same weight by default; you can give an OSD a lower weight 
to force it to be used less.

A reason to do so could be that it has less space, or that it is slower.
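For example, weights could be set roughly in proportion to capacity. A 
minimal sketch, assuming osd.0 is the 15g disk and osd.1 the 10g one (the 
ids and the 1.5/1.0 values are illustrative, not taken from your cluster):

```shell
# Weight the two OSDs roughly in proportion to capacity (15g : 10g).
# osd.0/osd.1 and the weight values are assumptions for illustration.
ceph osd crush reweight osd.0 1.5
ceph osd crush reweight osd.1 1.0

# Verify the resulting weights in the CRUSH hierarchy:
ceph osd tree
```

With weights like this, CRUSH should place about 50% more data on the 
larger OSD instead of splitting it evenly.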

> 
> Thanks for your time!
> 
> 
> This is the ceph log:
> 2013-06-18 19:26:41.567607 mon.0 172.18.46.34:6789/0 1599 : [INF] pgmap 
> v14182: 456 pgs: 453 active+clean, 3 active+remapped+backfilling; 16874 MB 
> data, 40220 MB used, 36513 MB / 76733 MB avail; 379/9761 degraded (3.883%);  
> recovering 19 o/s, 77608KB/s
> 2013-06-18 19:26:42.649139 mon.0 172.18.46.34:6789/0 1600 : [INF] pgmap 
> v14183: 456 pgs: 454 active+clean, 2 active+remapped+backfilling; 16874 MB 
> data, 40222 MB used, 36511 MB / 76733 MB avail; 309/9745 degraded (3.171%);  
> recovering 41 o/s, 162MB/s
> 2013-06-18 19:26:46.566721 mon.0 172.18.46.34:6789/0 1601 : [INF] pgmap 
> v14184: 456 pgs: 454 active+clean, 2 active+remapped+backfilling; 16874 MB 
> data, 40222 MB used, 36511 MB / 76733 MB avail; 250/9745 degraded (2.565%);  
> recovering 25 o/s, 101450KB/s
> 2013-06-18 19:26:39.858833 osd.1 172.18.46.35:6801/10730 88 : [WRN] OSD near 
> full (91%)
> 2013-06-18 19:26:48.548076 mon.0 172.18.46.34:6789/0 1602 : [INF] pgmap 
> v14185: 456 pgs: 454 active+clean, 2 active+remapped+backfilling; 16874 MB 
> data, 40222 MB used, 36511 MB / 76733 MB avail; 200/9745 degraded (2.052%);  
> recovering 18 o/s, 72359KB/s
> 2013-06-18 19:26:51.898811 mon.0 172.18.46.34:6789/0 1603 : [INF] pgmap 
> v14186: 456 pgs: 454 active+clean, 2 active+remapped+backfilling; 16874 MB 
> data, 40222 MB used, 36511 MB / 76733 MB avail; 155/9745 degraded (1.591%);  
> recovering 17 o/s, 71823KB/s
> 2013-06-18 19:26:53.947739 mon.0 172.18.46.34:6789/0 1604 : [INF] pgmap 
> v14187: 456 pgs: 454 active+clean, 2 active+remapped+backfilling; 16874 MB 
> data, 40222 MB used, 36511 MB / 76733 MB avail; 113/9745 degraded (1.160%);  
> recovering 16 o/s, 65041KB/s
> 2013-06-18 19:26:57.293713 mon.0 172.18.46.34:6789/0 1605 : [INF] pgmap 
> v14188: 456 pgs: 454 active+clean, 2 active+remapped+backfilling; 16874 MB 
> data, 40222 MB used, 36511 MB / 76733 MB avail; 103/9745 degraded (1.057%);  
> recovering 9 o/s, 37353KB/s
> 2013-06-18 19:27:03.861124 mon.0 172.18.46.34:6789/0 1606 : [INF] pgmap 
> v14189: 456 pgs: 454 active+clean, 2 active+remapped+backfilling; 16874 MB 
> data, 35598 MB used, 41134 MB / 76733 MB avail; 103/9745 degraded (1.057%);  
> recovering 1 o/s, 3532KB/s
> 2013-06-18 19:27:13.732263 mon.0 172.18.46.34:6789/0 1607 : [DBG] osd.1 
> 172.18.46.35:6801/10730 reported failed by osd.0 172.18.46.34:6804/1506
> 2013-06-18 19:27:15.949395 mon.0 172.18.46.34:6789/0 1608 : [DBG] osd.1 
> 172.18.46.35:6801/10730 reported failed by osd.3 172.18.46.34:6807/11743
> 2013-06-18 19:27:17.239206 mon.0 172.18.46.34:6789/0 1609 : [DBG] osd.1 
> 172.18.46.35:6801/10730 reported failed by osd.5 172.18.46.36:6806/7436
> 2013-06-18 19:27:17.239404 mon.0 172.18.46.34:6789/0 1610 : [INF] osd.1 
> 172.18.46.35:6801/10730 failed (3 reports from 3 peers after 2013-06-18 
> 19:27:38.239157 >= grace 20.000000)
> 2013-06-18 19:27:17.306958 mon.0 172.18.46.34:6789/0 1611 : [INF] osdmap 
> e647: 6 osds: 5 up, 5 in
> 2013-06-18 19:27:17.387311 mon.0 172.18.46.34:6789/0 1612 : [INF] pgmap 
> v14190: 456 pgs: 335 active+clean, 119 stale+active+clean, 2 
> active+remapped+backfilling; 16874 MB data, 35598 MB used, 41134 MB / 76733 
> MB avail; 103/9745 degraded (1.057%)
> 2013-06-18 19:27:18.308209 mon.0 172.18.46.34:6789/0 1613 : [INF] osdmap 
> e648: 6 osds: 5 up, 5 in
> 2013-06-18 19:27:18.316487 mon.0 172.18.46.34:6789/0 1614 : [INF] pgmap 
> v14191: 456 pgs: 335 active+clean, 119 stale+active+clean, 2 
> active+remapped+backfilling; 16874 MB data, 35598 MB used, 41134 MB / 76733 
> MB avail; 103/9745 degraded (1.057%)
> 2013-06-18 19:27:22.676915 mon.0 172.18.46.34:6789/0 1615 : [INF] pgmap 
> v14192: 456 pgs: 280 active+clean, 79 stale+active+clean, 1 active+remapped, 
> 1 active+remapped+backfilling, 95 active+degraded; 16874 MB data, 35596 MB 
> used, 41137 MB / 76733 MB avail; 318/9334 degraded (3.407%);  recovering 0 
> o/s, 762KB/s
> 2013-06-18 19:27:23.766125 mon.0 172.18.46.34:6789/0 1616 : [INF] pgmap 
> v14193: 456 pgs: 162 active+clean, 2 active+remapped, 292 active+degraded; 
> 16874 MB data, 35612 MB used, 41121 MB / 76733 MB avail; 15EB/s rd, 0op/s; 
> 2031/8972 degraded (22.637%);  recovering 15E o/s, 15EB/s
> 2013-06-18 19:29:03.896056 mon.0 172.18.46.34:6789/0 1617 : [INF] pgmap 
> v14194: 456 pgs: 162 active+clean, 2 active+remapped, 292 active+degraded; 
> 16874 MB data, 35612 MB used, 41121 MB / 76733 MB avail; 15EB/s rd, 0op/s; 
> 2031/8972 degraded (22.637%);  recovering 15E o/s, 15EB/s
> 2013-06-18 19:29:22.700301 mon.0 172.18.46.34:6789/0 1618 : [INF] pgmap 
> v14195: 456 pgs: 162 active+clean, 2 active+remapped, 292 active+degraded; 
> 16874 MB data, 35615 MB used, 41118 MB / 76733 MB avail; 2031/8972 degraded 
> (22.637%)
> 2013-06-18 19:29:23.759014 mon.0 172.18.46.34:6789/0 1619 : [INF] pgmap 
> v14196: 456 pgs: 162 active+clean, 2 active+remapped, 292 active+degraded; 
> 16874 MB data, 35596 MB used, 41137 MB / 76733 MB avail; 2031/8972 degraded 
> (22.637%)
> 2013-06-18 19:31:03.932470 mon.0 172.18.46.34:6789/0 1620 : [INF] pgmap 
> v14197: 456 pgs: 162 active+clean, 2 active+remapped, 292 active+degraded; 
> 16874 MB data, 35596 MB used, 41137 MB / 76733 MB avail; 2031/8972 degraded 
> (22.637%)
> 2013-06-18 19:32:18.012211 mon.0 172.18.46.34:6789/0 1621 : [INF] osd.1 out 
> (down for 300.715725)

> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

