[ceph-users] Ceph cluster in error state (full) with raw usage 32% of total capacity

Mandar Naik Wed, 09 Aug 2017 23:46:57 -0700

Hi,

I am evaluating a Ceph cluster for a solution where Ceph could be used for provisioning pools whose data is either stored locally on a single node or replicated across the cluster. This way Ceph could be a single solution for writing both local and replicated data. Local storage helps avoid the storage cost that comes with a replication factor greater than one, and it still provides availability as long as the host holding the data is alive.

So I tried an experiment with a Ceph cluster that has one CRUSH rule which replicates data across nodes, and another that points only to a CRUSH bucket containing the local Ceph OSD. The cluster configuration is pasted below.

I observed that if one of the disks is full (95%), the entire cluster goes into an error state and stops accepting new writes from/to the other nodes. So the Ceph cluster became unusable even though it is only 32% full. Writes are blocked even for pools that do not touch the full OSD. I have tried playing around with the CRUSH hierarchy, but it did not help.

So is it possible to store data in the above manner with Ceph? If yes, could we get the cluster back into a usable state after one of the nodes is full?

# ceph df
GLOBAL:
    SIZE     AVAIL      RAW USED     %RAW USED
    134G     94247M       43922M         31.79

# ceph -s
    cluster ba658a02-757d-4e3c-7fb3-dc4bf944322f
     health HEALTH_ERR
            1 full osd(s)
            full,sortbitwise,require_jewel_osds flag(s) set
     monmap e3: 3 mons at {ip-10-0-9-122=10.0.9.122:6789/0,ip-10-0-9-146=10.0.9.146:6789/0,ip-10-0-9-210=10.0.9.210:6789/0}
            election epoch 14, quorum 0,1,2 ip-10-0-9-122,ip-10-0-9-146,ip-10-0-9-210
     osdmap e93: 3 osds: 3 up, 3 in
            flags full,sortbitwise,require_jewel_osds
      pgmap v630: 384 pgs, 6 pools, 43772 MB data, 18640 objects
            43922 MB used, 94247 MB / 134 GB avail
                 384 active+clean

# ceph osd tree
ID WEIGHT  TYPE NAME                   UP/DOWN REWEIGHT PRIMARY-AFFINITY
-9 0.04399 rack ip-10-0-9-146-rack
-8 0.04399     host ip-10-0-9-146
 2 0.04399         osd.2                    up  1.00000          1.00000
-7 0.04399 rack ip-10-0-9-210-rack
-6 0.04399     host ip-10-0-9-210
 1 0.04399         osd.1                    up  1.00000          1.00000
-5 0.04399 rack ip-10-0-9-122-rack
-3 0.04399     host ip-10-0-9-122
 0 0.04399         osd.0                    up  1.00000          1.00000
-4 0.13197 rack rep-rack
-3 0.04399     host ip-10-0-9-122
 0 0.04399         osd.0                    up  1.00000          1.00000
-6 0.04399     host ip-10-0-9-210
 1 0.04399         osd.1                    up  1.00000          1.00000
-8 0.04399     host ip-10-0-9-146
 2 0.04399         osd.2                    up  1.00000          1.00000

# ceph osd crush rule list
[
    "rep_ruleset",
    "ip-10-0-9-122_ruleset",
    "ip-10-0-9-210_ruleset",
    "ip-10-0-9-146_ruleset"
]

# ceph osd crush rule dump rep_ruleset
{
    "rule_id": 0,
    "rule_name": "rep_ruleset",
    "ruleset": 0,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -4,
            "item_name": "rep-rack"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}

# ceph osd crush rule dump ip-10-0-9-122_ruleset
{
    "rule_id": 1,
    "rule_name": "ip-10-0-9-122_ruleset",
    "ruleset": 1,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -5,
            "item_name": "ip-10-0-9-122-rack"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}
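
The ceph df and ceph -s figures above can be cross-checked with a little arithmetic. The Python sketch below recomputes %RAW USED and re-expresses the same raw usage as a share of one OSD. It assumes that total capacity is approximately used + avail and that the three OSDs are the same size (they all carry CRUSH weight 0.04399); Ceph does not report either directly in this output.

```python
"""Sanity-check the ceph df / ceph -s figures pasted above.

Assumptions (not reported by Ceph here): total capacity is approximated
as used + avail, and the three OSDs are equal-sized because they all
carry the same CRUSH weight (0.04399).
"""

RAW_USED_MB = 43922      # "43922 MB used"
AVAIL_MB = 94247         # "94247 MB / 134 GB avail"
LOGICAL_DATA_MB = 43772  # "43772 MB data"
NUM_OSDS = 3             # "3 osds: 3 up, 3 in"


def raw_used_pct(used_mb: float, avail_mb: float) -> float:
    """Cluster-wide raw usage in percent, like the %RAW USED column."""
    return 100.0 * used_mb / (used_mb + avail_mb)


def pct_of_one_osd(used_mb: float, avail_mb: float, num_osds: int) -> float:
    """The same raw usage expressed as a percentage of one equal-sized OSD."""
    per_osd_mb = (used_mb + avail_mb) / num_osds
    return 100.0 * used_mb / per_osd_mb


cluster_pct = raw_used_pct(RAW_USED_MB, AVAIL_MB)
single_osd_pct = pct_of_one_osd(RAW_USED_MB, AVAIL_MB, NUM_OSDS)
copies = RAW_USED_MB / LOGICAL_DATA_MB  # raw bytes stored per logical byte

print(f"cluster raw used: {cluster_pct:.2f}%")
print(f"same bytes on one OSD: {single_osd_pct:.1f}%")
print(f"raw/logical ratio: {copies:.2f}")
```

Run as written, it prints 31.79% (matching the %RAW USED column), 95.4% and 1.00. In other words, the raw bytes already stored would by themselves fill one equal-sized OSD to just above 95%, and there is about one raw byte per logical byte. That is consistent with most of the data being single-copy local data on the one full OSD, although per-OSD usage (for example from ceph osd df) would be needed to confirm it.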

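The reported behaviour can be reduced to a small model. In the Jewel-era release this cluster appears to run (note the require_jewel_osds flag), one OSD reaching the full ratio puts "full" into the OSD map flags, as the ceph -s output above shows, and clients then stop writing to every pool. The Python sketch below is a simplified illustration of that, not Ceph code: FULL_RATIO = 0.95 is assumed to be the default mon_osd_full_ratio (it matches the 95% in the post), and the per-OSD utilizations and pool names are hypothetical.

```python
"""Simplified model of the cluster-wide 'full' flag seen in the ceph -s output.

Not Ceph code. Assumptions: FULL_RATIO is the default mon_osd_full_ratio,
and the utilizations and pool names below are hypothetical.
"""
from typing import Dict, List, Set

FULL_RATIO = 0.95  # assumed default mon_osd_full_ratio


def full_osds(utilization: Dict[str, float], full_ratio: float = FULL_RATIO) -> List[str]:
    """OSDs whose used fraction has reached the full ratio."""
    return sorted(osd for osd, used in utilization.items() if used >= full_ratio)


def osdmap_flags(utilization: Dict[str, float], full_ratio: float = FULL_RATIO) -> Set[str]:
    """One full OSD is enough to set the flag for the whole cluster."""
    return {"full"} if full_osds(utilization, full_ratio) else set()


def writes_allowed(pool: str, flags: Set[str]) -> bool:
    """The pool is ignored on purpose: the flag is not scoped to a pool."""
    return "full" not in flags


# Hypothetical per-OSD used fractions; the post does not say which OSD filled up.
utilization = {"osd.0": 0.96, "osd.1": 0.01, "osd.2": 0.01}
flags = osdmap_flags(utilization)
average = sum(utilization.values()) / len(utilization)

print(f"full OSDs: {full_osds(utilization)}, average used: {average:.0%}")
for pool in ("replicated_pool", "osd1_local_pool"):  # hypothetical pool names
    print(pool, "writable:", writes_allowed(pool, flags))
```

It prints full OSDs: ['osd.0'], average used: 33%, and then writable: False for both pools, which is the pattern described in the post: a low overall utilization, yet no pool accepts writes.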

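The CRUSH layout above can also be modelled as a small tree plus a simplified take / chooseleaf_firstn / emit walk. The Python sketch below copies the bucket and OSD names from the ceph osd tree output and follows the CRUSH convention that num 0 means "as many as the pool size". It is a deterministic stand-in rather than real CRUSH, which picks hosts pseudo-randomly by weight, and the pool size of 3 is an assumption.

```python
"""Toy model of the CRUSH hierarchy and rules pasted above.

Bucket and OSD names come from the ceph osd tree output. This is a
deterministic stand-in for CRUSH: it walks buckets in listed order and
ignores weights and hashing, so it only shows which OSDs a rule can reach.
"""
from typing import Dict, List

# parent bucket -> children, copied from ceph osd tree
TREE: Dict[str, List[str]] = {
    "ip-10-0-9-146-rack": ["ip-10-0-9-146"],
    "ip-10-0-9-210-rack": ["ip-10-0-9-210"],
    "ip-10-0-9-122-rack": ["ip-10-0-9-122"],
    "rep-rack": ["ip-10-0-9-122", "ip-10-0-9-210", "ip-10-0-9-146"],
    "ip-10-0-9-122": ["osd.0"],
    "ip-10-0-9-210": ["osd.1"],
    "ip-10-0-9-146": ["osd.2"],
}
HOSTS = {"ip-10-0-9-122", "ip-10-0-9-210", "ip-10-0-9-146"}


def replica_count(num: int, pool_size: int) -> int:
    """num 0 means pool size; negative means pool size minus |num|; capped at pool size."""
    wanted = pool_size + num if num <= 0 else num
    return max(0, min(wanted, pool_size))


def hosts_under(bucket: str) -> List[str]:
    """All host buckets reachable from bucket, in listed order."""
    if bucket in HOSTS:
        return [bucket]
    hosts: List[str] = []
    for child in TREE.get(bucket, []):
        hosts.extend(hosts_under(child))
    return hosts


def chooseleaf_firstn(take: str, num: int, pool_size: int) -> List[str]:
    """take <bucket>; chooseleaf_firstn <num> type host; emit."""
    distinct_hosts = list(dict.fromkeys(hosts_under(take)))  # dedupe, keep order
    chosen = distinct_hosts[: replica_count(num, pool_size)]
    return [TREE[host][0] for host in chosen]  # one leaf OSD per distinct host


# Assumed pool size of 3 for both rules (pool sizes are not in the post).
print("rep_ruleset:", chooseleaf_firstn("rep-rack", 0, 3))
print("ip-10-0-9-122_ruleset:", chooseleaf_firstn("ip-10-0-9-122-rack", 0, 3))
```

With rep_ruleset the walk reaches osd.0, osd.1 and osd.2; with ip-10-0-9-122_ruleset it can only ever return osd.0, whatever size the pool is given. Data placement is therefore isolated per host by these rules, but, as the ceph -s output above shows, the full flag is not.
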
-- 
Thanks,
Mandar Naik.

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
