Hi Nathan
We built a Ceph cluster with 3 nodes, created by Rook:
node-3: osd-2, mon-b
node-4: osd-0, mon-a, mds-myfs-a, mgr
node-5: osd-1, mon-c, mds-myfs-b
Test phenomenon:
After one node goes down abnormally (e.g. a direct power-off), mounting the
CephFS volume takes more than 40 seconds.
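For reference, we measure the delay roughly like this (the mon address and
mount options below are placeholders for our environment, not the exact
command we run):
$ time mount -t ceph 10.96.0.4:6789:/ /mnt/cephfs \
    -o name=admin,secretfile=/etc/ceph/admin.secret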
Ceph cluster status under normal conditions:
$ ceph status
cluster:
id: 776b5432-be9c-455f-bb2e-05cbf20d6f6a
health: HEALTH_OK
services:
mon: 3 daemons, quorum a,b,c (age 20h)
mgr: a(active, since 21h)
mds: myfs:1 {0=myfs-a=up:active} 1 up:standby
osd: 3 osds: 3 up (since 20h), 3 in (since 21h)
data:
pools: 2 pools, 136 pgs
objects: 2.59k objects, 330 MiB
usage: 25 GiB used, 125 GiB / 150 GiB avail
pgs: 136 active+clean
io:
client: 1.5 KiB/s wr, 0 op/s rd, 0 op/s wr
CephFS status under normal conditions:
$ ceph fs status
myfs - 3 clients
====
+------+--------+--------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+--------+---------------+-------+-------+
| 0 | active | myfs-a | Reqs: 0 /s | 2250 | 2059 |
+------+--------+--------+---------------+-------+-------+
+---------------+----------+-------+-------+
| Pool | type | used | avail |
+---------------+----------+-------+-------+
| myfs-metadata | metadata | 208M | 39.1G |
| myfs-data0 | data | 121M | 39.1G |
+---------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
| myfs-b |
+-------------+
MDS version: ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba)
nautilus (stable)
Are you using replica or EC?
=> We are not using EC; everything is replicated.
Is 'min_size' smaller than 'size'?
=> Yes, min_size is 2 and size is 3 on both pools:
$ ceph osd dump | grep pool
pool 1 'myfs-metadata' replicated size 3 min_size 2 crush_rule 1 object_hash
rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change 16 flags hashpspool
stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5
application cephfs
pool 2 'myfs-data0' replicated size 3 min_size 2 crush_rule 2 object_hash
rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 141 lfor 0/0/53
flags hashpspool stripe_width 0 application cephfs
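So both pools have size=3 and min_size=2 with one replica per host, which
should keep I/O available when a single host is down. For completeness, the
per-pool values can be checked (or changed) like this, using our pool names:
$ ceph osd pool get myfs-data0 min_size
min_size: 2
(Lowering it with 'ceph osd pool set myfs-data0 min_size 1' is possible but
generally discouraged, since writes would then be acknowledged by a single
replica.)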
What is your crush map?
$ ceph osd crush dump
{
"devices": [
{
"id": 0,
"name": "osd.0",
"class": "hdd"
},
{
"id": 1,
"name": "osd.1",
"class": "hdd"
},
{
"id": 2,
"name": "osd.2",
"class": "hdd"
}
],
"types": [
{
"type_id": 0,
"name": "osd"
},
{
"type_id": 1,
"name": "host"
},
{
"type_id": 2,
"name": "chassis"
},
{
"type_id": 3,
"name": "rack"
},
{
"type_id": 4,
"name": "row"
},
{
"type_id": 5,
"name": "pdu"
},
{
"type_id": 6,
"name": "pod"
},
{
"type_id": 7,
"name": "room"
},
{
"type_id": 8,
"name": "datacenter"
},
{
"type_id": 9,
"name": "zone"
},
{
"type_id": 10,
"name": "region"
},
{
"type_id": 11,
"name": "root"
}
],
"buckets": [
{
"id": -1,
"name": "default",
"type_id": 11,
"type_name": "root",
"weight": 9594,
"alg": "straw2",
"hash": "rjenkins1",
"items": [
{
"id": -3,
"weight": 3198,
"pos": 0
},
{
"id": -5,
"weight": 3198,
"pos": 1
},
{
"id": -7,
"weight": 3198,
"pos": 2
}
]
},
{
"id": -2,
"name": "default~hdd",
"type_id": 11,
"type_name": "root",
"weight": 9594,
"alg": "straw2",
"hash": "rjenkins1",
"items": [
{
"id": -4,
"weight": 3198,
"pos": 0
},
{
"id": -6,
"weight": 3198,
"pos": 1
},
{
"id": -8,
"weight": 3198,
"pos": 2
}
]
},
{
"id": -3,
"name": "node-4",
"type_id": 1,
"type_name": "host",
"weight": 3198,
"alg": "straw2",
"hash": "rjenkins1",
"items": [
{
"id": 0,
"weight": 3198,
"pos": 0
}
]
},
{
"id": -4,
"name": "node-4~hdd",
"type_id": 1,
"type_name": "host",
"weight": 3198,
"alg": "straw2",
"hash": "rjenkins1",
"items": [
{
"id": 0,
"weight": 3198,
"pos": 0
}
]
},
{
"id": -5,
"name": "node-5",
"type_id": 1,
"type_name": "host",
"weight": 3198,
"alg": "straw2",
"hash": "rjenkins1",
"items": [
{
"id": 1,
"weight": 3198,
"pos": 0
}
]
},
{
"id": -6,
"name": "node-5~hdd",
"type_id": 1,
"type_name": "host",
"weight": 3198,
"alg": "straw2",
"hash": "rjenkins1",
"items": [
{
"id": 1,
"weight": 3198,
"pos": 0
}
]
},
{
"id": -7,
"name": "node-3",
"type_id": 1,
"type_name": "host",
"weight": 3198,
"alg": "straw2",
"hash": "rjenkins1",
"items": [
{
"id": 2,
"weight": 3198,
"pos": 0
}
]
},
{
"id": -8,
"name": "node-3~hdd",
"type_id": 1,
"type_name": "host",
"weight": 3198,
"alg": "straw2",
"hash": "rjenkins1",
"items": [
{
"id": 2,
"weight": 3198,
"pos": 0
}
]
}
],
"rules": [
{
"rule_id": 0,
"rule_name": "replicated_rule",
"ruleset": 0,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -1,
"item_name": "default"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
},
{
"rule_id": 1,
"rule_name": "myfs-metadata",
"ruleset": 1,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -1,
"item_name": "default"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
},
{
"rule_id": 2,
"rule_name": "myfs-data0",
"ruleset": 2,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -1,
"item_name": "default"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
}
],
"tunables": {
"choose_local_tries": 0,
"choose_local_fallback_tries": 0,
"choose_total_tries": 50,
"chooseleaf_descend_once": 1,
"chooseleaf_vary_r": 1,
"chooseleaf_stable": 1,
"straw_calc_version": 1,
"allowed_bucket_algs": 54,
"profile": "jewel",
"optimal_tunables": 1,
"legacy_tunables": 0,
"minimum_required_version": "jewel",
"require_feature_tunables": 1,
"require_feature_tunables2": 1,
"has_v2_rules": 0,
"require_feature_tunables3": 1,
"has_v3_rules": 0,
"has_v4_buckets": 1,
"require_feature_tunables5": 1,
"has_v5_rules": 0
},
"choose_args": {}
}
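All three rules do the same thing: take the 'default' root and chooseleaf
firstn across hosts, so each of the 3 replicas lands on a distinct host.
Decompiled into CRUSH text syntax, rule 0 would read roughly as follows
(rules 1 and 2 follow the same pattern):
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}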
Question:
How can I mount the CephFS volume as soon as possible after one node goes
down abnormally? Any Ceph cluster (filesystem) configuration suggestions?
Would using EC help?
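For context, the ~40 seconds we observe is in the ballpark of the default
failure-detection timeouts, so these are the settings we suspect are involved
(the values in the comments are, to our understanding, the Nautilus defaults;
we have not yet verified how far they can safely be lowered):
$ ceph config get osd osd_heartbeat_grace         # 20s before peers report an OSD as failed
$ ceph config get mon mon_osd_min_down_reporters  # 2 reporters needed to mark an OSD down
$ ceph config get mds mds_beacon_grace            # 15s before a silent MDS is replaced by the standby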
Best Regards
[email protected]
From: jesper
Date: 2019-11-29 13:28
To: Peng Bo
CC: Ceph Users; hfx; Nathan Fish
Subject: Re[2]: [ceph-users] HA and data recovery of CEPH
Hi Nathan
Is that true?
The time it takes to reallocate the primary PG introduces “downtime” by
design, right? Seen from a writing client's perspective.
Jesper
Friday, 29 November 2019, 06.24 +0100 from [email protected]
<[email protected]>:
Hi Nathan,
Thanks for the help.
My colleague will provide more details.
BR
On Fri, Nov 29, 2019 at 12:57 PM Nathan Fish <[email protected]> wrote:
If correctly configured, your cluster should have zero downtime from a
single OSD or node failure. What is your crush map? Are you using
replica or EC? If your 'min_size' is not smaller than 'size', then you
will lose availability.
On Thu, Nov 28, 2019 at 10:50 PM Peng Bo <[email protected]> wrote:
>
> Hi all,
>
> We are working on using CEPH to build our HA system; the purpose is that the
> system should always provide service even when a CEPH node is down or an OSD
> is lost.
>
> Currently, as we have observed, once a node/OSD is down the CEPH cluster
> needs about 40 seconds to sync data, and our system can't provide service
> during that time.
>
> My questions:
>
> Is there any way to reduce the data sync time?
> How can we keep CEPH available when a node/OSD is down?
>
>
> BR
>
> --
> The modern Unified Communications provider
>
> https://www.portsip.com
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
The modern Unified Communications provider
https://www.portsip.com
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com