On 08/24/2014 01:57 AM, debian Only wrote:
This happened after I used "ceph-deploy create ceph01-vm ceph02-vm ceph04-vm"
to create 3 mon members.
Now, roughly every 10 hours, one mon goes down, always with this error,
even though the disk sometimes still has plenty of space left, e.g. 30G.
When I deployed Ceph before, I created only one mon in the first step
("ceph-deploy create ceph01-vm") and then ran "ceph-deploy mon add ceph02-vm",
and I never hit this problem.
I do not know why.
Your monitor shut down because the disk the monitor is sitting on has
dropped to (or below) 5% available disk space. This is meant to
prevent the monitor from running out of disk space and being unable to
store critical cluster information. 5% is a rough default: because it is
a percentage, it may be a sensible reserve on many disks, but too little
headroom on small disks and an unnecessarily large reserve on big ones.
This value can be adjusted if you feel you need to, using the
'mon_data_avail_crit' option (which defaults to 5, as in 5%, but can be
set to whatever suits you best).
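For instance, if you wanted the monitor to only shut itself down below 2%
free (an illustrative value, not a recommendation), that would be

    [mon]
    mon data avail crit = 2

in ceph.conf followed by a monitor restart, or injected at runtime with
something like

    ceph tell mon.ceph01-vm injectargs '--mon-data-avail-crit 2'

There is also a 'mon_data_avail_warn' option (defaults to 30) for the
earlier warning threshold.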
The big problem here however seems to be that you're running out of
space due to huge monitor logs. Is that it?
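A quick way to check where the space is actually going is something like

    df -h /var/lib/ceph/mon
    du -sh /var/lib/ceph/mon/ceph-*/ /var/log/ceph/ceph-mon.*.log

assuming the default mon data and log locations -- adjust the paths if you
keep the mon store or the logs elsewhere.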
If so, I would ask you to run the following commands and share the results:
ceph daemon mon.* config get debug_mon
ceph daemon mon.* config get debug_ms
ceph daemon mon.* config get debug_paxos
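(These go through the monitor's admin socket, so they need to be run locally
on each monitor host, with 'mon.*' replaced by that host's monitor id, e.g.

    ceph daemon mon.ceph01-vm config get debug_mon

and likewise for debug_ms and debug_paxos.)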
-Joao
2014-08-23 10:19:43.910650 7f3c0028c700  0 mon.ceph01-vm@1(peon).data_health(56) update_stats avail 5% total 15798272 used 12941508 avail 926268
2014-08-23 10:19:43.910806 7f3c0028c700 -1 mon.ceph01-vm@1(peon).data_health(56) reached critical levels of available space on local monitor storage -- shutdown!
2014-08-23 10:19:43.910811 7f3c0028c700  0 ** Shutdown via Data Health Service **
2014-08-23 10:19:43.931427 7f3bffa8b700  1 mon.ceph01-vm@1(peon).paxos(paxos active c 15814..16493) is_readable now=2014-08-23 10:19:43.931433 lease_expire=2014-08-23 10:19:45.989585 has v0 lc 16493
2014-08-23 10:19:43.931486 7f3bfe887700 -1 mon.ceph01-vm@1(peon) e2 *** Got Signal Interrupt ***
2014-08-23 10:19:43.931515 7f3bfe887700  1 mon.ceph01-vm@1(peon) e2 shutdown
2014-08-23 10:19:43.931725 7f3bfe887700  0 quorum service shutdown
2014-08-23 10:19:43.931730 7f3bfe887700  0 mon.ceph01-vm@1(shutdown).health(56) HealthMonitor::service_shutdown 1 services
2014-08-23 10:19:43.931735 7f3bfe887700  0 quorum service shutdown
2014-08-22 21:31 GMT+07:00 debian Only <[email protected]>:
This time ceph01-vm went down, with no huge log this time; the other 2 are OK.
I don't know what the reason is. This is not my first time installing Ceph,
but it is the first time I've had a mon go down again and again.
ceph.conf on each OSD and MON:
[global]
fsid = 075f1aae-48de-412e-b024-b0f014dbc8cf
mon_initial_members = ceph01-vm, ceph02-vm, ceph04-vm
mon_host = 192.168.123.251,192.168.123.252,192.168.123.250
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
rgw print continue = false
rgw dns name = ceph-radosgw
osd pool default pg num = 128
osd pool default pgp num = 128
[client.radosgw.gateway]
host = ceph-radosgw
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
log file = /var/log/ceph/client.radosgw.gateway.log
2014-08-22 18:15 GMT+07:00 Joao Eduardo Luis <[email protected]>:
On 08/22/2014 10:21 AM, debian Only wrote:
I have 3 mons in Ceph 0.80.5 on Wheezy, and one RadosGW.
When this happened the first time, I increased the size of the mon log device.
This time mon.ceph02-vm is down; only this mon is down, the other 2 are OK.
Please, someone give me some guidance.
27M Aug 22 02:11 ceph-mon.ceph04-vm.log
43G Aug 22 02:11 ceph-mon.ceph02-vm.log
2G Aug 22 02:11 ceph-mon.ceph01-vm.log
Depending on the debug level you set, and on which subsystems you set a
higher debug level for, the monitor can spit out A LOT of information in
a short period of time. 43GB is nothing compared to some of the 100+ GB
logs I've had to churn through in the past.
However, I'm not grasping what kind of help you need. According
to your 'ceph -s' below the monitors seem okay -- all are in,
health is OK.
If your issue is with that one monitor spitting out humongous amounts of
debug info, here's what you need to do:
- If you added one or more 'debug <something> = X' to that
monitor's ceph.conf, you will want to remove them so that in a
future restart the monitor doesn't start with non-default debug
levels.
- You will want to inject default debug levels into that one
monitor.
Depending on what debug levels you have increased, you will want
to run a version of "ceph tell mon.ceph02-vm injectargs
'--debug-mon 1/5 --debug-ms 0/5 --debug-paxos 1/5'"
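To be concrete, the ceph.conf lines you'd be removing look something like
the following (the levels here are just an example of non-default values,
not necessarily what you have set):

    [mon]
    debug mon = 10
    debug ms = 1
    debug paxos = 10

Alternatively, the same defaults can be set through the monitor's admin
socket on its own host, e.g.

    ceph daemon mon.ceph02-vm config set debug_mon 1/5
    ceph daemon mon.ceph02-vm config set debug_ms 0/5
    ceph daemon mon.ceph02-vm config set debug_paxos 1/5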
-Joao
# ceph -s
    cluster 075f1aae-48de-412e-b024-b0f014dbc8cf
     health HEALTH_OK
     monmap e2: 3 mons at {ceph01-vm=192.168.123.251:6789/0,ceph02-vm=192.168.123.252:6789/0,ceph04-vm=192.168.123.250:6789/0}, election epoch 44, quorum 0,1,2 ceph04-vm,ceph01-vm,ceph02-vm
     mdsmap e10: 1/1/1 up {0=ceph06-vm=up:active}
     osdmap e145: 10 osds: 10 up, 10 in
      pgmap v4394: 2392 pgs, 21 pools, 4503 MB data, 1250 objects
            13657 MB used, 4908 GB / 4930 GB avail
                2392 active+clean
2014-08-22 02:06:34.738828 7ff2b9557700  1 mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable now=2014-08-22 02:06:34.738830 lease_expire=2014-08-22 02:06:39.701305 has v0 lc 9756
2014-08-22 02:06:36.618805 7ff2b9557700  1 mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable now=2014-08-22 02:06:36.618807 lease_expire=2014-08-22 02:06:39.701305 has v0 lc 9756
2014-08-22 02:06:36.620019 7ff2b9557700  1 mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable now=2014-08-22 02:06:36.620021 lease_expire=2014-08-22 02:06:39.701305 has v0 lc 9756
2014-08-22 02:06:36.620975 7ff2b9557700  1 mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable now=2014-08-22 02:06:36.620977 lease_expire=2014-08-22 02:06:39.701305 has v0 lc 9756
2014-08-22 02:06:36.629362 7ff2b9557700  0 mon.ceph02-vm@2(peon) e2 handle_command mon_command({"prefix": "mon_status", "format": "json"} v 0) v1
2014-08-22 02:06:36.633007 7ff2b9557700  0 mon.ceph02-vm@2(peon) e2 handle_command mon_command({"prefix": "status", "format": "json"} v 0) v1
2014-08-22 02:06:36.637002 7ff2b9557700  0 mon.ceph02-vm@2(peon) e2 handle_command mon_command({"prefix": "health", "detail": "", "format": "json"} v 0) v1
2014-08-22 02:06:36.640971 7ff2b9557700  0 mon.ceph02-vm@2(peon) e2 handle_command mon_command({"dumpcontents": ["pgs_brief"], "prefix": "pg dump", "format": "json"} v 0) v1
2014-08-22 02:06:36.641014 7ff2b9557700  1 mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable now=2014-08-22 02:06:36.641016 lease_expire=2014-08-22 02:06:39.701305 has v0 lc 9756
2014-08-22 02:06:37.520387 7ff2b9557700  1 mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9757) is_readable now=2014-08-22 02:06:37.520388 lease_expire=2014-08-22 02:06:42.501572 has v0 lc 9757
--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com