On 08/24/2014 01:57 AM, debian Only wrote:
This happens when I use 'ceph-deploy mon create ceph01-vm ceph02-vm
ceph04-vm' to create the 3 mon members.
Now every 10 hours one mon goes down, every time with this error, even
though the disk sometimes still has enough space left, such as 30G.

I deployed Ceph before, creating only one mon in the first step
('ceph-deploy mon create ceph01-vm') and then adding the others with
'ceph-deploy mon add ceph02-vm', and did not meet this problem.

I do not know why.

Your monitor shut down because the disk the monitor is sitting on has dropped to (or below) 5% available disk space. This is meant to prevent the monitor from running out of disk space and being unable to store critical cluster information. 5% is a rough estimate that may be adequate for typical disks, but in absolute terms it may be too little on a small disk or more than necessary on a large one. This value can be adjusted if you feel you need to, using the 'mon_data_avail_crit' option (which defaults to 5, as in 5%, but can be set to whatever suits you best).
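For instance (the value 10 below is purely illustrative, not a recommendation), you could set it in the [mon] section of ceph.conf and restart the monitors:

  [mon]
  mon data avail crit = 10

or inject it into a running monitor without a restart, substituting your monitor's id:

  ceph tell mon.ceph01-vm injectargs '--mon_data_avail_crit 10'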

The big problem here, however, seems to be that you're running out of space due to huge monitor logs. Is that it?

If so, I would ask you to run the following commands and share the results:

ceph daemon mon.* config get debug_mon
ceph daemon mon.* config get debug_ms
ceph daemon mon.* config get debug_paxos
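
(mon.* here stands for your monitor's id; on ceph01-vm, for instance, that would be 'ceph daemon mon.ceph01-vm config get debug_mon'. Since 'ceph daemon' talks to the local admin socket, run it on the host where that monitor lives.)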

  -Joao


2014-08-23 10:19:43.910650 7f3c0028c700  0 mon.ceph01-vm@1(peon).data_health(56) update_stats avail 5% total 15798272 used 12941508 avail 926268
2014-08-23 10:19:43.910806 7f3c0028c700 -1 mon.ceph01-vm@1(peon).data_health(56) reached critical levels of available space on local monitor storage -- shutdown!
2014-08-23 10:19:43.910811 7f3c0028c700  0 ** Shutdown via Data Health Service **
2014-08-23 10:19:43.931427 7f3bffa8b700  1 mon.ceph01-vm@1(peon).paxos(paxos active c 15814..16493) is_readable now=2014-08-23 10:19:43.931433 lease_expire=2014-08-23 10:19:45.989585 has v0 lc 16493
2014-08-23 10:19:43.931486 7f3bfe887700 -1 mon.ceph01-vm@1(peon) e2 *** Got Signal Interrupt ***
2014-08-23 10:19:43.931515 7f3bfe887700  1 mon.ceph01-vm@1(peon) e2 shutdown
2014-08-23 10:19:43.931725 7f3bfe887700  0 quorum service shutdown
2014-08-23 10:19:43.931730 7f3bfe887700  0 mon.ceph01-vm@1(shutdown).health(56) HealthMonitor::service_shutdown 1 services
2014-08-23 10:19:43.931735 7f3bfe887700  0 quorum service shutdown



2014-08-22 21:31 GMT+07:00 debian Only <[email protected]>:

    This time ceph01-vm went down and no big log was produced; the other
    2 are OK. I do not know the reason. This is not my first time
    installing Ceph, but it is the first time I have seen a mon go down
    again and again.

    ceph.conf on each OSD and MON:
    [global]
    fsid = 075f1aae-48de-412e-b024-b0f014dbc8cf
    mon_initial_members = ceph01-vm, ceph02-vm, ceph04-vm
    mon_host = 192.168.123.251,192.168.123.252,192.168.123.250
    auth_cluster_required = cephx
    auth_service_required = cephx
    auth_client_required = cephx
    filestore_xattr_use_omap = true

    rgw print continue = false
    rgw dns name = ceph-radosgw
    osd pool default pg num = 128
    osd pool default pgp num = 128


    [client.radosgw.gateway]
    host = ceph-radosgw
    keyring = /etc/ceph/ceph.client.radosgw.keyring
    rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
    log file = /var/log/ceph/client.radosgw.gateway.log


    2014-08-22 18:15 GMT+07:00 Joao Eduardo Luis <[email protected]>:

        On 08/22/2014 10:21 AM, debian Only wrote:

            I have 3 mons in Ceph 0.80.5 on Wheezy, and one RadosGW.

            When this happened the first time, I increased the mon log
            device size. This time mon.ceph02-vm went down; only this mon
            is down, the other 2 are OK.

            Please, someone give me some guidance.

               27M Aug 22 02:11 ceph-mon.ceph04-vm.log
               43G Aug 22 02:11 ceph-mon.ceph02-vm.log
               2G Aug 22 02:11 ceph-mon.ceph01-vm.log


        Depending on the debug level you set, and depending on which
        subsystems you set a higher debug level for, the monitor can spit
        out A LOT of information in a short period of time.  43GB is
        nothing compared to some 100+ GB logs I've had to churn through in
        the past.

        However, I'm not grasping what kind of help you need.  According
        to your 'ceph -s' below the monitors seem okay -- all are in,
        health is OK.

        If your issue is with having that one monitor spitting out
        humongous amounts of debug info, here's what you need to do:

        - If you added one or more 'debug <something> = X' to that
        monitor's ceph.conf, you will want to remove them so that in a
        future restart the monitor doesn't start with non-default debug
        levels.

        - You will want to inject default debug levels into that one
        monitor.

        Depending on what debug levels you have increased, you will want
        to run a version of "ceph tell mon.ceph02-vm injectargs
        '--debug-mon 1/5 --debug-ms 0/5 --debug-paxos 1/5'"
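
        (If you want to double-check that the defaults took effect, you
        could query the monitor's admin socket afterwards, e.g. "ceph
        daemon mon.ceph02-vm config get debug_mon", and confirm it
        reports 1/5.)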

           -Joao


            # ceph -s
                cluster 075f1aae-48de-412e-b024-b0f014dbc8cf
                 health HEALTH_OK
                 monmap e2: 3 mons at {ceph01-vm=192.168.123.251:6789/0,ceph02-vm=192.168.123.252:6789/0,ceph04-vm=192.168.123.250:6789/0},
            election epoch 44, quorum 0,1,2 ceph04-vm,ceph01-vm,ceph02-vm
                 mdsmap e10: 1/1/1 up {0=ceph06-vm=up:active}
                 osdmap e145: 10 osds: 10 up, 10 in
                  pgmap v4394: 2392 pgs, 21 pools, 4503 MB data, 1250 objects
                        13657 MB used, 4908 GB / 4930 GB avail
                            2392 active+clean


            2014-08-22 02:06:34.738828 7ff2b9557700  1 mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable now=2014-08-22 02:06:34.738830 lease_expire=2014-08-22 02:06:39.701305 has v0 lc 9756
            2014-08-22 02:06:36.618805 7ff2b9557700  1 mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable now=2014-08-22 02:06:36.618807 lease_expire=2014-08-22 02:06:39.701305 has v0 lc 9756
            2014-08-22 02:06:36.620019 7ff2b9557700  1 mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable now=2014-08-22 02:06:36.620021 lease_expire=2014-08-22 02:06:39.701305 has v0 lc 9756
            2014-08-22 02:06:36.620975 7ff2b9557700  1 mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable now=2014-08-22 02:06:36.620977 lease_expire=2014-08-22 02:06:39.701305 has v0 lc 9756
            2014-08-22 02:06:36.629362 7ff2b9557700  0 mon.ceph02-vm@2(peon) e2 handle_command mon_command({"prefix": "mon_status", "format": "json"} v 0) v1
            2014-08-22 02:06:36.633007 7ff2b9557700  0 mon.ceph02-vm@2(peon) e2 handle_command mon_command({"prefix": "status", "format": "json"} v 0) v1
            2014-08-22 02:06:36.637002 7ff2b9557700  0 mon.ceph02-vm@2(peon) e2 handle_command mon_command({"prefix": "health", "detail": "", "format": "json"} v 0) v1
            2014-08-22 02:06:36.640971 7ff2b9557700  0 mon.ceph02-vm@2(peon) e2 handle_command mon_command({"dumpcontents": ["pgs_brief"], "prefix": "pg dump", "format": "json"} v 0) v1
            2014-08-22 02:06:36.641014 7ff2b9557700  1 mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9756) is_readable now=2014-08-22 02:06:36.641016 lease_expire=2014-08-22 02:06:39.701305 has v0 lc 9756
            2014-08-22 02:06:37.520387 7ff2b9557700  1 mon.ceph02-vm@2(peon).paxos(paxos active c 9037..9757) is_readable now=2014-08-22 02:06:37.520388 lease_expire=2014-08-22 02:06:42.501572 has v0 lc 9757






        --
        Joao Eduardo Luis
        Software Engineer | http://inktank.com | http://ceph.com





--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com