Healthy:

e4: 4 mons at {storage1=
10.0.10.11:6789/0,storage2=10.0.10.12:6789/0,storage3=10.0.10.13:6789/0,storage4=10.0.10.14:6789/0},
election epoch 54, quorum 0,1,2,3 storage1,storage2,storage3,storage4

After storage1 goes down I get this over and over again:

2014-01-02 09:16:23.789271 7fbbc82f3700  0 -- :/1000673 >>
10.0.10.11:6789/0 pipe(0x7fbbbc003b10 sd=3 :0 s=1 pgs=0 cs=0 l=1
c=0x7fbbbc003d70).fault
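
(A quick way to tell whether the surviving monitors still hold quorum, as
opposed to this client merely hunting the one address it knows about, would
be to point the ceph CLI at another monitor explicitly with -m. A rough
sketch, assuming storage2's address from the mon stat output above:

ceph -m 10.0.10.12:6789 mon stat
ceph -m 10.0.10.12:6789 quorum_status

If those still answer while storage1 is down, quorum is intact and the
remaining problem is client-side configuration.)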

I'm issuing these commands from an admin node that isn't running any
monitors or OSDs, but the output is the same when I run them on any of the
monitors.
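
(Per Wolfgang's suggestion further down the thread, the likely fix is to list
every monitor, not just storage1, in the ceph.conf on the admin node and on
each cluster node. A minimal sketch of the relevant [global] lines, assuming
the monitor names and addresses shown in the "Healthy" mon stat output above:

[global]
mon_initial_members = storage1,storage2,storage3,storage4
mon_host = 10.0.10.11,10.0.10.12,10.0.10.13,10.0.10.14

With only storage1 in mon_host, clients have exactly one address to contact,
which matches the behaviour above: everything works until that one monitor
goes down.)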


On Thu, Jan 2, 2014 at 2:46 AM, Wolfgang Hennerbichler <[email protected]> wrote:

> Matt,
> what does 'ceph mon stat' say when your cluster is healthy and what does
> it say when it's unhealthy?
>
> Again my example:
>
> # ceph mon stat
> e3: 3 mons at
> {node01=
> 10.32.0.181:6789/0,node02=10.32.0.182:6789/0,node03=10.32.0.183:6789/0},
> election epoch 14, quorum 0,1,2 node01,node02,node03
>
> Wolfgang
>
> On 01/01/2014 10:29 PM, Matt Rabbitt wrote:
> > I only have four because I want to remove the original one I used to
> > create the cluster.  I tried what you suggested and rebooted all my
> > nodes but I'm still having the same problem.  I'm running Emperor on
> > Ubuntu 12.04 on all my nodes by the way.  Here is what I'm seeing as I
> > run ceph -w and reboot my original monitor.
> >
> >      osdmap e124: 12 osds: 12 up, 12 in
> >       pgmap v26271: 528 pgs, 3 pools, 6979 MB data, 1883 objects
> >             20485 MB used, 44670 GB / 44690 GB avail
> >                  528 active+clean
> >
> >
> > 2014-01-01 16:21:30.807305 mon.0 [INF] pgmap v26271: 528 pgs: 528
> > active+clean; 6979 MB data, 20485 MB used, 44670 GB / 44690 GB avail
> > 2014-01-01 16:22:06.098971 7f272d539700  0 monclient: hunting for new mon
> > 2014-01-01 16:23:04.823206 7fe84c1bb700  0 -- :/1019476 >>
> > 10.0.10.11:6789/0 pipe(0x7fe840009090 sd=3 :0 s=1 pgs=0 cs=0 l=1
> > c=0x7fe8400092f0).fault
> > 2014-01-01 16:23:07.821642 7fe8443f9700  0 -- :/1019476 >>
> > 10.0.10.11:6789/0 pipe(0x7fe840004140 sd=3 :0 s=1 pgs=0 cs=0 l=1
> > c=0x7fe8400043a0).fault
> >
> > ^this fault error continues until the monitor comes back online.
> >
> >
> >
> > On Wed, Jan 1, 2014 at 4:04 PM, Wolfgang Hennerbichler <[email protected]> wrote:
> >
> >     Matt,
> >
> >     first of all: four monitors is a bad idea. use an odd number for
> >     mons, e.g. three. your other problem is your configuration file.
> >     the mon_initial_members and mon_host directives should include all
> >     monitor daemons. see my cluster:
> >
> >     mon_initial_members = node01,node02,node03
> >     mon_host = 10.32.0.181,10.32.0.182,10.32.0.183
> >
> >     hth
> >     wogri
> >     --
> >     http://www.wogri.at
> >
> >     On 01 Jan 2014, at 21:55, Matt Rabbitt <[email protected]> wrote:
> >
> >     > I created a cluster with four monitors and 12 OSDs using the
> >     > ceph-deploy tool.  I initially created this cluster with one
> >     > monitor, then added a "public network" statement in ceph.conf so
> >     > that I could use ceph-deploy to add the other monitors.  When I run
> >     > ceph -w now everything checks out, all monitors and OSDs show up,
> >     > and I can read and write data to my pool.  The problem is that when
> >     > I shut down the monitor that I initially used to configure the
> >     > cluster, nothing works anymore.  If I run ceph -w all I get is
> >     > fault errors about that first monitor being down, and I can't read
> >     > or write data even though the other three monitors are still up.
> >     > What did I do wrong here?  I've been looking over the documentation
> >     > and I see all kinds of info about having a mon addr attribute in my
> >     > config or a public ip in the [mon] section, but my config doesn't
> >     > have anything like that in it.  Here is my complete config:
> >     >
> >     > [global]
> >     > fsid = a0ab5715-f9e6-4d71-8da6-0ad976ac350c
> >     > mon_initial_members = storage1
> >     > mon_host = 10.0.10.11
> >     > auth_supported = cephx
> >     > osd_journal_size = 6144
> >     > filestore_xattr_use_omap = true
> >     > public network = 10.0.10.0/24
> >
> >
>
>
> --
> http://www.wogri.com
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
