John,

Aha, thanks for that -- that got me closer to the problem.

I forgot an important detail: A few days before the upgrade, I set the cluster 
and public networks in the config files on the nodes to the "back-end" network, 
which the MON nodes don't have access to.  I suspected that this was a bad idea 
at the time, but since it didn't break anything (we are still in test mode on 
this cluster, so downtime is completely fine), I figured it somehow didn’t 
matter.  I must have forgotten to restart the ceph service on the MONs so the 
symptom didn't appear until the ceph upgrade.

I just switched the public network back to the "front-end" network, which the 
MONs do have access to, and now the ceph_rest_api runs fine (and your "tell 
osd.0 version" does as well).  So that problem's solved.

But now we're back to the original problem, which is why I was monkeying with 
the "public network" config entry to begin with.  Let me explain:

As I said, we have two separate networks:

10.197.5.0/24 - The "front-end" network, "skinny pipe", all 1GbE, intended to 
be a management or control plane network
10.174.1.0/24 - The "back-end" network, "fat pipe", all OSD nodes use 2x bonded 
10GbE, intended to be a data network

So we want all of the OSD traffic to go over the "back-end" network, and the 
MON traffic to go over the "front-end" network.  We thought the following 
would do that:

public network = 10.197.5.0/24   # skinny pipe, mgmt & MON traffic
cluster network = 10.174.1.0/24  # fat pipe, OSD traffic

But that doesn't seem to be the case -- iftop and netstat show that little/no 
OSD communication is happening over the 10.174.1 network and it's all happening 
over the 10.197.5 network.
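
To make that check repeatable, here is a minimal sketch of how the peer addresses seen in netstat could be sorted by network. It uses only the Python standard library; the two subnets are the ones listed above, while the function names classify and tally are made up for illustration and are not part of any Ceph tooling.

```python
import ipaddress

# Subnets taken from the description above.
PUBLIC_NET = ipaddress.ip_network("10.197.5.0/24")   # front-end, 1GbE
CLUSTER_NET = ipaddress.ip_network("10.174.1.0/24")  # back-end, 2x bonded 10GbE


def classify(addr):
    """Return which of the two networks an IPv4 address belongs to."""
    ip = ipaddress.ip_address(addr)
    if ip in PUBLIC_NET:
        return "public"
    if ip in CLUSTER_NET:
        return "cluster"
    return "other"


def tally(peer_addrs):
    """Count peers per network (hypothetical helper for illustration)."""
    counts = {"public": 0, "cluster": 0, "other": 0}
    for addr in peer_addrs:
        counts[classify(addr)] += 1
    return counts
```

Feeding it the foreign-address column of the OSD connections would show at a glance how many peers sit on each network.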

What configuration should we be running to enforce the networks per our design? 
 Thanks!

Jon Heese
Systems Engineer
INetU Managed Hosting
P: 610.266.7441 x 261
F: 610.266.7434
www.inetu.net

** This message contains confidential information, which also may be 
privileged, and is intended only for the person(s) addressed above. Any 
unauthorized use, distribution, copying or disclosure of confidential and/or 
privileged information is strictly prohibited. If you have received this 
communication in error, please erase all copies of the message and its 
attachments and notify the sender immediately via reply e-mail. **
-----Original Message-----
From: John Spray [mailto:jsp...@redhat.com] 
Sent: Thursday, October 22, 2015 12:48 PM
To: Jon Heese <jhe...@inetu.net>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Problems with ceph_rest_api after update

On Thu, Oct 22, 2015 at 3:36 PM, Jon Heese <jhe...@inetu.net> wrote:
> Hello,
>
> We are running a Ceph cluster with 3x CentOS 7 MON nodes, and after we 
> updated the ceph packages on the MONs yesterday (from 0.94.3 to 
> 0.94.4), the ceph_rest_api started refusing to run, giving the 
> following error 30 seconds after it’s started:

Weird.  Does this work?
"ceph --id admin tell osd.0 version"

get_command_descriptions is ceph_rest_api's way of asking an OSD to tell it 
what operations are supported.  It's sent from ceph_rest_api to an OSD the same 
way a 'tell' command is sent from the CLI (although you can't actually issue 
get_command_descriptions with the CLI).

ceph_rest_api is picking the last up OSD it can see, as an arbitrary place to 
send the query, so if you have for example an up OSD that isn't really 
responsive, it could cause a problem.
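
As a rough illustration of that selection (a sketch only, not the actual ceph_rest_api source): the dictionary layout below is an assumption modeled on JSON "osd dump" output, and the function name find_last_up_osd is hypothetical.

```python
def find_last_up_osd(osd_dump):
    """Return the id of the last OSD marked up in a parsed osd dump, or None.

    Assumes osd_dump looks like {"osds": [{"osd": 0, "up": 1}, ...]};
    that shape is an assumption for illustration.
    """
    up_ids = [int(osd["osd"]) for osd in osd_dump.get("osds", []) if osd.get("up")]
    return up_ids[-1] if up_ids else None
```

Note that "up" in the map says nothing about whether the daemon is reachable from the host making the query, which is why a single unreachable OSD can stall the startup.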

John

>
> [root@ceph-mon01 ~]# /usr/bin/ceph-rest-api -c /etc/ceph/ceph.conf --cluster ceph -i admin
> Traceback (most recent call last):
>   File "/usr/bin/ceph-rest-api", line 59, in <module>
>     rest,
>   File "/usr/lib/python2.7/site-packages/ceph_rest_api.py", line 503, in generate_app
>     addr, port = api_setup(app, conf, cluster, clientname, clientid, args)
>   File "/usr/lib/python2.7/site-packages/ceph_rest_api.py", line 145, in api_setup
>     target=('osd', int(osdid)))
>   File "/usr/lib/python2.7/site-packages/ceph_rest_api.py", line 83, in get_command_descriptions
>     raise EnvironmentError(ret, err)
> EnvironmentError: [Errno -4] Can't get command descriptions:
>
> Nothing else was changed, only the packages were updated.  I’ve looked 
> at the python, and it seems to be timing out waiting for this line to 
> complete, but I’m not sure where to look next in terms of what 
> “get_command_descriptions” actually does:
>
> ret, outbuf, outs = json_command(cluster, target,
>                                  prefix='get_command_descriptions',
>                                  timeout=30)
>
> Is this a known issue?  If not, does anyone have any suggestions for 
> how to troubleshoot this further?  Thanks in advance.
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
