Status: New
Owner: ----
New issue 742 by [email protected]: gnt-cluster verify reports ERROR:ENODEHOOKS when a node is offline
http://code.google.com/p/ganeti/issues/detail?id=742
What software version are you running? Please provide the output of "gnt-cluster --version", "gnt-cluster version", and "hspace --version".
# gnt-cluster --version; gnt-cluster version; hspace --version
gnt-cluster (ganeti v2.10.0) 2.10.0
Software version: 2.10.0
Internode protocol: 2100000
Configuration format: 2100000
OS api version: 20
Export interface: 0
VCS version: (ganeti) version v2.10.0
hspace (ganeti) version v2.10.0
compiled with ghc 6.12
running on linux x86_64
What distribution are you using?
Debian squeeze.
What steps will reproduce the problem?
1. gnt-node modify -O yes $node
2. Ensure the node is actually unreachable (ssh $node service ganeti stop; stopping ganeti-noded is enough)
3. gnt-cluster verify --error-codes
What is the expected output? What do you see instead?
Expected output: gnt-cluster verify passes with no errors related to the
node being offline.
Actual output:
Mon Mar 3 11:45:28 2014 * Other Notes
Mon Mar 3 11:45:28 2014   - NOTICE: 1 offline node(s) found.
Mon Mar 3 11:45:32 2014   - WARNING: Communication failure to node $OFFLINENODE: Error 7: Failed connect to $OFFLINENODE:1811; Resource temporarily unavailable
Mon Mar 3 11:45:32 2014 * Hooks Results
Mon Mar 3 11:45:32 2014   - ERROR:ENODEHOOKS:node:$OFFLINENODE:Communication failure in hooks execution: Error 7: Failed connect to $OFFLINENODE:1811; Resource temporarily unavailable
Please provide any additional information below.
I tried to track down the problem in Ganeti to make sure it wasn't a
problem with our local setup. The cause seems to be that _CheckConfigNode
in lib/rpc.py does not do offline detection for nodes addressed by their
hostname instead of UUID, and it appears that all job hooks address nodes
by their hostnames (through the to_name lambda in HooksMaster.BuildFromLu
in lib/hooksmaster.py).
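The suspected failure mode can be illustrated with a minimal, self-contained sketch. The dict layout and function below are simplified stand-ins for the real Ganeti config and _CheckConfigNode, not the actual code: the point is that an offline check keyed only by UUID silently passes when the caller supplies a hostname.

```python
# Simplified stand-in for the node config: keyed by UUID, as in the
# suspected _CheckConfigNode behaviour. Names and values are illustrative.
nodes_by_uuid = {
    "uuid-1234": {"name": "node1.example.com", "offline": True},
}

def is_offline(node_spec):
    # Lookup is keyed by UUID only, so a hostname falls through the
    # .get() and the node is treated as online.
    node = nodes_by_uuid.get(node_spec)
    return node is not None and node["offline"]

# Addressed by UUID, the offline flag is honoured:
print(is_offline("uuid-1234"))
# But job hooks address nodes by name, so the check misses and the RPC
# to the unreachable node is attempted anyway:
print(is_offline("node1.example.com"))
```

With the hostname the function returns False even though the node is marked offline, which matches the ENODEHOOKS error seen above.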
I've attached a diff containing a dirty workaround that fixes the immediate issue. I'm not sure whether the proper fix is to switch from names to UUIDs for job hooks, to change the config API so it can look up by name again, or something else entirely, so I'll leave that to you :).
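For what it's worth, the second option (letting the config lookup accept either form) could look roughly like this. Again a hedged sketch with illustrative identifiers, not Ganeti's real API: a reverse name-to-UUID index resolves hostnames back to the canonical key before the offline check.

```python
# Simplified stand-in for the node config, keyed by UUID.
nodes_by_uuid = {
    "uuid-1234": {"name": "node1.example.com", "offline": True},
}

# Reverse index so hostnames resolve to the canonical UUID key.
uuid_by_name = {info["name"]: uuid for uuid, info in nodes_by_uuid.items()}

def is_offline(node_spec):
    # Resolve a name to its UUID; UUIDs pass through unchanged.
    uuid = uuid_by_name.get(node_spec, node_spec)
    node = nodes_by_uuid.get(uuid)
    return node is not None and node["offline"]

# Both addressing styles now honour the offline flag:
print(is_offline("uuid-1234"))
print(is_offline("node1.example.com"))
```

With this shape, HooksMaster could keep passing names and the offline nodes would still be filtered before the RPC is attempted.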
Attachments:
gnt-cluster_verify_hacky_workaround.patch 847 bytes