Tuong Truong created AMBARI-11854:
-------------------------------------
Summary: ambari-agent fails to start when node has multiple
network cards with some does not have IP address
Key: AMBARI-11854
URL: https://issues.apache.org/jira/browse/AMBARI-11854
Project: Ambari
Issue Type: Bug
Components: ambari-agent
Affects Versions: 2.1.0.
Environment: AMD
Reporter: Tuong Truong
Fix For: 2.1.0.
In a cluster whose nodes have multiple network interfaces, ambari-agent
fails to start when one or more active network interfaces have no IP
address bound.
/var/log/ambari-agent/ambari-agent.out shows:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 354, in run
    self.register = Register(self.config)
  File "/usr/lib/python2.6/site-packages/ambari_agent/Register.py", line 34, in __init__
    self.hardware = Hardware()
  File "/usr/lib/python2.6/site-packages/ambari_agent/Hardware.py", line 41, in __init__
    self.hardware.update(Facter().facterInfo())
  File "/usr/lib/python2.6/site-packages/ambari_agent/Facter.py", line 466, in facterInfo
    facterInfo = super(FacterLinux, self).facterInfo()
  File "/usr/lib/python2.6/site-packages/ambari_agent/Facter.py", line 161, in facterInfo
    facterInfo['netmask'] = self.getNetmask()
  File "/usr/lib/python2.6/site-packages/ambari_agent/Facter.py", line 384, in getNetmask
    if primary_ip == self.get_ip_address_by_ifname(i.strip()).strip():
  File "/usr/lib/python2.6/site-packages/ambari_agent/Facter.py", line 397, in get_ip_address_by_ifname
    struct.pack('256s', ifname[:15])
IOError: [Errno 99] Cannot assign requested address
Running 'python /usr/lib/python2.6/site-packages/ambari_agent/Facter.py'
manually on the nodes that failed to register produced the same error.
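The last frame of the traceback looks like the usual SIOCGIFADDR ioctl lookup,
which on Linux fails with errno 99 (EADDRNOTAVAIL) when the interface has no
IPv4 address. The following is only a sketch of a defensive variant, not the
actual Ambari code or fix: the function name safe_ip_address_by_ifname and the
injectable ioctl parameter are hypothetical, introduced here for illustration.

```python
import errno
import fcntl
import socket
import struct

SIOCGIFADDR = 0x8915  # Linux ioctl request: get an interface's IPv4 address


def safe_ip_address_by_ifname(ifname, ioctl=fcntl.ioctl):
    """Return the IPv4 address of ifname, or None if it has none.

    Hypothetical helper; `ioctl` is injectable only to make testing easy.
    """
    if not isinstance(ifname, bytes):
        ifname = ifname.encode("ascii")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        packed = ioctl(sock.fileno(), SIOCGIFADDR,
                       struct.pack("256s", ifname[:15]))
        # Bytes 20..23 of the returned ifreq hold the IPv4 address.
        return socket.inet_ntoa(packed[20:24])
    except (IOError, OSError) as err:
        # Errno 99: the interface exists but has no IPv4 address.
        if err.errno == errno.EADDRNOTAVAIL:
            return None
        raise  # anything else is unexpected; do not hide it
    finally:
        sock.close()
```

A caller such as the getNetmask loop seen in the traceback could then skip any
interface for which this returns None instead of aborting registration.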
On nodes where registration succeeded, the same command returns a JSON-like
response such as:
{'kernel': 'Linux', 'domain': 'svl.ibm.com', 'kernelrelease':
'2.6.32-504.el6.x86_64', 'uptime_days': '0', 'memorytotal': 49413988,
'swapfree': '8.00 GB', 'processorcount': 24, 'selinux': False, 'timezone':
'PST', 'hardwareisa': 'x86_64', 'operatingsystem': 'redhat', 'hostname':
'hdperf014', 'id': 'root', 'memoryfree': 48185456, 'hardwaremodel': 'x86_64',
'uptime_seconds': '11578', 'osfamily': 'redhat', 'memorysize': 49413988,
'interfaces': 'eth0,lo', 'physicalprocessorcount': 24, 'swapsize': '8.00 GB',
'netmask': '255.255.255.0', 'ipaddress': '9.30.75.23', 'kernelmajversion':
'2.6', 'kernelversion': '2.6.32', 'macaddress': '00:02:C9:4B:57:62',
'operatingsystemrelease': '6.6', 'uptime_hours': '3', 'fqdn':
'hdperf014.svl.ibm.com', 'architecture': 'x86_64'}
[root@xxxxx ambari-agent]# ifconfig
eth0 Link encap:Ethernet HWaddr 5C:F3:FC:A6:48:B4
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
Memory:93360000-9337ffff
eth2 Link encap:Ethernet HWaddr 00:02:C9:4B:57:CE
inet addr:9.30.75.21 Bcast:9.30.75.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
RX packets:48830 errors:0 dropped:0 overruns:0 frame:0
TX packets:25329 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:2000
RX bytes:64833325 (61.8 MiB) TX bytes:2582433 (2.4 MiB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:21 errors:0 dropped:0 overruns:0 frame:0
TX packets:21 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1560 (1.5 KiB) TX bytes:1560 (1.5 KiB)
The workaround is to deactivate the network interface that has no IP address:
'ifconfig eth0 down'
ifconfig then shows:
[root@hdperf012 ambari_agent]# ifconfig
eth2 Link encap:Ethernet HWaddr 00:02:C9:4B:57:CE
inet addr:9.30.75.21 Bcast:9.30.75.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
RX packets:49006 errors:0 dropped:0 overruns:0 frame:0
TX packets:25420 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:2000
RX bytes:64847473 (61.8 MiB) TX bytes:2593953 (2.4 MiB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:21 errors:0 dropped:0 overruns:0 frame:0
TX packets:21 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1560 (1.5 KiB) TX bytes:1560 (1.5 KiB)
ambari-agent comes up afterward.
The same machine did not hit the problem with the prior Ambari build.
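To find which interfaces on a node would need the workaround, the ifconfig
output can be scanned for interface blocks with no 'inet addr:' line. This is
a rough sketch that assumes the older net-tools output format shown in this
report (newer ifconfig versions print addresses differently); the function
name interfaces_without_ipv4 is hypothetical.

```python
def interfaces_without_ipv4(ifconfig_output):
    """Return interface names whose block has no 'inet addr:' line.

    Assumes net-tools style output, where each interface block starts
    with a line containing 'Link encap:'.
    """
    missing = []
    current = None     # name of the interface block being scanned
    has_addr = False   # whether the current block had an 'inet addr:' line
    for line in ifconfig_output.splitlines():
        if "Link encap:" in line:
            # A new block starts; record the previous one if it had no address.
            if current is not None and not has_addr:
                missing.append(current)
            current = line.split()[0]
            has_addr = False
        elif "inet addr:" in line:
            has_addr = True
    if current is not None and not has_addr:
        missing.append(current)
    return missing
```

Fed the first ifconfig listing above, this would report eth0, the interface
that was brought down in the workaround.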