Andy Kurth created VCL-839:
------------------------------
Summary: Problems occur when "localhost" is used for a management node name
Key: VCL-839
URL: https://issues.apache.org/jira/browse/VCL-839
Project: VCL
Issue Type: Bug
Components: vcld (backend)
Affects Versions: 2.4
Reporter: Andy Kurth
Fix For: 2.4.1
The {{vcl-install.sh}} script uses _localhost_ as the name of the management
node by default. The _FQDN_ parameter in {{/etc/vcl/vcld.conf}} is set to
_localhost_, and so is the _managementnode.hostname_ value.
The backend code needs to determine the private IP address used by the
management node. This address is not stored in the database: the management
node table stores only the management node's hostname and an ambiguous
_IPaddress_ value. The _IPaddress_ value should be set to the public IP address
so that management nodes which don't share the same private network can
communicate.
To determine its own private IP address, the management node resolves its
hostname, _localhost_, which yields 127.0.0.1. The code then compares the
resolved IP address to the addresses assigned to the management node's
interfaces. The loopback interface's IP addresses are explicitly excluded
because there is no reason for the code to ever use a loopback address, so no
interface matches 127.0.0.1.
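The lookup described above can be sketched roughly as follows. This is a hypothetical Python model, not the actual OS.pm code: the function name, the interface table and the 10.0.0.5 stand-in for the elided 10.x.x.x address are all illustrative assumptions. In the real log the "lo" entry already has an empty _ip_address_ hash, so the backend appears to drop loopback addresses while collecting interface data; this sketch filters them at match time instead, which produces the same outcome.

```python
import ipaddress
import socket

# Hypothetical interface data shaped like the hash dumped in the warning log
# (interface name -> {"ip_address": {address: netmask}}). 10.0.0.5 stands in
# for the elided 10.x.x.x address. The loopback address is listed here only so
# the filter below has something to skip.
INTERFACES = {
    "eth0": {"ip_address": {"10.0.0.5": "255.255.240.0"}},
    "eth1": {"ip_address": {"152.46.18.135": "255.255.248.0"}},
    "lo": {"ip_address": {"127.0.0.1": "255.0.0.0"}},
}


def find_private_interface(hostname, interfaces, resolve=socket.gethostbyname):
    """Resolve `hostname` and return the name of the non-loopback interface
    assigned that address, or None if no interface matches."""
    resolved = resolve(hostname)
    for name, info in interfaces.items():
        for address in info["ip_address"]:
            if ipaddress.ip_address(address).is_loopback:
                continue  # loopback addresses are never candidates
            if address == resolved:
                return name
    return None
```

With a hostname that resolves to 127.0.0.1, such as _localhost_, the only address that could match is skipped, so `find_private_interface("localhost", INTERFACES, resolve=lambda h: "127.0.0.1")` returns `None`. That corresponds to the "failed to determine private interface name" warning. A hostname resolving to the eth0 address would return `"eth0"`.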
This introduces the first problem, which is mostly cosmetic at this point. The
following warning is generated:
{noformat}
|30351|3|3|new|OS.pm:get_private_interface_name|1451| ---- WARNING ----
|30351|3|3|new|OS.pm:get_private_interface_name|1451| 2015-03-18 14:17:32|30351|3|3|new|OS.pm:get_private_interface_name|1451|failed to determine private interface name, no interface is assigned the private IP address for the reservation: 127.0.0.1
|30351|3|3|new|OS.pm:get_private_interface_name|1451| : {
|30351|3|3|new|OS.pm:get_private_interface_name|1451| : "eth0" => {
|30351|3|3|new|OS.pm:get_private_interface_name|1451| : "broadcast_address" => "10.x.x.x",
|30351|3|3|new|OS.pm:get_private_interface_name|1451| : "ip_address" => {
|30351|3|3|new|OS.pm:get_private_interface_name|1451| : "10.x.x.x" => "255.255.240.0"
|30351|3|3|new|OS.pm:get_private_interface_name|1451| : },
|30351|3|3|new|OS.pm:get_private_interface_name|1451| : "name" => "eth0",
|30351|3|3|new|OS.pm:get_private_interface_name|1451| : "physical_address" => "00:50:56:23:00:bc"
|30351|3|3|new|OS.pm:get_private_interface_name|1451| : },
|30351|3|3|new|OS.pm:get_private_interface_name|1451| : "eth1" => {
|30351|3|3|new|OS.pm:get_private_interface_name|1451| : "broadcast_address" => "x.x.x.x",
|30351|3|3|new|OS.pm:get_private_interface_name|1451| : "default_gateway" => "x.x.x.x",
|30351|3|3|new|OS.pm:get_private_interface_name|1451| : "ip_address" => {
|30351|3|3|new|OS.pm:get_private_interface_name|1451| : "152.46.18.135" => "255.255.248.0"
|30351|3|3|new|OS.pm:get_private_interface_name|1451| : },
|30351|3|3|new|OS.pm:get_private_interface_name|1451| : "name" => "eth1",
|30351|3|3|new|OS.pm:get_private_interface_name|1451| : "physical_address" => "00:50:56:23:00:bd"
|30351|3|3|new|OS.pm:get_private_interface_name|1451| : },
|30351|3|3|new|OS.pm:get_private_interface_name|1451| : "lo" => {
|30351|3|3|new|OS.pm:get_private_interface_name|1451| : "ip_address" => {},
|30351|3|3|new|OS.pm:get_private_interface_name|1451| : "name" => "lo"
|30351|3|3|new|OS.pm:get_private_interface_name|1451| : }
|30351|3|3|new|OS.pm:get_private_interface_name|1451| : }
|30351|3|3|new|OS.pm:get_private_interface_name|1451| ( 0) OS.pm, get_private_interface_name (line: 1451)
|30351|3|3|new|OS.pm:get_private_interface_name|1451| (-1) OS.pm, get_private_network_configuration (line: 1695)
|30351|3|3|new|OS.pm:get_private_interface_name|1451| (-2) (eval 762), (eval) (line: 1)
|30351|3|3|new|OS.pm:get_private_interface_name|1451| (-3) OS.pm, get_ip_address (line: 1846)
|30351|3|3|new|OS.pm:get_private_interface_name|1451| (-4) OS.pm, get_private_ip_address (line: 1901)
|30351|3|3|new|OS.pm:get_private_interface_name|1451| (-5) Linux.pm, post_load (line: 418)
{noformat}
The next problem occurs when a computer is being loaded. Linux.pm's post_load
subroutine attempts to add firewall rules allowing traffic from the management
node's private IP address to any port and, specifically, to port 22. This does
not work as expected because the private IP address could not be determined.
As a result, the rule allowing traffic to any port from the management node's
private IP address is skipped:
{noformat}
|30351|3|3|new|Linux.pm:enable_firewall_port|3655| ---- WARNING ----
|30351|3|3|new|Linux.pm:enable_firewall_port|3655| 2015-03-18 14:22:44|30351|3|3|new|Linux.pm:enable_firewall_port|3655|firewall not modified, port argument is not restricted to a certain port: 'any', scope argument was not supplied, it must be restricted to certain IP addresses if the port argument is unrestricted
{noformat}
The attempt to allow traffic to port 22 succeeds. However, because no IP
address was specified, traffic is allowed from any address. At this point, the
management node can still control the computer.
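The outcome of post_load can be modeled with a short sketch. This is a hypothetical Python illustration of the behavior described above, not the actual Linux.pm code. The function names and the 10.0.0.5 address are assumptions. The 0.0.0.0/0.0.0.0 scope for an unrestricted rule is taken from the "existing scope" log line further down.

```python
# Hypothetical model of the behavior described above; names are illustrative,
# not the actual Linux.pm subroutines.
OPEN_SCOPE = "0.0.0.0/0.0.0.0"  # scope shown in the log when no address was given


def firewall_rule_allowed(port, scope):
    """Mirror the guard in the warning: a rule for every port ('any') is
    refused unless it is restricted to specific source addresses."""
    return not (port == "any" and not scope)


def post_load_rules(mn_private_ip):
    """Return the (port, scope) rules post_load would end up with, given the
    management node's private IP address (None if it could not be determined)."""
    rules = []
    for port in ("any", "22"):
        if firewall_rule_allowed(port, mn_private_ip):
            # With no scope, the port 22 rule is opened to every address.
            rules.append((port, mn_private_ip or OPEN_SCOPE))
    return rules
```

`post_load_rules(None)` yields only `[("22", "0.0.0.0/0.0.0.0")]`, which matches the situation above. With a known private address, say `"10.0.0.5"`, both the any-port rule and the port 22 rule would be scoped to the management node.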
After the computer is reserved and the user connects, the code attempts to lock
down the firewall to the user's remote IP address. Existing firewall rules for
the specific connect method port are replaced when a user initially connects:
{noformat}
2015-03-18 14:23:35|31054|3|3|reserved|Linux.pm:enable_firewall_port|3734|overwrite existing argument specified, existing tcp/22 firewall rule(s) will be replaced:
|31054|3|3|reserved|Linux.pm:enable_firewall_port|3734| existing scope: 0.0.0.0/0.0.0.0
|31054|3|3|reserved|Linux.pm:enable_firewall_port|3734| new scope: y.y.y.y/255.255.255.0
{noformat}
_y.y.y.y is the user's remote IP address in this example_
Once the firewall is modified, the management node loses control of the
computer because the only rule that allowed it access, tcp/22 from any IP
address, was replaced. All commands after this point fail.
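The lockout can be checked with Python's standard `ipaddress` module. This is a hypothetical sketch, not VCL code. It assumes, as the log message suggests, that the overwrite replaces only the rules for the connect method port (tcp/22) and leaves an any-port rule alone. The management node address and the 203.0.113.0 stand-in for the user's y.y.y.y scope are made up for illustration.

```python
import ipaddress


def overwrite_port_rules(rules, port, new_scope):
    """Replace every existing rule for `port` with one rule for `new_scope`.
    Rules for other ports (including 'any') are left alone (an assumption)."""
    return [(p, s) for p, s in rules if p != port] + [(port, new_scope)]


def is_allowed(rules, port, source_ip):
    """Return True if a rule for `port`, or an 'any'-port rule, admits source_ip."""
    source = ipaddress.ip_address(source_ip)
    return any(
        source in ipaddress.ip_network(scope, strict=False)
        for p, scope in rules
        if p in (port, "any")
    )


MN_PRIVATE_IP = "10.0.0.5"                # hypothetical management node address
USER_SCOPE = "203.0.113.0/255.255.255.0"  # hypothetical stand-in for y.y.y.y

# Case from the report: post_load only managed to add tcp/22 from anywhere.
broken = [("22", "0.0.0.0/0.0.0.0")]
broken = overwrite_port_rules(broken, "22", USER_SCOPE)
print(is_allowed(broken, "22", MN_PRIVATE_IP))   # False: management node locked out
print(is_allowed(broken, "22", "203.0.113.50"))  # True: user still connects

# Had the private IP been known, the any-port rule would survive the overwrite.
intended = [("any", MN_PRIVATE_IP), ("22", MN_PRIVATE_IP)]
intended = overwrite_port_rules(intended, "22", USER_SCOPE)
print(is_allowed(intended, "22", MN_PRIVATE_IP))  # True
```

Under the stated assumption, the any-port rule that post_load could not create is what would have kept the management node in control after the connect-port rule was replaced.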
{noformat}
2015-03-18 14:23:35|31054|3|3|reserved|utils.pm:run_ssh_command|4181|executing SSH command on 192.168.2.1 (vm241-1): '/sbin/iptables-save > /etc/sysconfig/iptables'
|31054|3|3|reserved|utils.pm:run_ssh_command|4291| ---- WARNING ----
|31054|3|3|reserved|utils.pm:run_ssh_command|4291| 2015-03-18 14:23:35|31054|3|3|reserved|utils.pm:run_ssh_command|4291|attempt 1/3: failed to execute SSH command on 192.168.2.1 (vm241-1): '/sbin/iptables-save > /etc/sysconfig/iptables', exit status: 255, output:
|31054|3|3|reserved|utils.pm:run_ssh_command|4291| ssh output (/sbin/ipta...): ssh: connect to host 192.168.2.1 port 22: No route to host
{noformat}
The user isn't affected at this point because traffic is still allowed from
his/her remote IP address. The management node continues to check for a user
connection every few minutes, and every check fails. The reservation is not
timed out when a management node has no control over the computer.
Everything is fine for the user as long as he/she does not change location. If
the user clicks the _Connect_ button from a different remote IP address, the
management node won't be able to open the firewall to the new address, and the
user will not be able to connect.
User initiated image captures will also fail:
{noformat}
|6680|3|3|image|OS.pm:pre_capture|102| ---- WARNING ----
|6680|3|3|image|OS.pm:pre_capture|102| 2015-03-18 14:31:22|6680|3|3|image|OS.pm:pre_capture|102|unable to complete capture preparation tasks, vm241-1 is powered on but not responding to SSH
|6680|3|3|image|OS.pm:pre_capture|102| ( 0) OS.pm, pre_capture (line: 102)
|6680|3|3|image|OS.pm:pre_capture|102| (-1) Linux.pm, pre_capture (line: 331)
|6680|3|3|image|OS.pm:pre_capture|102| (-2) VMware.pm, capture (line: 752)
|6680|3|3|image|OS.pm:pre_capture|102| (-3) image.pm, process (line: 179)
|6680|3|3|image|OS.pm:pre_capture|102| (-4) vcld, make_new_child (line: 587)
|6680|3|3|image|OS.pm:pre_capture|102| (-5) vcld, main (line: 348)
{noformat}