Re: [PROPOSAL] Separate management addresses from the concept of an entity's public address

Sam Corbett Thu, 08 Dec 2016 11:06:07 -0800

Thanks for your thoughts Alex.

The first half more or less makes sense. It would be helpful to me if you clarified a couple of points.

I don't understand what you mean when you write that `management.host.address` assumes a general management.ssh.endpoint and that this is something we should not do. Why should we not assume this, and (ignoring naming and my disregard of ports) how does `brooklyn.ssh.endpoint` differ conceptually?

I also didn't follow where the name "firewall1" comes from, or how an enricher on a location works.

I prefer the idea of publishing "management.network" and "management.endpoint" to "management.ssh.network" or "brooklyn.ssh.network". I don't see how you would use the latter form further down the line, for example when configuring entity sensor feeds. Maybe this one doesn't matter so much; the proposal you linked says:


    To summarise, sensors can be qualified — e.g.  with suffix .public — to
    indicate a specific interface.  All unqualified sensors including
    host.address and main.uri now refer to a “default subnet”, typically
    (nowadays) not the public internet.

I don't think your comments in this thread touch upon this idea, but for simplicity's sake right now I prefer having things continue to use host.address etc. rather than thinking in terms of services.

I'm in rough agreement with the second half of your email, beginning "Other points to address," but I think a location customiser is the wrong place to do this. We could make it work but it would be a misuse of an existing concept. I would prefer to create a separate class.


Things I intend to do then:

* test credentials when determining reachability.

* write LocationNetworkInfoCustomizer (or whatever we choose to call it), reusing much of the existing code in JcloudsLocation. Its job is to:

- expose a way of indicating that public or private addresses should be preferred and default it to private, per the portion of the proposal I mentioned.

- choose a reachable endpoint for management

- set `host.address` and `host.name` on an entity to this endpoint (removing the responsibility from MachineLifecycleEffectorTasks)

- publish `host.address.network` and `host.name.network` sensors on entities for each public and private network jclouds tells us about.

* not change how feeds choose which address to poll or how BrooklynAccessUtils works (i.e. they should continue to use host.address). We can think about management services later.


I'll try to tidy JcloudsLocation up while I go.

Any further comments?

Sam


On 07/12/2016 15:44, Alex Heneveld wrote:

Hi Sam,

How does this relate to the strategy suggested in the Networking Proposal
[1]?

TL;DR I agree with the intention but think some tweaks to the mechanism
would make for even better clarity and consistency going forward.


The proposal [1] suggests several things.  One is a format for network
addresses:   *host.address.network*
For instance:

     host.address.public: 123.0.0.45
     host.address.private1: 10.0.0.45
     host.address.private2: 192.168.0.45

Another is a format for services: *service.port* (and *url* and others).
I've added a suggestion to use the convention *service.network* to be
able to indicate a specific one of the networks above.
Where required, policies can open up additional ingress or port forwards,
of the form *service.field.mapped.network*, e.g. *http.port.mapped.public*.
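To make these conventions concrete, a minimal Java sketch of how the sensor names could be composed. The class and method names are hypothetical (not an existing Brooklyn API); the code only builds strings following the patterns *host.address.network*, *service.port*, *service.network* and *service.field.mapped.network*.

```java
/**
 * Hypothetical helper for the naming conventions in the Networking Proposal.
 * It is not part of Brooklyn; it only composes sensor-name strings.
 */
final class NetworkSensorNames {
    private NetworkSensorNames() {}

    /** hostAddress("private1") -> "host.address.private1" */
    static String hostAddress(String network) {
        return "host.address." + network;
    }

    /** servicePort("http") -> "http.port" */
    static String servicePort(String service) {
        return service + ".port";
    }

    /** serviceNetwork("brooklyn.ssh") -> "brooklyn.ssh.network" */
    static String serviceNetwork(String service) {
        return service + ".network";
    }

    /** mapped("http", "port", "public") -> "http.port.mapped.public" */
    static String mapped(String service, String field, String network) {
        return service + "." + field + ".mapped." + network;
    }
}
```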

The format suggested in Sam's proposal -- *management.host.address* --
feels inconsistent with the above, and it neglects the need to indicate the
port (e.g. if port forwarding is needed) and bearer (if it's not
straightforward ssh).  I think of management access on a particular
protocol as a *service* -- eg management or management.ssh or *brooklyn.ssh*.
So the proposal would instead say:

     brooklyn.ssh.network:  private1
     brooklyn.ssh.port:  22

With enrichers then able to create eg *brooklyn.ssh.endpoint: 10.0.0.45:22*

This is saying "the service this entity exposes for Brooklyn to SSH in is
10.0.0.45:22".  It isn't assuming a general management.ssh endpoint (you
could, as a different service, but we shouldn't assume that's always the
case), nor is it assuming a dedicated management network, although again it
could support it (eg replace "private1" with "management" in the above
network name).
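The enricher described above could work roughly as follows. This is a sketch under stated assumptions: entity sensors are modelled as a plain Map<String, String> rather than Brooklyn's sensor API, and EndpointEnricherSketch is a hypothetical name. It reads *service.network* and *service.port*, looks up *host.address.network* for that network, and publishes *service.endpoint*.

```java
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical stand-in for an enricher. Sensors are a Map<String, String>,
 * not Brooklyn's real AttributeSensor/Enricher API.
 */
final class EndpointEnricherSketch {
    private EndpointEnricherSketch() {}

    /**
     * Derives "<service>.endpoint" as "<address>:<port>", where the address comes
     * from "host.address.<network>" and <network> from "<service>.network".
     * Returns empty if any input sensor is missing.
     */
    static Optional<String> endpoint(Map<String, String> sensors, String service) {
        String network = sensors.get(service + ".network");
        String port = sensors.get(service + ".port");
        if (network == null || port == null) {
            return Optional.empty();
        }
        String address = sensors.get("host.address." + network);
        if (address == null) {
            return Optional.empty();
        }
        return Optional.of(address + ":" + port);
    }

    /** Publishes the derived endpoint back into the sensor map, if it can be computed. */
    static void enrich(Map<String, String> sensors, String service) {
        endpoint(sensors, service).ifPresent(e -> sensors.put(service + ".endpoint", e));
    }
}
```

With brooklyn.ssh.network: private1, brooklyn.ssh.port: 22 and host.address.private1: 10.0.0.45, calling enrich(sensors, "brooklyn.ssh") would add brooklyn.ssh.endpoint: 10.0.0.45:22.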

This would let us be consistent with other port-forwarding / enriching
strategies as well as naming conventions.  For instance if Brooklyn needs
to ssh through port-forwarding (PFW) with say a rule created on "firewall1"
at 10.0.0.1:10001 CIDR'd to Brooklyn we might start with:

     ssh.network:  private1
     ssh.port:  22

and

     host.address.firewall1: 10.0.0.1

enrichers would create this:

     ssh.endpoint:  10.0.0.45:22

but brooklyn wouldn't be able to access it ... instead a PFW customiser
would set up the following:

     ssh.port.mapped.firewall1:  10001
     ssh.endpoint.mapped.firewall1:  10.0.0.1:10001

and then brooklyn/location setup would create the following sensors from
the above:

     brooklyn.ssh.network:  firewall1
     brooklyn.ssh.port:  10001
     *brooklyn.ssh.endpoint:  10.0.0.1:10001*

The bold line above is what Brooklyn will use to make ssh connections --
completely unambiguous and it can be populated through different
strategies.  (And if it isn't direct ssh but instead
ssh-via-an-intermediate-machine, or https, or some other ssh-encoding
strategy we could also have  "brooklyn.ssh.bearer:  my-https" ... not
immediately relevant here but important to keep in mind that not all SSH
commands are sent via a straightforward ssh.)
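The port-forwarding walk-through can be sketched as a resolution step that prefers a mapped endpoint on a network Brooklyn can reach and otherwise falls back to the service's own endpoint. Assumptions: sensors are a Map<String, String>, the ordered list of networks reachable from Brooklyn is supplied by the caller, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of populating "brooklyn.ssh.*" from "ssh.*" sensors.
 * Not Brooklyn code; sensors are modelled as a Map<String, String>.
 */
final class BrooklynSshEndpointSketch {
    private BrooklynSshEndpointSketch() {}

    static Map<String, String> resolve(Map<String, String> sensors, List<String> reachableNetworks) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String network : reachableNetworks) {
            String mappedEndpoint = sensors.get("ssh.endpoint.mapped." + network);
            String mappedPort = sensors.get("ssh.port.mapped." + network);
            if (mappedEndpoint != null && mappedPort != null) {
                // A port-forwarding rule exists on this network: use it.
                result.put("brooklyn.ssh.network", network);
                result.put("brooklyn.ssh.port", mappedPort);
                result.put("brooklyn.ssh.endpoint", mappedEndpoint);
                return result;
            }
            if (network.equals(sensors.get("ssh.network")) && sensors.containsKey("ssh.endpoint")) {
                // The service's own network is directly reachable from Brooklyn.
                result.put("brooklyn.ssh.network", network);
                result.put("brooklyn.ssh.port", sensors.get("ssh.port"));
                result.put("brooklyn.ssh.endpoint", sensors.get("ssh.endpoint"));
                return result;
            }
        }
        return result; // empty: no endpoint Brooklyn can use
    }
}
```

Given the sensors from the firewall1 example and reachable networks ["firewall1"], resolve returns brooklyn.ssh.network: firewall1, brooklyn.ssh.port: 10001 and brooklyn.ssh.endpoint: 10.0.0.1:10001.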


Other points to address, some of which you touch on, are:

(a) which network is deemed the "default" (used to populate host.address)
(b) which network Brooklyn uses to connect for ssh purposes
(c) how are networks named (eg "public", "private1", "private2")
(d) how do we infer the host name
(e) which network Brooklyn uses to connect for other monitoring/control
purposes (http etc)

Currently as you note, Brooklyn when creating a JcloudsSshMachineLocation
tries to connect to sockets on the reported public addresses and then on
the reported private addresses, and decides that the first one which
listens on port 22 is to be used for (a) and (b).  It neglects (c)
altogether and publishes only `host.address` (and `host.name`).

Whilst it is a big ask to make the perfect strategies for all this, I think
a big improvement would be to permit this behaviour to be customised.  I
suggest we write a `LocationNetworkInfoCustomizer` instance
(implementing `LocationCustomizer`) to perform (a)-(d), together with a
`networkInfoCustomizer` config key to load it (so that we don't disrupt
other uses of LocationCustomizers).  That class could be the default, but
it could take some additional customization (eg to define specific
strategies for (b)) and of course a developer could subclass it; this
allows behaviour to be overridden in either the location's definition or in
an entity's provisioning properties.

The initial behaviour of such an instance could be as follows:

* attempt to find port 22 endpoints on which brooklyn is able to successfully log in
via ssh (not just reach, as you noted Svet is going to fix ... I've had
bizarre problems in hotels where Brooklyn won't connect because an
inaccessible 10.x.x.x address of the machine is an address of a machine on
the hotel wifi!)
   * preferring private addresses
   * but configurable something like "preferPublic" "preferPrivate"
"allowPublic" etc ... and/or a CIDR preference order
* give names "public1/2/3" and "private1/2/3" to those networks reported by
jclouds (in future we could support CIDR constraints or subclasses could
connect to the machine and see which nics they correspond to)
* do simple things for now like make the network brooklyn uses the default
and use the current strategies for hostname, but again these could be
extended in the future
* publish the sensors above
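A minimal Java sketch of that initial behaviour, assuming a hypothetical NetworkInfoSketch class rather than the real LocationCustomizer API. It names the jclouds-reported addresses public1.., private1.., orders them by a preference (the enum values are illustrative stand-ins for "preferPrivate"/"preferPublic"-style options), picks the first address for which a caller-supplied login check succeeds, and makes that network the default for `host.address`. The login check is a Predicate standing in for an actual SSH login with credentials.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical sketch; not Brooklyn's LocationCustomizer API. */
final class NetworkInfoSketch {
    private NetworkInfoSketch() {}

    /** Illustrative preference options, not real config values. */
    enum Preference { PREFER_PRIVATE, PREFER_PUBLIC, PRIVATE_ONLY }

    /** Names addresses "public1", "public2", ..., "private1", "private2", ... in reported order. */
    static Map<String, String> nameNetworks(List<String> publicAddresses, List<String> privateAddresses) {
        Map<String, String> named = new LinkedHashMap<>();
        for (int i = 0; i < publicAddresses.size(); i++) {
            named.put("public" + (i + 1), publicAddresses.get(i));
        }
        for (int i = 0; i < privateAddresses.size(); i++) {
            named.put("private" + (i + 1), privateAddresses.get(i));
        }
        return named;
    }

    /** Orders network names according to the preference. */
    static List<String> candidateOrder(Map<String, String> named, Preference preference) {
        List<String> publics = new ArrayList<>();
        List<String> privates = new ArrayList<>();
        for (String name : named.keySet()) {
            if (name.startsWith("public")) {
                publics.add(name);
            } else {
                privates.add(name);
            }
        }
        List<String> ordered = new ArrayList<>();
        switch (preference) {
            case PREFER_PUBLIC:
                ordered.addAll(publics);
                ordered.addAll(privates);
                break;
            case PRIVATE_ONLY:
                ordered.addAll(privates);
                break;
            default: // PREFER_PRIVATE
                ordered.addAll(privates);
                ordered.addAll(publics);
        }
        return ordered;
    }

    /** Returns the first network whose address passes the (credential-testing) login check. */
    static Optional<String> chooseManagementNetwork(Map<String, String> named, Preference preference,
                                                    Predicate<String> canLogIn) {
        for (String network : candidateOrder(named, preference)) {
            if (canLogIn.test(named.get(network))) {
                return Optional.of(network);
            }
        }
        return Optional.empty();
    }

    /** Sensors to publish: host.address.<network> for each, plus host.address for the chosen one. */
    static Map<String, String> sensorsFor(Map<String, String> named, String chosenNetwork) {
        Map<String, String> sensors = new LinkedHashMap<>();
        named.forEach((network, address) -> sensors.put("host.address." + network, address));
        sensors.put("host.address", named.get(chosenNetwork));
        return sensors;
    }
}
```

Because the login check is injected, the hotel-wifi case (a private address that answers on port 22 but belongs to some other machine) simply fails the predicate and the next candidate is tried.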

As for (e) I suggest a similar pattern to explicitly identify the
brooklyn-accessible endpoints for other services that brooklyn needs access
to, eg creating `brooklyn.http.url` from a `http.port` and `http.url` or
`http.network`, possibly through an intermediate `http.url.mapped.jumphost`.
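Applied to (e), the same pattern might derive a Brooklyn-accessible HTTP URL roughly as follows. The class name and the order of precedence (a mapped URL via an intermediate such as a jumphost, then *http.network* plus *http.port*, then the entity's own *http.url*) are assumptions for illustration; sensors are again modelled as a Map<String, String>.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of deriving brooklyn.http.url; not Brooklyn code. */
final class BrooklynHttpUrlSketch {
    private BrooklynHttpUrlSketch() {}

    static Optional<String> brooklynHttpUrl(Map<String, String> sensors, String viaNetwork) {
        // 1. Prefer a URL mapped through an intermediate (e.g. "jumphost"), if one exists.
        if (viaNetwork != null) {
            String mapped = sensors.get("http.url.mapped." + viaNetwork);
            if (mapped != null) {
                return Optional.of(mapped);
            }
        }
        // 2. Otherwise build http://<host.address.<http.network>>:<http.port>/
        String network = sensors.get("http.network");
        String port = sensors.get("http.port");
        String address = network == null ? null : sensors.get("host.address." + network);
        if (address != null && port != null) {
            try {
                URI uri = new URI("http", null, address, Integer.parseInt(port), "/", null, null);
                return Optional.of(uri.toString());
            } catch (URISyntaxException | NumberFormatException e) {
                // Malformed inputs: fall through to the entity's own URL.
            }
        }
        // 3. Fall back to whatever http.url the entity already publishes.
        return Optional.ofNullable(sensors.get("http.url"));
    }
}
```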

This should let us solve a lot of issues, not just management network
conflation but also public/private issues we have when configuring clusters
of nodes that all need to talk to each other.

Best
Alex


[1]
https://docs.google.com/document/d/1IrWLWunWSl_ScwY3MRICped8eJMjQEH1FbWZJcoK0Iw

On 7 December 2016 at 11:52, Geoff Macartney <
[email protected]> wrote:

+1  This sounds like a good idea.  In most customer deployments I've seen
in "previous lives" there has been a separate management network for
production deployments, it would be good for Brooklyn to have this as an
explicit concept factored out from 'host.address'.

Geoff



On Wed, 7 Dec 2016 at 11:31 Richard Downer <[email protected]> wrote:

+1

The host.address and host.subnet.address sensors have always been confusing, at
least for me, and especially so for figuring out what SshMachineLocation is
going to do. Having a dedicated sensor for the management address with an
unambiguous purpose for SshMachineLocation (and others!) seems obvious to
me!

Thanks
Richard.


On 6 December 2016 at 16:26, Sam Corbett <[email protected]>
wrote:

Summary:

Brooklyn conflates the management address of an entity with its public
address. I want to break this by publishing a new sensor called
management.host.address on entities that use JcloudsLocation and using it
in preference to host.address when creating SSH connections and polling
feeds. Its value is the entity's host.subnet.address if that is accessible,
otherwise host.address.

Some background:

JcloudsLocation makes a guess at the best host and port to use for SSH
connections to each instance. The value it chooses subsequently informs a
variety of entity sensors, most importantly the "host.address" sensor which
itself informs the address at which various feeds are polled and other
sensors like "main.uri". Right now Brooklyn always prefers a value from the
set of "public" addresses returned by the cloud. Internally to
JcloudsLocation the value that is picked is referred to as the "management
host and port", but by subsequently using it for host.address we conflate
it with the publicly accessible address.

An obvious change to make to this is to have Brooklyn use the private
address for SSH connections when it's on the same subnet as the thing it
has provisioned. In order to do this without mucking up all of the existing
assumptions I propose we introduce a new sensor called
"management.host.address", whose value is a reachable private address, if
one exists, and otherwise the value chosen for the public address. When
creating SSH connections SshMachineLocation would check for the management
address and fall back to its current behaviour if it's unset.
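The rule for the proposed sensor, and the fallback in SshMachineLocation, could be sketched like this. The reachability check is passed in as a Predicate (per Svet's point below, it should test credentials rather than just open a socket); the class and method names are hypothetical, and sensors are modelled as a Map<String, String> rather than Brooklyn's sensor API.

```java
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch of the management.host.address proposal; not Brooklyn code. */
final class ManagementAddressSketch {
    private ManagementAddressSketch() {}

    /** host.subnet.address if it is reachable, otherwise host.address. */
    static String managementHostAddress(Map<String, String> sensors, Predicate<String> reachable) {
        String subnet = sensors.get("host.subnet.address");
        if (subnet != null && reachable.test(subnet)) {
            return subnet;
        }
        return sensors.get("host.address");
    }

    /** Address an SSH connection would target: the management address if set, else host.address. */
    static String sshTarget(Map<String, String> sensors) {
        String management = sensors.get("management.host.address");
        return management != null ? management : sensors.get("host.address");
    }
}
```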

An alternative is to have SshMachineLocation itself work out whether it
can connect to a private address. I do not like this option because we
would be unable to reuse the information that a private address is better
in entity feeds and BrooklynAccessUtils.

Svet raised the excellent point that we risk having an instance's private
address match an irrelevant machine on the same network as Brooklyn. To
resolve this he suggests that we change the check for reachability to also
test credentials rather than simply trying to open a socket as happens at
the moment.

I'm going to start on an implementation of this. Any feedback or
questions?

Sam

