On 24.2.2016 15:25, Simo Sorce wrote:
> On Wed, 2016-02-24 at 10:00 +0100, Martin Kosek wrote:
>> On 02/23/2016 06:59 PM, Petr Spacek wrote:
>>> On 23.2.2016 18:14, Simo Sorce wrote:
>> ...
>>>> More seriously I think it is a great idea, but too premature to get all
>>>> the way there now. We need to build schema and CLI that will allow us to
>>>> get there without having to completely change interfaces if at all
>>>> possible or minimizing any disruption in the tools.
>>>
>>> Actually the backwards compatibility is the main worry which led to this
>>> idea with links.
>>>
>>> If we release the first version of locations with custom priorities etc. we
>>> will have to support the schema (which will be different) and API (which
>>> will later be unnecessary) forever.
>>>
>>> If we skip this intermediate phase with hand-made configuration we can save
>>> all the headache with upgrades to more automatic solution later on.
>>>
>>>
>>> Maybe we should invert the order:
>>> Start with locations + links with administrative metric and add
>>> hand-tweaking capabilities later (if necessary).
>>>
>>> IMHO locations + links with administrative metric will be easier to
>>> implement than the first version.
>>>
>>> Just thinking aloud ...
>>
>> Makes sense to me, I would have the same worry as Petr, that we would break
>> something if we decide to move to a links-based solution later.
> 
> Maybe I am missing something, but in order to generate the proper SRV
> records we need priority and weights anyway, either by entering them
> manually or by autogenerating them from some other piece of information
> in the framework. So given this information is needed anyway, why would
> it become a problem to retain it in the future if we enable a tool that
> simply autogenerates this information?

Let me clarify this:
You are right, in the end we always somehow get to priorities and weights.

TL;DR version
=============
The difference is in the subtle details of how we get the priorities and whether
we store them in LDAP and represent them in the API (or not). It will simplify
things if we do not expose them. I'm not convinced that we *need* to expose them
in the first round.


TL version
==========

At a high level the process is always as follows:
1. input (location, server, weight) tuples for all primary servers assigned to
locations
2. input or derive (location, server, priority) tuples for all backups
3. generate SRV records using priority groups combined from the previous two
steps (sketched below)
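
For illustration, a minimal Python sketch of step (3); the owner-name layout,
zone and port below are made up for this example, not the actual framework API:

# sketch: turn (server, priority, weight) tuples for one location into
# SRV record strings; names and zone layout are hypothetical
def srv_records(location, tuples, zone="example.com", port=389):
    owner = "_ldap._tcp.%s._locations.%s." % (location, zone)
    return ["%s IN SRV %d %d %d %s." % (owner, prio, weight, port, server)
            for server, prio, weight in tuples]

# two 'main' servers (priority 0, weights from step 1) plus one derived backup
print("\n".join(srv_records("brno",
    [("ipa1.brno.example.com", 0, 100),
     ("ipa2.brno.example.com", 0, 50),
     ("ipa1.prague.example.com", 50, 100)])))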

Now we are trying to decide whether in step (2) we "input" or "derive" the
priorities for backup servers.


Variants
~~~~~~~~

Variant A
---------
If we let the user do everything manually (no links etc.) we need to
provide the following schema + API + user interface:
[first step - same in both variants]
* create locations
* assign 'main' (aka 'primary' aka 'home') servers to locations
++ specify weights for the 'main' servers in a given location, i.e. manually
input (server, weight) tuples

[second step]
* specify backup servers for each location
++ assign (server, priority, weight) information for each non-main server
++ for S servers and L locations we need to represent up to
   S * L (server, priority, weight) tuples and provide means to manage them
   (see the sketch below)
++ most importantly, maintenance complexity of backups grows every time you add
a server OR a location
++ this would be a nightmare to manage. Even for simple cases this requires some
'include' mechanism to declare one location as a backup for another location.
Such an include complicates things significantly as it has a lot of corner cases
and requires a different LDAP schema when compared to direct server assignment.
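
To make the explosion concrete, here is a hypothetical sketch of the data an
admin would have to keep in sync under variant A (all names invented):

# variant A: an explicit (server, priority, weight) list per location,
# i.e. up to S * L tuples that must be revisited whenever a server or a
# location is added, changed or removed
backups = {
    "brno":   [("ipa1.prague.example.com", 50, 100),
               ("ipa1.vienna.example.com", 60, 100)],
    "prague": [("ipa1.brno.example.com",   50, 100),
               ("ipa1.vienna.example.com", 60, 100)],
    # ... one such list for every location
}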



Variant B
---------
If we let the user specify only locations + links with costs we need to
provide the following schema + API + user interface:
[first step - no change from variant A]
* create locations
* assign 'main' (aka 'primary' aka 'home') servers to locations
++ specify weights for the 'main' servers in a given location, i.e. manually
input (server, weight) tuples

[second step]
* create links between locations
++ manually assign point-to-point information + administrative cost
++ for S servers and L locations we need to represent up to
   L^2 (from, to, cost) tuples and provide means to manage them
++ storage can be optimized to a great extent if there are a lot of links with
equal cost; typically a full mesh of interconnections can be represented by a
single object in LDAP
* generate backups (i.e. priority assignment) using the usual routing algorithms
(see the sketch after this list). Priority does not need to be exposed to the
user nor stored in LDAP at all.
++ most importantly, maintenance complexity of backups grows only as you add
locations, *but* you do not need to manually go through the backup configuration
of (potentially) all locations every time you add/change/remove servers in
existing locations (which you have to do with variant A, unless you use some
smart includes ...).
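
To show that the "derive" step is nothing exotic, here is a minimal sketch of
the routing part, assuming the links are kept as a simple cost map (the data
layout and names are made up; the computed priorities are throw-away values,
never stored anywhere):

import heapq

# sketch: derive backup priorities for one location from
# {location: {neighbour: administrative cost}} links using Dijkstra
def distances(links, start):
    dist, seen, queue = {}, set(), [(0, start)]
    while queue:
        cost, loc = heapq.heappop(queue)
        if loc in seen:
            continue
        seen.add(loc)
        dist[loc] = cost
        for neighbour, link_cost in links.get(loc, {}).items():
            if neighbour not in seen:
                heapq.heappush(queue, (cost + link_cost, neighbour))
    return dist

# every remote server gets a priority equal to the path cost to its location
def backup_priorities(links, location, servers_by_location):
    dist = distances(links, location)
    return [(server, dist[loc])
            for loc, servers in servers_by_location.items()
            if loc != location and loc in dist
            for server in servers]

links = {"brno": {"prague": 10}, "prague": {"brno": 10, "vienna": 20},
         "vienna": {"prague": 20}}
servers = {"brno": ["ipa1.brno"], "prague": ["ipa1.prague"],
           "vienna": ["ipa1.vienna"]}
print(backup_priorities(links, "brno", servers))
# -> [('ipa1.prague', 10), ('ipa1.vienna', 30)]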


Please note that variant B with (links, costs) does not use explicit priority
specification at all as it is always calculated by an algorithm.

If we ever decide to provide means to hand-tweak the generated priorities, we can
still invent an LDAP schema and API for variant A and populate it with data
generated by the variant B algorithm, but we do not need to do that today.

Less schema and less API -> smaller maintenance costs.

Does it clarify why the (link, cost) model is easier for the end user to manage
than (server, priority, weight)?



Variant C
---------
An alternative is to be lazy and dumb. Maybe it would be enough for the first
round ...

We would retain
[first step - no change from variant A]
* create locations
* assign 'main' (aka 'primary' aka 'home') servers to locations
++ specify weights for the 'main' servers in a given location, i.e. manually
input (server, weight) tuples

Then, backups would be an auto-generated set of all remaining servers from all
other locations (a sketch follows below).

Additional storage complexity: 0

This covers the scenario "always prefer local servers and use remote only as
fallback" easily. It does not cover any other scenario.

This might be sufficient for the first run and would allow us to gather some
feedback from the field.

Now I'm inclined to this variant :-)




Bonus
=====
Variant B with links has some fancy properties; here are a few for the curious:

* Speaking of storage, there is an interesting consequence:
Assumption: (S = number of servers) >= (L = number of locations)
Variant A complexity: S * L
Variant B complexity: L * L
=> S * L >= L * L
=> variant A complexity >= variant B complexity

This holds for the usual cases where all servers within one location have the
same priority.

* Other cases can be represented in variant B using a new location and
appropriate link costs. Variant B requires splitting hot backups into a separate
location like "CZ-hot-backup", which is then easy to display in the topology
graph etc.

* Coincidentally, variant B allows fancy things like empty locations which are
used only for routing. This nicely describes the situation where all branch
offices have their own local servers and are connected to a VPN concentrator
somewhere in the middle of a continent.

E.g. declare a 'hub' location with no IPA servers in it. Then create links
(branch, hub, cost) for each branch office.

This trivial configuration would automatically allow computing backups for
branch1 in an optimal way, where clients from branch1 prefer branch2 over
branch3 because branch3 has a crappy VPN link.
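
For the curious, a worked hub example (costs invented), reusing the
backup_priorities() sketch from variant B above:

# 'hub' routes but serves nothing; branch3 sits behind the expensive VPN link,
# so its server ends up with the worst (highest) priority for branch1 clients
links = {"branch1": {"hub": 10}, "branch2": {"hub": 10},
         "branch3": {"hub": 40},
         "hub": {"branch1": 10, "branch2": 10, "branch3": 40}}
servers = {"branch1": ["ipa.branch1"], "branch2": ["ipa.branch2"],
           "branch3": ["ipa.branch3"], "hub": []}
print(backup_priorities(links, "branch1", servers))
# -> [('ipa.branch2', 20), ('ipa.branch3', 50)]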


If you reached this point without skipping anything you deserve some reward
points, let me know :-D

-- 
Petr^2 Spacek
