On 04/09/2014 12:31 AM, Simo Sorce wrote:
On Tue, 2014-04-08 at 12:00 +0200, Ludwig Krispenz wrote:
Replication storms. In my opinion the replication of a mod of one or
two attributes in an entry will be faster than the bind itself.
Think about the amplification effect in an environment with 20 replicas.
1 login attempt -> 20+ replication messages
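
A back-of-the-envelope sketch of the amplification described above. It assumes (hypothetically) that each login produces one update and that the update is delivered once to each of 20 peer servers; the function and parameter names are illustrative only, and real topologies will differ.

```python
def replication_messages(logins: int, peers: int, updates_per_login: int = 1) -> int:
    """Messages needed to deliver every login-related update to every peer server."""
    return logins * updates_per_login * peers


# One login with 20 peers to update
print(replication_messages(1, 20))       # 20
# A few thousand people authenticating at about the same time
print(replication_messages(5000, 20))    # 100000
```

Under these assumptions the traffic grows with the product of logins and servers, so adding servers to absorb login load also adds replication load.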

Now think about what happens bandwidth-wise when a few thousand people
all authenticate at the same time across the infrastructure, you deploy
more servers to scale better and you get *more* traffic, at some point
servers actually get slower as they are busy with replication related

Think what happens if one of these servers is in a satellite office on a
relatively slow link and every morning it receives a flood of
replication data ... that is 99% useless because most of that data is not
relevant in that office.
OK, let's leave it at that. There might be scenarios where it becomes unacceptable, and as long as we have an acceptable solution we need not enforce full replication.

  If an attacker knows all the DNs of the entries in a server the
denial of service could be that it just does a sequence of failed
logins for any user and nobody will be able to login any more,
This is perfectly true, which is why we do not permanently lock out users
by default and which is why I personally dislike lockouts. A much better
mechanism to deal with brute force attacks is throttling, but it is also
somewhat harder to implement as you need to either have an async model
to delay answers or you need to tie up threads for the delay time.
Still a far superior measure than replicating status around at all
yes, that could be a good solution, but not trivial
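
A minimal sketch of the async throttling idea, assuming a Python asyncio service rather than the actual directory server code. The `Throttle` class, its parameters, and the `check` callback are hypothetical names made up for illustration; the state is purely local and nothing is replicated.

```python
import asyncio


class Throttle:
    """Per-DN delay that doubles with each consecutive failed bind (local state only)."""

    def __init__(self, base_delay: float = 0.5, max_delay: float = 30.0):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.failures = {}  # DN -> consecutive failed binds seen by this server

    def delay_for(self, dn: str) -> float:
        n = self.failures.get(dn, 0)
        if n == 0:
            return 0.0
        return min(self.base_delay * 2 ** (n - 1), self.max_delay)

    async def bind(self, dn: str, password: str, check) -> bool:
        # Awaiting the sleep delays the answer without tying up a thread.
        await asyncio.sleep(self.delay_for(dn))
        ok = check(dn, password)
        if ok:
            self.failures.pop(dn, None)  # success resets the delay
        else:
            self.failures[dn] = self.failures.get(dn, 0) + 1
        return ok
```

In this sketch a guessing attack gets slower with every failure, while the account is never locked, so the legitimate user can still log in after waiting out the current delay.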

  replication would help to propagate this to other servers, but not
prevent it. This would also be the case if only the final lockout
state is replicated.
Yes, but the amount of replicated information would be far less. With our
default it would be about 1/5th as much on average, since 5 is the number
of failed attempts before the final lockout kicks in. So you save a lot of
bandwidth.
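
The saving can be checked with a small count. This sketch assumes, as a simplification not stated in the thread, that attempts after the lockout produce no further updates; the function name and parameters are illustrative.

```python
def replicated_updates(failed_attempts: int, threshold: int = 5,
                       final_only: bool = False) -> int:
    """Updates replicated for one account that keeps failing until locked."""
    # Assumption: once the account is locked, further attempts change nothing.
    counted = min(failed_attempts, threshold)
    if final_only:
        # Only the transition to the locked state is replicated.
        return 1 if counted == threshold else 0
    # Every failed attempt is replicated.
    return counted


print(replicated_updates(5))                    # 5
print(replicated_updates(5, final_only=True))   # 1
```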

I like the idea of replicating the attributes changed at failed logins
(or reset) only.
I think this is reasonable indeed, the common case is that users tend to
get their password right, and if you are under a password guessing
attack you should stop it. The issue is though that sometimes you have
misconfigured services with bad keytabs that will try over and over
again to init, even if the account is locked, or maybe (even worse) they
try a number of bad keys, but lower than the failed count, before
getting to the right one (thus resetting the failed count). If they do
this often you can still self-DoS even without a malicious attacker :-/

Something like this is what we have experienced for real, and it caused us
to actually disable replication of all the lockout related attributes in
the past.
But also here it can get complicated. We cannot really use failedlogincount and replicate it: e.g. if it is "2" on each server and there are parallel login attempts, each server would increment it to "3" and replicate, so we would have 3 on all servers, which is not what we wanted. We could replicate changes to lastfailedauth and, when receiving an update for this attribute, locally increase failedcount. But it would also have to be used for resets (deleting lastFailedAuth), there could also be race conditions, and maybe other local attrs are needed.
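
A toy model of the counter problem described above, under the assumption that a replicated single-valued counter resolves to the last written value. The function names are hypothetical, and resets and the race conditions mentioned above are not modelled.

```python
def replicate_counter(start: int, parallel_failures: int) -> int:
    """Each server increments its own copy, then the written values are exchanged."""
    # Every server saw `start` and writes start + 1, so whichever write
    # wins, the replicated result is the same value.
    written = [start + 1 for _ in range(parallel_failures)]
    return written[-1]


def replicate_events(start: int, parallel_failures: int) -> int:
    """Each server replicates the failure event; receivers increment locally per event."""
    events = {f"failure-on-server-{i}" for i in range(parallel_failures)}
    return start + len(events)


# Counter at 2 everywhere, two failed logins arrive in parallel on two servers
print(replicate_counter(2, 2))   # 3, one failure is lost
print(replicate_events(2, 2))    # 4, both failures are counted
```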

And the bad news: I claimed that the replication protocol ensures that the last change wins except for bugs, and it looks like we have one bug for single valued attributes in some scenarios. I have to repeat the test to double check. The update resolution code for single valued attrs is a nightmare; Rich and I have said several times that we need to rewrite it :-(

PS: Martin, if you are looking for subjects for a thesis, maybe some theoretical model for replication update resolution and what history it requires could be a challenge.

