On 17.05.2016 14:11, Vincent Veyron wrote:
> On Tue, 17 May 2016 10:16:43 +0200
> André Warnier <a...@ice-sa.com> wrote:
>
>> I don't see above any significant difference in configuration between the
>> servers, apart from the fact that the "faulty" server runs a 64-bit version
>> of perl.
>
> Sorry : slightly digressive rant about the fact that every time I compare my
> configs, I find some subtle differences. Should be getting into config
> management tools, but that takes time too.
>
>> Now I also found this :
>> http://rabexc.org/posts/randomizing-should-be-easy-right-oh
>> I am not sure that I really understand this all the way down, but would this
>> not be a suspect in a case where the behaviour seems different between one
>> 64-bit machine, and a bunch of 32-bit ones ?
>
> Nope; same results on both types when running the script
>
>> This being said, it still looks to me as if the current code is flawed on
>> *all* machines, and *will* repeat keys quite often. It just depends again on
>> the exact sequence of requests hitting a specific Apache, and the other
>> parameters I mentioned before.
>> I still believe that the fact that it does not *seem* to happen, is just due
>> to the inherent randomness of these other factors on the production machines.
>
> Well, I already posted a test with ab and 12 000 requests, so not sure about
> the 'quite often' part?
>
> This is on the faulty one :
>
> xxxx@arsene:~$ perl -le '%h=();for (1..10_000_000) {my $session_id = join "", map +(0..9,"a".."z","A".."Z")[rand(10+26*2)], 1..32;$h{$session_id}=1};$v=keys %h; print $v'
> 10000000
Yes, but this is *one* process. Within any single process, successive calls to rand()
return different values, so the keys that this process generates are all different.
But if n different processes were all starting with the same initial seed, they would all
generate the *same* sequence of rand() responses, and the same sequence of keys.
And that is what I am saying : each of your Apache pre-fork children is a separate
process, but they all always start with the same random seed.
So they will all, ultimately, generate the same sequence of keys (but not necessarily at
the same time).
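
To make this concrete, here is a minimal sketch, assuming (as I claim above) that the parent has already seeded the generator before the children are forked. It is not your module's code: make_key(), keys_from_children() and the fixed seed 12345 are names and values I made up for the illustration; only the key recipe is taken from your one-liner.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# Same key recipe as the one-liner quoted above: 32 characters from [0-9a-zA-Z].
sub make_key {
    return join "", map +(0..9, "a".."z", "A".."Z")[rand(10 + 26*2)], 1..32;
}

# Seed once in the parent, then fork $n children one after the other and
# collect the first $count keys that each child generates.
sub keys_from_children {
    my ($n, $count) = @_;
    srand(12345);    # hypothetical fixed seed: stands in for "seeded before fork"
    my @per_child;
    for my $child (1 .. $n) {
        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                 # in the child
            close $reader;
            print {$writer} make_key(), "\n" for 1 .. $count;
            close $writer;               # flushes the keys to the parent
            POSIX::_exit(0);             # leave without touching inherited buffers
        }
        close $writer;                   # in the parent
        chomp(my @keys = <$reader>);
        close $reader;
        waitpid($pid, 0);
        push @per_child, \@keys;
    }
    return @per_child;
}

my @result = keys_from_children(3, 2);
for my $i (0 .. $#result) {
    print "child ", $i + 1, ": @{ $result[$i] }\n";
}
```

The parent never calls rand() itself after seeding, so every child inherits exactly the same generator state. Each child's own keys differ from one another, but the list printed for child 1 should be the same as the lists for child 2 and child 3.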
Let's say that there are initially 5 Apache children, and that Apache never starts more
than 5.
Now you start bombarding the server with hundreds of requests, all of them triggering the
key-generation mechanism.
And let's say that it takes your module 1 s. to respond to a request (just to make things
simpler below).
T0 :
Request #1 comes in.
The main Apache looks for a free child, and finds child #1.
It passes request #1 to child #1.
This child will be busy until T0 + 1s.
T0 + 0.1s :
Request #2 comes in.
The main Apache looks for a free child.
Child #1 is still busy, so it finds child #2.
It passes request #2 to child #2.
This child will be busy until T0 + 1.1s.
and so on..
(child #5 gets request #5 at T0 + 0.4s, and is now busy until T0 + 1.4s.)
Request #6 (at T0 + 0.5s) :
Now all 5 children are busy, and Apache has to hold the request until
one child becomes free (*).
In this very simplified case, it will be child #1, at T0 + 1s.
At T0 + 1s, child #1 becomes free again. So child #1 now gets request #6, which is only
the *second* request that this child processes.
So it generates *its* key #2 (which globally is generated key #6).
In this very simplified example, the first 5 keys generated globally by Apache will be
identical, because each child starts with the same seed, and they are all called neatly in
a regular sequence.
And then the next 5 keys will be identical, because for each child it is now the second
request.
And so on.
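
The simplified schedule above can be replayed in a single script. This is only a sketch under the same assumptions (5 children, strict round-robin, one key per request, every child starting from the same seed); simulate(), shared_sequence() and the seed 12345 are invented for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same key recipe as the one-liner quoted above.
sub make_key {
    return join "", map +(0..9, "a".."z", "A".."Z")[rand(10 + 26*2)], 1..32;
}

# The key sequence that *every* child would produce, assuming that all
# children start from the same seed.
sub shared_sequence {
    my ($seed, $count) = @_;
    srand($seed);
    my @seq;
    push @seq, make_key() for 1 .. $count;
    return @seq;
}

# Strict round-robin: request r goes to child ((r - 1) % n) + 1, and each
# child hands out the next key of its own (identical) sequence.
sub simulate {
    my ($n_children, $n_requests) = @_;
    my $per_child = int(($n_requests - 1) / $n_children) + 1;
    my @seq    = shared_sequence(12345, $per_child);   # hypothetical seed
    my @served = (0) x $n_children;   # keys already generated by each child
    my @log;
    for my $req (1 .. $n_requests) {
        my $child = ($req - 1) % $n_children;
        my $key   = $seq[ $served[$child]++ ];
        push @log, [ $req, $child + 1, $key ];
    }
    return @log;
}

my @log = simulate(5, 10);
printf "request %2d -> child %d -> %s\n", @$_ for @log;

my %distinct;
$distinct{ $_->[2] } = 1 for @log;
print scalar(keys %distinct), " distinct keys for ", scalar(@log), " requests\n";
```

With 5 children and 10 requests, requests 1 to 5 should all get one key and requests 6 to 10 another, so only 2 distinct keys are handed out.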
But in a real situation :
- not all requests come in so neatly at regular intervals
(so for example child #1 may become free, before child #5 is even called once).
- not all requests take the same time to serve (other things happen on the server)
- not all requests generate a key (so if child #4 is called but does not call rand(), it
does not count; or if it calls rand() only 5 times instead of 32, that screws up the whole
sequence, and it will now start generating keys that are different from all the others)
- the number of children will vary over time. New ones will be created as needed, and
some older ones will die and be replaced by brand-new ones. Each time that happens, the
new child will start with key #1 again, because it just got a brand-new perl. Meanwhile,
there may still be an older child alive, for which the next key is already number 5000
in its own sequence.
- etc..
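
As an illustration of the third point (a child that makes a few rand() calls of its own drops out of step with the others), here is a small sketch. It simulates two children in one process by re-seeding with the same value; child_keys() and the seed 12345 are hypothetical, and the 5 extra calls are just the number used in the example above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same key recipe as the one-liner quoted above.
sub make_key {
    return join "", map +(0..9, "a".."z", "A".."Z")[rand(10 + 26*2)], 1..32;
}

# Keys generated by one simulated child that starts from $seed and makes
# $extra unrelated rand() calls between its first and its second key.
sub child_keys {
    my ($seed, $extra, $count) = @_;
    srand($seed);
    my @keys = (make_key());
    for (1 .. $extra) {
        my $unused = rand();    # some other code in the child using rand()
    }
    push @keys, make_key() for 2 .. $count;
    return @keys;
}

my @a = child_keys(12345, 0, 3);    # child that only ever generates keys
my @b = child_keys(12345, 5, 3);    # child with 5 extra rand() calls
for my $i (0 .. 2) {
    printf "key #%d: %s %s (%s)\n", $i + 1, $a[$i], $b[$i],
        $a[$i] eq $b[$i] ? "same" : "different";
}
```

Key #1 should be the same for both children; from key #2 onward the second child is 5 rand() calls ahead, so its keys no longer line up with those of the first.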
And this "disorder" will tend to grow with the load on the server.
So over any given period of time, each child will tend to be at a different stage in its
sequence of rand() calls, and the risk of the same key being returned to 2 clients at
about the same time is relatively low.
But if the keys are stored somewhere in a persistent way, you are increasing the risk
greatly, because key #13 generated by a new child today, may conflict with the key #13
generated by another child yesterday.
(*) or start an additional child