On 17.05.2016 14:11, Vincent Veyron wrote:
On Tue, 17 May 2016 10:16:43 +0200
André Warnier <a...@ice-sa.com> wrote:

I don't see above any significant difference in configuration between the
servers, apart from the fact that the "faulty" server runs a 64-bit version of perl.

Sorry : slightly digressive rant about the fact that every time I compare my 
configs, I find some subtle differences. Should be getting into config 
management tools, but that takes time too.


Now I also found this :
    http://rabexc.org/posts/randomizing-should-be-easy-right-oh

I am not sure that I really understand this all the way down, but would this
not be a suspect in a case where the behaviour seems different between one
64-bit machine, and a bunch of 32-bit ones ?

Nope; same results on both types when running the script


This being said, it still looks to me as if the current code is flawed on *all*
machines, and *will* repeat keys quite often. It just depends again on the exact
sequence of requests hitting a specific Apache, and the other parameters I
mentioned before.
I still believe that the fact that it does not *seem* to happen, is just due to
the inherent randomness of these other factors on the production machines.


Well, I already posted a test with ab and 12 000 requests, so not sure about 
the 'quite often' part?

This is on the faulty one :

xxxx@arsene:~$ perl -le '%h=();for (1..10_000_000) {my $session_id = join "", map 
+(0..9,"a".."z","A".."Z")[rand(10+26*2)], 1..32;$h{$session_id}=1};$v=keys %h; print $v'
10000000



Yes, but this is *one* process. A single process gets a succession of different answers from rand(), and thus generates different keys. But if n different processes all start with the same initial seed, they all generate the *same* sequence of rand() responses, and therefore the same sequence of keys. And that is what I am saying : each of your Apache pre-fork children is a separate process, but they all always start with the same random seed. So they will all, ultimately, generate the same sequence of keys (but not necessarily at the same time).
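To make this concrete, here is a minimal sketch. It does not fork anything : it simply calls srand() twice with the same value to stand in for two children that inherited the same seed. The seed 42 and the sub names make_key and keys_for_child are my own, chosen for illustration; the key expression is the one from your one-liner.

```perl
use strict;
use warnings;

my @chars = (0 .. 9, 'a' .. 'z', 'A' .. 'Z');

# Same key recipe as the one-liner: 32 characters picked with rand().
sub make_key {
    return join '', map { $chars[ rand @chars ] } 1 .. 32;
}

# Simulate one child: start from a given seed, generate $count keys.
# (Hypothetical helper; a real child would inherit its seed at fork time.)
sub keys_for_child {
    my ($seed, $count) = @_;
    srand($seed);
    return [ map { make_key() } 1 .. $count ];
}

my $child1 = keys_for_child(42, 3);    # 42 is an arbitrary example seed
my $child2 = keys_for_child(42, 3);

for my $i (0 .. 2) {
    printf "key #%d: %s\n", $i + 1,
        $child1->[$i] eq $child2->[$i] ? 'identical' : 'different';
}
```

Within one "child" the keys all differ, which is what your 10 million test shows. Across the two "children" they match position by position.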

Let's say that there are initially 5 Apache children, and that Apache never starts more than 5. Now you start bombarding the server with hundreds of requests, all of them triggering the key-generation mechanism. And let's say that it takes your module 1 s. to respond to a request (just to make things simpler below).

T0 :
Request #1 comes in.
The main Apache looks for a free child, and finds child #1.
It passes request #1 to child #1.
This child will be busy until T0 + 1s.

T0 + 0.1s :
Request #2 comes in.
The main Apache looks for a free child.
Child #1 is still busy, so it finds child #2.
It passes request #2 to child #2.
This child will be busy until T0 + 1.1s.

and so on..
(child #5 is now busy until T0 + 1.4s.)

Request #6 (at T0 + 0.5s) :
Now all 5 children are busy, and Apache has to wait with the request, until
one child becomes free (*).
In this very simplified case, it will be child #1, at T0 + 1s.

At T0 + 1s, child #1 becomes free again. So child #1 now gets request #6, which is only the *second* request that this child processes.
So it generates *its* key #2 (which globally is the generated key #6).

In this very simplified example, the first 5 keys generated globally by Apache will be identical, because each child starts with the same seed, and they are all called neatly in a regular sequence. And then the next 5 keys will be identical, because for each child it is now the second request.
And so on.
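The simplified schedule above can be simulated in a few lines. This is only a sketch under the same assumptions : 5 children, all starting from one seed (42 here, arbitrary), requests handed out strictly in turn. Since a single perl has only one rand() state, each child's sequence is precomputed after an explicit srand(); the variable names are mine.

```perl
use strict;
use warnings;

my @chars = (0 .. 9, 'a' .. 'z', 'A' .. 'Z');

sub make_key {
    return join '', map { $chars[ rand @chars ] } 1 .. 32;
}

my $children = 5;     # fixed pool, as in the example
my $requests = 10;    # two full rounds over the pool
my $seed     = 42;    # arbitrary; the point is that all children share it

# Precompute the key sequence each child would produce on its own.
my @sequence;
for my $c (0 .. $children - 1) {
    srand($seed);
    $sequence[$c] = [ map { make_key() } 1 .. $requests ];
}

# Hand requests to the children in strict rotation.
my @served = (0) x $children;    # how many keys each child has generated
my @global;                      # keys in the order Apache returns them
for my $r (0 .. $requests - 1) {
    my $c = $r % $children;
    push @global, $sequence[$c][ $served[$c]++ ];
}

my %distinct = map { $_ => 1 } @global;
printf "%d requests, %d distinct keys\n",
    scalar @global, scalar keys %distinct;
```

With this neat rotation, the 10 requests should yield only 2 distinct keys : one shared by requests #1 to #5, another by requests #6 to #10.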

But in a real situation :
- not all requests come in so neatly at regular intervals
(so for example child #1 may become free, before child #5 is even called once).
- not all requests take the same time to serve (other things happen on the server)
- not all requests generate a key (so if child #4 is called but does not call rand(), it does not count; or if it calls rand() only 5 times instead of 32, that screws up the whole sequence, and it will now start generating keys that are different from all the others)
- the number of children will vary over time. New ones will be created as needed, some older ones will die and be replaced by a brand-new one. Each time that happens, the new child will start with key #1 again, because it just got a brand-new perl. While at the same time, there may still be an older child alive, for which the next key is already number 5000 in its own sequence.
- etc..
And this "disorder" will tend to be larger, the more heavily loaded the server is.
So over any given period of time, each child will tend to be at a different stage in its rand() calls, and the risk of the same key being returned to 2 clients at about the same time is relatively low. But if the keys are stored somewhere in a persistent way, you are increasing the risk greatly, because key #13 generated by a new child today may conflict with key #13 generated by another child yesterday.

(*) or start an additional child
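
The persistent-storage case can be sketched the same way. Here a plain hash stands in for whatever store holds the keys, and two srand() calls with the same (arbitrary) seed stand in for yesterday's child and a child started today; the counts 20 and 13 are only example numbers.

```perl
use strict;
use warnings;

my @chars = (0 .. 9, 'a' .. 'z', 'A' .. 'Z');

sub make_key {
    return join '', map { $chars[ rand @chars ] } 1 .. 32;
}

my $seed = 42;    # arbitrary; both children are assumed to share it
my %store;        # stand-in for persistent key storage

# Yesterday: one child generated 20 keys, all of which were stored.
srand($seed);
$store{ make_key() } = 'yesterday' for 1 .. 20;

# Today: a freshly started child begins its sequence from the start.
srand($seed);
my $collisions = 0;
for my $n (1 .. 13) {
    my $key = make_key();
    $collisions++ if exists $store{$key};
}

print "collisions among today's first 13 keys: $collisions\n";
```

Under these assumptions every one of the new child's first 13 keys is already in the store.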
