http://d.puremagic.com/issues/show_bug.cgi?id=10322
--- Comment #4 from Joseph Rushton Wakeling <[email protected]> 2013-06-10 12:58:16 PDT --- Note: there's an interesting discrepancy in results if the samplepop.d test code is tweaked to sample 5 from 10 instead of 5 from 100. The key difference here is that because the ratio of sample points to available points is smaller, Algorithm A will be used instead of Algorithm D. If we do this, we observe that the sample with popFront() called first does not reflect an even selection of items from 1 to 99; rather, it looks like the case where .front is called prior to .popFront(). The explanation for this is as follows. First, let's cover the 5-from-10 case. If .popFront() is called before .front, then the value of _Vprime (used by Algorithm D) has not been initialized, and is equal to nan. When _Vprime is used in RandomSample.skip() (std.random(1835)), the cast means that the value of s is set to 0, and _Vprime is reset (line 1853) to a value of -nan. This in turn registers false to the if(_Vprime > 1.0) test, which takes us down to line 1899 to return s, which is 0. So, it follows that the call to prime() in the initial popFront() does not change anything -- the only change to internal values occurs in the body of popFront() itself with its calls to _input.popFront() and its increments of _toSelect, _available and _index. So, at the end of popFront() we are sitting on the second element of the original _input range, _toSelect and _available are 4 and 99 respectively, and the internal variable _first is still true. Now when we read front() it will check _first and, finding it to be true (line 1682) will call prime(), which in turn calls skip() and hence selects as if it were selecting the first point of the sample (with _toSelect == 4 and _available == 99). Hence the even distribution of selection from points 1-99. If instead we're selecting 5 from 10, the initial call to popFront() and thence prime() will instead call skipA() (via skip(), lines 1820-1824), which (not being dependent on any uninitialized variables) will generate a true, typically non-zero skip value s. Then, when we call front(), with _first still true, the second call to prime() will generate _another_ real skip. It follows that in this second case the algorithm is genuinely taking two sample points, and hence the non-uniform distribution of selection across points 1-99. Sad to say this would all have been caught much easier if skip() had used std.conv.to instead of cast() .... :-( -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
