I'm fairly confident in the selection sampling algorithm as well. However, I'm 
not quite ready to sign off on it: the algorithm as Knuth describes it 
generates its random numbers slightly differently.

Knuth lists the following steps for selection sampling (see the sketch just 
below):

- Generate a random number `U` uniformly distributed between zero and one.
- If `(unsampled_size * U) < remaining_inserts`, then choose that element.
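
For reference, here is a minimal sketch of that acceptance test as I read it; 
the function name `knuth_accept`, the parameter names, and the use of 
`std::uniform_real_distribution` are my own choices, not anything from the 
patch:

    #include <random>

    // Sketch of the acceptance test as Knuth describes it (my own names):
    // draw U uniformly in [0, 1) and choose the current element when
    // unsampled_size * U < remaining_inserts.
    template <class URNG>
    bool knuth_accept(double unsampled_size, double remaining_inserts, URNG& g) {
      double U = std::uniform_real_distribution<double>(0.0, 1.0)(g);
      return unsampled_size * U < remaining_inserts;
    }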

However, this implementation of `sample` does the following (also sketched 
below):

- Generate a random number `U` uniformly distributed between zero and 
  `unsampled_size - 1`.
- If `U < remaining_inserts`, then choose that element.
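
And a matching sketch of the patch's variant as I understand it; the name 
`patch_accept` and the plain `long` types are placeholders of mine, whereas 
the real code uses `_Distance` and `_VSTD::uniform_int_distribution` as quoted 
in the inline comment below:

    #include <random>

    // Sketch of the patch's acceptance test (my own names): draw an integer
    // U uniformly from [0, unsampled_size - 1] and choose the element when
    // U < remaining_inserts.
    template <class URNG>
    bool patch_accept(long unsampled_size, long remaining_inserts, URNG& g) {
      long U = std::uniform_int_distribution<long>(0, unsampled_size - 1)(g);
      return U < remaining_inserts;
    }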

My concern is that this semantic change could introduce a bias. However, I'm 
bad at math and stats, so I don't know; I want to think on it a little more to 
make sure the two are equivalent.

I've also created a patch with various changes and additions related to adding 
the `<experimental/algorithm>` header and making the tests run in C++03 (this 
invalidates the above patch). Please consider applying it to your patch:
https://gist.github.com/EricWF/c62b0979202cfd2cc6af


================
Comment at: include/experimental/algorithm:73
@@ +72,3 @@
+  for (__n = _VSTD::min(__n, __unsampled_sz); __n != 0; ++__first) {
+    _Distance __r =
+        _VSTD::uniform_int_distribution<_Distance>(0, --__unsampled_sz)(__g);
----------------
Same note about using `_Distance`

http://reviews.llvm.org/D9044
