c-dickens commented on code in PR #325:
URL: https://github.com/apache/datasketches-cpp/pull/325#discussion_r1071094476
##########
count/include/count_min.hpp:
##########
@@ -0,0 +1,166 @@
+#ifndef COUNT_MIN_HPP_
+#define COUNT_MIN_HPP_
+
+#include <cstdint>
+#include <vector>
+#include <iterator>
+#include <algorithm>
+
+#include "common_defs.hpp"
+
+namespace datasketches {
+
+ /*
+ * C++ implementation of the CountMin sketch data structure of Cormode and
Muthukrishnan.
+ * [1] - http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf
+ * @author Charlie Dickens
+ */
+
+template<typename T> class count_min_sketch ;
+
+template<typename T>
+class count_min_sketch{
+ static_assert(std::is_arithmetic<T>::value, "Arithmetic type expected");
+public:
+ uint64_t num_hashes, num_buckets, seed, sketch_length ;
Review Comment:
- num_hashes has been changed to uint8. The failure probability `delta =
exp(-d)` and `exp(-255)` is on the order of 10^(-100) which will be small
enough for applications. On the other hand using a smaller integer type eg
`exp(-15)` would rule out sketches with say 20 hash functions. This is already
a small probability of failure though.
- `num_buckets` has been changed to 32 bits by similar reasoning.
- In future, we could make this flexible because 16 bit num_hashes and 32
bit num_buckets might still be larger than necessary for some applications. Eg
if the user can tolerate failure probability up to 1E-6, 1E-7, then 8 bit uints
could be used for the number of hash functions, followed by a similar reduction
in num_buckets if lower error can be tolerated.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]