Copilot commented on code in PR #98:
URL: https://github.com/apache/datasketches-rust/pull/98#discussion_r2825644844
##########
datasketches/src/hll/sketch.rs:
##########
@@ -94,8 +94,8 @@ impl HllSketch {
///
/// # Arguments
///
- /// * `lg_config_k` - Log2 of the number of buckets (K)
- /// * `mode` - The mode to initialize the sketch with
+ /// * `lg_config_k`: Log2 of the number of buckets (K)
+ /// * `mode`: The mode to initialize the sketch with
Review Comment:
Same doc-style consistency issue as above: the internal `from_mode` docs use
`* `param`: ...` instead of the `* `param` - ...` format used elsewhere in the
crate. Aligning these helps keep rustdoc argument lists consistent.
```suggestion
/// * `lg_config_k` - Log2 of the number of buckets (K)
/// * `mode` - The mode to initialize the sketch with
```
##########
datasketches/src/hll/sketch.rs:
##########
@@ -54,15 +54,15 @@ impl HllSketch {
///
/// # Arguments
///
- /// * `lg_config_k` - Log2 of the number of buckets (K). Must be in [4,
21].
+ /// * `lg_config_k`: Log2 of the number of buckets (K). Must be in `[4,
21]`.
/// - lg_k=4: 16 buckets, ~26% relative error
/// - lg_k=12: 4096 buckets, ~1.6% relative error (common choice)
/// - lg_k=21: 2M buckets, ~0.4% relative error
- /// * `hll_type` - Target HLL array type (Hll4, Hll6, or Hll8)
+ /// * `hll_type`: Target HLL array type (Hll4, Hll6, or Hll8)
///
Review Comment:
In this crate the `# Arguments` sections typically document parameters using
the `* `param` - description` form (e.g., `hll/estimator.rs`). This change
switches to `* `param`: ...`, which makes the docs inconsistent with the rest
of the module. Consider using the existing `-` format for consistency.
##########
datasketches/src/hll/union.rs:
##########
@@ -59,13 +59,13 @@ impl HllUnion {
///
/// # Arguments
///
- /// * `lg_max_k` - Maximum log2 of the number of buckets. Must be in [4,
21]. This determines
+ /// * `lg_max_k`: Maximum log2 of the number of buckets. Must be in `[4,
21]`. This determines
Review Comment:
In this crate the `# Arguments` sections typically document parameters using
the `* `param` - description` form (e.g., `hll/estimator.rs`). Here it was
changed to `* `param`: description`, which is inconsistent with existing
rustdoc style in the HLL module. Consider reverting to the established `-`
format for consistency across docs.
```suggestion
/// * `lg_max_k` - Maximum log2 of the number of buckets. Must be in
`[4, 21]`. This determines
```
##########
datasketches/src/frequencies/mod.rs:
##########
@@ -17,16 +17,66 @@
//! Frequency sketches for finding heavy hitters in data streams.
//!
-//! This module implements the Frequent Items sketch from Apache DataSketches.
It tracks
-//! approximate frequencies in a stream and can report heavy hitters with
explicit
-//! error guarantees (no false negatives or no false positives).
+//! # Overview
//!
-//! For background, see the Java documentation:
-//!
<https://apache.github.io/datasketches-java/9.0.0/org/apache/datasketches/frequencies/FrequentItemsSketch.html>
+//! This sketch is based on the paper ["A High-Performance Algorithm for
Identifying Frequent Items
+//! in Data Streams"](https://arxiv.org/abs/1705.07001) by Daniel Anderson,
Pryce Bevan, Kevin Lang,
+//! Edo Liberty, Lee Rhodes, and Justin Thaler.
//!
-//! # Usage
+//! This sketch is useful for tracking approximate frequencies of items of
type `T` that implements
+//! [`FrequentItemValue`], with optional associated counts (`T` item, `u64`
count) that are members
+//! of a multiset of such items. The true frequency of an item is defined to
be the sum of
+//! associated counts.
//!
-//! ```rust
+//! This implementation provides the following capabilities:
+//! - Estimate the frequency of an item.
+//! - Return upper and lower bounds of any item, such that the true frequency
is always between the
+//! upper and lower bounds.
+//! - Return a global maximum error that holds for all items in the stream.
+//! - Return an array of frequent items that qualify either
[`ErrorType::NoFalsePositives`] or
+//! [`ErrorType::NoFalseNegatives`].
+//! - Merge itself with another sketch created from this module.
+//! - Serialize to bytes, or deserialize from bytes, for storage or
transmission.
Review Comment:
The module docs state that items are of type `T` that implements
`FrequentItemValue`, and list serialization/deserialization as a capability.
However, `FrequentItemsSketch` only exposes `serialize`/`deserialize` impls for
`i64`, `u64`, and `String` (see `frequencies/sketch.rs`), so implementing
`FrequentItemValue` for a custom type does not currently enable serialization.
Please either (a) adjust the docs to clarify the current limitation, or (b) add
generic `serialize`/`deserialize` support for `T: FrequentItemValue` so the
docs are accurate.
```suggestion
//! - Serialize to bytes, or deserialize from bytes, for storage or
transmission,
//! for the built-in `i64`, `u64`, and `String` item types.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]