techdocsmith commented on code in PR #13486:
URL: https://github.com/apache/druid/pull/13486#discussion_r1038651506


##########
docs/development/extensions-core/datasketches-tuple.md:
##########
@@ -50,8 +50,41 @@ druid.extensions.loadList=["druid-datasketches"]
 |name|A String for the output (result) name of the calculation.|yes|
 |fieldName|A String for the name of the input field.|yes|
 |nominalEntries|Parameter that determines the accuracy and size of the sketch. 
Higher k means higher accuracy but more space to store sketches. Must be a 
power of 2. See the [Theta sketch 
accuracy](https://datasketches.apache.org/docs/Theta/ThetaErrorTable) for 
details. |no, defaults to 16384|
-|numberOfValues|Number of values associated with each distinct key. |no, 
defaults to 1|
-|metricColumns|If building sketches from raw data, an array of names of the 
input columns containing numeric values to be associated with each distinct 
key.|no, defaults to empty array|
+|metricColumns|If building sketches from raw data, an array of names of the 
input columns containing numeric values to be associated with each distinct 
key. If not provided `filedName` is assumed to be an arrayOfDoublesSketch|no, 
if not provided `filedName` is assumed to be an arrayOfDoublesSketch|
+|numberOfValues|Number of values associated with each distinct key. |no, 
defaults to the length of `metricColumns` if provided and 1 otherwise|
+
+The `arrayOfDoublesSketch` aggregator has two modes of useage:
+
+- built from raw data - `metricColumns` is set to an array
+- directly on top of an ArrayOfDoubles sketch - `metricColumns` is unset and 
`fieldName` represents an ArrayOfDoubles sketch (base64 encoded if at ingestion 
time) with `numberOfValues` doubles.
+
+#### Example on top of raw data
+
+Compute a theta of unique users, for each user store the `added` and `deleted` 
scoers

Review Comment:
   ```suggestion
   Compute a theta of unique users. For each user, store the `added` and 
`deleted` scores in a column called `users_theta`.
   ```
   typo?
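
   For context, the aggregator spec that this sentence introduces might look 
like the minimal sketch below. The values are taken from the example in this 
PR. The only change assumed here is dropping the trailing comma after 
`metricColumns`, which strict JSON parsers reject.

   ```json
   {
     "type": "arrayOfDoublesSketch",
     "name": "users_theta",
     "fieldName": "user",
     "nominalEntries": 16384,
     "metricColumns": ["added", "deleted"]
   }
   ```

   With `metricColumns` set, `numberOfValues` would default to its length, 
which is 2 here.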



##########
docs/development/extensions-core/datasketches-tuple.md:
##########
@@ -50,8 +50,41 @@ druid.extensions.loadList=["druid-datasketches"]
 |name|A String for the output (result) name of the calculation.|yes|
 |fieldName|A String for the name of the input field.|yes|
 |nominalEntries|Parameter that determines the accuracy and size of the sketch. 
Higher k means higher accuracy but more space to store sketches. Must be a 
power of 2. See the [Theta sketch 
accuracy](https://datasketches.apache.org/docs/Theta/ThetaErrorTable) for 
details. |no, defaults to 16384|
-|numberOfValues|Number of values associated with each distinct key. |no, 
defaults to 1|
-|metricColumns|If building sketches from raw data, an array of names of the 
input columns containing numeric values to be associated with each distinct 
key.|no, defaults to empty array|
+|metricColumns|If building sketches from raw data, an array of names of the 
input columns containing numeric values to be associated with each distinct 
key. If not provided `filedName` is assumed to be an arrayOfDoublesSketch|no, 
if not provided `filedName` is assumed to be an arrayOfDoublesSketch|
+|numberOfValues|Number of values associated with each distinct key. |no, 
defaults to the length of `metricColumns` if provided and 1 otherwise|
+
+The `arrayOfDoublesSketch` aggregator has two modes of useage:
+
+- built from raw data - `metricColumns` is set to an array
+- directly on top of an ArrayOfDoubles sketch - `metricColumns` is unset and 
`fieldName` represents an ArrayOfDoubles sketch (base64 encoded if at ingestion 
time) with `numberOfValues` doubles.
+
+#### Example on top of raw data
+
+Compute a theta of unique users, for each user store the `added` and `deleted` 
scoers
+
+```json
+{
+  "type": "arrayOfDoublesSketch",
+  "name": "users_theta",
+  "fieldName": "user",
+  "nominalEntries": 16384,
+  "metricColumns": ["added", "deleted"],
+}
+```
+
+### Example on top of precomputed sketchs
+
+Ingest a sketch column called `user_sketches` that has two doubles in its 
array.

Review Comment:
   ```suggestion
   Ingest a sketch column called `user_sketches` that contains a base64-encoded 
sketch with two doubles in its array, and store it in a column called `users_theta`.
   ```
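
   A spec matching this description might look like the sketch below. It is an 
assumption built only from the parameters in the table above: `metricColumns` 
is left unset so that `fieldName` is treated as an existing sketch, and 
`numberOfValues` is set to 2 because the column holds two doubles per key.

   ```json
   {
     "type": "arrayOfDoublesSketch",
     "name": "users_theta",
     "fieldName": "user_sketches",
     "nominalEntries": 16384,
     "numberOfValues": 2
   }
   ```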



##########
docs/development/extensions-core/datasketches-tuple.md:
##########
@@ -50,8 +50,41 @@ druid.extensions.loadList=["druid-datasketches"]
 |name|A String for the output (result) name of the calculation.|yes|
 |fieldName|A String for the name of the input field.|yes|
 |nominalEntries|Parameter that determines the accuracy and size of the sketch. 
Higher k means higher accuracy but more space to store sketches. Must be a 
power of 2. See the [Theta sketch 
accuracy](https://datasketches.apache.org/docs/Theta/ThetaErrorTable) for 
details. |no, defaults to 16384|
-|numberOfValues|Number of values associated with each distinct key. |no, 
defaults to 1|
-|metricColumns|If building sketches from raw data, an array of names of the 
input columns containing numeric values to be associated with each distinct 
key.|no, defaults to empty array|
+|metricColumns|If building sketches from raw data, an array of names of the 
input columns containing numeric values to be associated with each distinct 
key. If not provided `filedName` is assumed to be an arrayOfDoublesSketch|no, 
if not provided `filedName` is assumed to be an arrayOfDoublesSketch|

Review Comment:
   ```suggestion
   |metricColumns|When building sketches from raw data, an array of the names 
of the input columns that contain numeric values to associate with each 
distinct key. If not provided, assumes `fieldName` is an 
`arrayOfDoublesSketch`.|no, if not provided, `fieldName` is assumed to be an 
`arrayOfDoublesSketch`|
   ```



##########
docs/development/extensions-core/datasketches-tuple.md:
##########
@@ -50,8 +50,41 @@ druid.extensions.loadList=["druid-datasketches"]
 |name|A String for the output (result) name of the calculation.|yes|
 |fieldName|A String for the name of the input field.|yes|
 |nominalEntries|Parameter that determines the accuracy and size of the sketch. 
Higher k means higher accuracy but more space to store sketches. Must be a 
power of 2. See the [Theta sketch 
accuracy](https://datasketches.apache.org/docs/Theta/ThetaErrorTable) for 
details. |no, defaults to 16384|
-|numberOfValues|Number of values associated with each distinct key. |no, 
defaults to 1|
-|metricColumns|If building sketches from raw data, an array of names of the 
input columns containing numeric values to be associated with each distinct 
key.|no, defaults to empty array|
+|metricColumns|If building sketches from raw data, an array of names of the 
input columns containing numeric values to be associated with each distinct 
key. If not provided `filedName` is assumed to be an arrayOfDoublesSketch|no, 
if not provided `filedName` is assumed to be an arrayOfDoublesSketch|
+|numberOfValues|Number of values associated with each distinct key. |no, 
defaults to the length of `metricColumns` if provided and 1 otherwise|
+
+The `arrayOfDoublesSketch` aggregator has two modes of useage:
+
+- built from raw data - `metricColumns` is set to an array
+- directly on top of an ArrayOfDoubles sketch - `metricColumns` is unset and 
`fieldName` represents an ArrayOfDoubles sketch (base64 encoded if at ingestion 
time) with `numberOfValues` doubles.

Review Comment:
   ```suggestion
   You can use the `arrayOfDoublesSketch` aggregator to:
   
   - Build sketches from raw data. In this case, set `metricColumns` to an 
array.
   - Build a sketch from an existing `ArrayOfDoubles` sketch. In this case, 
leave `metricColumns` unset and set `fieldName` to an `ArrayOfDoubles` 
sketch with `numberOfValues` doubles. At ingestion time, you must base64 encode 
`ArrayOfDoubles` sketches.
   ```



##########
docs/development/extensions-core/datasketches-tuple.md:
##########
@@ -50,8 +50,41 @@ druid.extensions.loadList=["druid-datasketches"]
 |name|A String for the output (result) name of the calculation.|yes|

Review Comment:
   ```suggestion
   |name|String representing the output column to store sketch values.|yes|
   ```
   This is the destination column for the sketch, no?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

