techdocsmith commented on code in PR #13088:
URL: https://github.com/apache/druid/pull/13088#discussion_r972529744


##########
docs/development/extensions-core/datasketches-hll.md:
##########
@@ -59,6 +59,11 @@ druid.extensions.loadList=["druid-datasketches"]
  }
 ```
 
+The `HLLSketchBuild` aggregator builds a datasketch from the input column specified. If used during ingestion, this
+will result in Druid storing pre-generated HLL Sketch objects in the datasource, rather than the original value itself.
+If used at query time on an existing dimension, the resulting column can be used as an intermediate dimension by the
+post-aggregators below.
+

Review Comment:
   ```suggestion
   The `HLLSketchBuild` aggregator builds a datasketch from the specified input column. When used during ingestion, Druid stores pre-generated HLL Sketch objects in the datasource instead of the original values.
   When you apply it at query time on an existing dimension, you can use the resulting column as an intermediate dimension in the [post-aggregators](#post-aggregators).
   
   ```
   Thanks for the clarification and contribution, @cloventt! I've suggested some stylistic changes.
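
   To illustrate the query-time case, here is a minimal sketch of a query fragment, assuming an existing dimension called `user_id`. The names `user_id`, `user_sketch`, and `unique_users` are hypothetical and do not come from the PR:

   ```json
   {
     "aggregations": [
       {
         "type": "HLLSketchBuild",
         "name": "user_sketch",
         "fieldName": "user_id"
       }
     ],
     "postAggregations": [
       {
         "type": "HLLSketchEstimate",
         "name": "unique_users",
         "field": { "type": "fieldAccess", "fieldName": "user_sketch" }
       }
     ]
   }
   ```

   In this sketch the aggregator output `user_sketch` serves only as the intermediate column that the post-aggregator reads to produce an estimate.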



##########
docs/development/extensions-core/datasketches-hll.md:
##########
@@ -89,6 +94,11 @@ druid.extensions.loadList=["druid-datasketches"]
  }
 ```
 
+The `HLLSketchMerge` aggregator can be used to ingest pre-generated sketches from an input dataset. For example, an
+earlier batch processing job can be used to generate the sketches before the data is sent to Druid. To support this
+behaviour, the sketches in the input dataset must be serialised to base64-encoded bytes. Then, in the native ingestion
+`MetricsSpec` the `HLLSketchMerge` must be specified for the input column as shown above.
+

Review Comment:
   ```suggestion
   You can use the `HLLSketchMerge` aggregator to ingest pre-generated sketches from an input dataset. For example, you can set up a batch processing job to generate the sketches before sending the data to Druid. You must serialize the sketches in the input dataset to Base64-encoded bytes. Then, specify `HLLSketchMerge` for the input column in the native ingestion `MetricsSpec`.
   
   ```
   Stylistic suggestions. Also wonder if it might be helpful to have an example 
of the `MetricsSpec`.
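
   A minimal sketch of what such an example could look like, assuming the input data has a column of Base64-encoded sketches. This is a fragment of the `dataSchema` in a native ingestion spec, where the key is written `metricsSpec`. The names `unique_users` and `user_sketch` are hypothetical placeholders, not taken from the PR:

   ```json
   {
     "metricsSpec": [
       {
         "type": "HLLSketchMerge",
         "name": "unique_users",
         "fieldName": "user_sketch"
       }
     ]
   }
   ```

   Here `fieldName` would point at the input column holding the serialized sketches, and `name` would be the resulting metric column in the datasource.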



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

