techdocsmith commented on code in PR #13088: URL: https://github.com/apache/druid/pull/13088#discussion_r972530970
##########
docs/development/extensions-core/datasketches-hll.md:
##########
@@ -89,6 +94,11 @@ druid.extensions.loadList=["druid-datasketches"]
 }
 ```
+The `HLLSketchMerge` aggregator can be used to ingest pre-generated sketches from an input dataset. For example, an
+earlier batch processing job can be used to generate the sketches before the data is sent to Druid. To support this
+behaviour, the sketches in the input dataset must be serialised to base64-encoded bytes. Then, in the native ingestion
+`MetricsSpec` the `HLLSketchMerge` must be specified for the input column as shown above.
+

Review Comment:
```suggestion
You can use the `HLLSketchMerge` aggregator to ingest pre-generated sketches from an input dataset. For example, you can set up a batch processing job to generate the sketches before sending the data to Druid. You must serialize the sketches in the input dataset to Base64-encoded bytes. Then, specify `HLLSketchMerge` for the input column in the native ingestion `metricsSpec`.
```
Stylistic suggestions. Also wonder if it might be helpful to have an example of the `MetricsSpec`.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
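
For reference, the `metricsSpec` example asked about in the review comment might look roughly like the sketch below. It is illustrative only and is not taken from the PR. The column names `user_id_hll_base64` (the input column holding the Base64-encoded sketches) and `user_id_hll` (the output metric) are hypothetical. `lgK` and `tgtHllType` are optional and are shown at what I believe are their documented defaults.

```json
{
  "metricsSpec": [
    {
      "type": "HLLSketchMerge",
      "name": "user_id_hll",
      "fieldName": "user_id_hll_base64",
      "lgK": 12,
      "tgtHllType": "HLL_4"
    }
  ]
}
```

In a real ingestion spec the `metricsSpec` array sits inside `dataSchema`. The outer braces here only make the fragment valid JSON on its own.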
