[
https://issues.apache.org/jira/browse/BEAM-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810050#comment-16810050
]
Yueyang Qiu commented on BEAM-2728:
-----------------------------------
Hi Brachi! FYI, the open-source HLL implementation this library is based on is
different from Google Cloud BigQuery's implementation of HLL, so the sketches
won't be consistent.
There is ongoing work to support the same HLL implementation in Beam.
> Extension for sketch-based statistics
> -------------------------------------
>
> Key: BEAM-2728
> URL: https://issues.apache.org/jira/browse/BEAM-2728
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-sketching
> Reporter: Arnaud Fournier
> Assignee: Arnaud Fournier
> Priority: Minor
> Time Spent: 12h 40m
> Remaining Estimate: 0h
>
> Goal: Provide an extension library to compute approximate statistics on
> streams.
> Interest: Probabilistic data structures can create an approximation (sketch)
> of the current state of a stream without storing every element, instead
> processing each observation quickly to summarize its current state and find
> useful statistical insights.
> Implementation:
> https://github.com/ArnaudFnr/beam/tree/sketching/sdks/java/extensions/sketching
> More info:
> https://docs.google.com/document/d/1Xy6g5RPBYX_HadpIr_2WrUeusiwL0Jo2ACI5PEOP1kc/edit
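A toy HyperLogLog can make both the "Interest" paragraph and the HLL consistency remark concrete. This is a minimal sketch under stated assumptions. It is neither the Beam sketching extension's code nor BigQuery's HLL. The class `SimpleHll`, the precision `p = 12`, and the two hash mixers (SplitMix64 and the MurmurHash3 `fmix64` finalizer) are hypothetical choices for illustration. Each sketch keeps 4,096 small registers instead of every element. Two sketches fed identical input but built with different hash functions end up with different register contents. That is one reason sketches produced by different HLL implementations cannot simply be merged or compared.

```java
import java.util.Arrays;
import java.util.function.LongUnaryOperator;

public class HllSketchDemo {

    /** Simplified HyperLogLog (illustrative only): 2^p registers, each holding the max rank seen. */
    static final class SimpleHll {
        final int p;                  // precision: number of hash bits used to pick a register
        final int m;                  // number of registers = 2^p
        final byte[] registers;       // the whole "summary" of the stream
        final LongUnaryOperator hash; // hypothetical pluggable 64-bit hash/mixer

        SimpleHll(int p, LongUnaryOperator hash) {
            this.p = p;
            this.m = 1 << p;
            this.registers = new byte[m];
            this.hash = hash;
        }

        /** Processes one observation in O(1) without storing it. */
        void add(long value) {
            long h = hash.applyAsLong(value);
            int idx = (int) (h >>> (64 - p)); // top p bits choose the register
            long rest = h << p;               // remaining 64 - p bits
            // rank = 1-based position of the first 1-bit in the remaining bits
            int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - p) + 1;
            if (rank > registers[idx]) {
                registers[idx] = (byte) rank;
            }
        }

        /** Raw HLL estimate with the linear-counting correction for small cardinalities. */
        double estimate() {
            double sum = 0;
            int zeros = 0;
            for (byte r : registers) {
                sum += Math.pow(2, -r);
                if (r == 0) zeros++;
            }
            double alpha = 0.7213 / (1 + 1.079 / m); // bias constant, valid for m >= 128
            double raw = alpha * m * m / sum;
            if (raw <= 2.5 * m && zeros > 0) {
                return m * Math.log((double) m / zeros);
            }
            return raw;
        }
    }

    /** SplitMix64 finalizer: stand-in hash for "implementation A". */
    static long splitMix64(long z) {
        z += 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** MurmurHash3 fmix64 finalizer: stand-in hash for "implementation B". */
    static long murmurFmix64(long k) {
        k ^= k >>> 33;
        k *= 0xFF51AFD7ED558CCDL;
        k ^= k >>> 33;
        k *= 0xC4CEB93FE53B3E7EL;
        k ^= k >>> 33;
        return k;
    }

    public static void main(String[] args) {
        int distinct = 50_000;
        SimpleHll a = new SimpleHll(12, HllSketchDemo::splitMix64);
        SimpleHll b = new SimpleHll(12, HllSketchDemo::murmurFmix64);
        // A stream of 200,000 elements containing 50,000 distinct values (1..50,000).
        for (int i = 0; i < 200_000; i++) {
            long v = (i % distinct) + 1;
            a.add(v);
            b.add(v);
        }
        System.out.printf("estimate A: %.0f%n", a.estimate());
        System.out.printf("estimate B: %.0f%n", b.estimate());
        System.out.println("registers identical: " + Arrays.equals(a.registers, b.registers));
    }
}
```

Both estimates should land within a few percent of 50,000. The standard error for 4,096 registers is roughly 1.04/√4096 ≈ 1.6%. The register arrays, however, differ, so a register-wise merge of sketch A with sketch B would not describe either stream correctly.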
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)