Sounds good! Added this idea to the future work section at the end of the doc.

On Mon, Jun 24, 2019 at 12:31 PM Rui Wang <[email protected]> wrote:

> Thanks Robin! It would also be interesting if we could offer HLL_COUNT
> functions in BeamSQL based on your proposal!
>
> -Rui
>
> On Mon, Jun 24, 2019 at 10:47 AM Robin Qiu <[email protected]> wrote:
>
>> Hi all,
>>
>> I have written a doc
>> <https://docs.google.com/document/d/1U5aXdC9lDSOqT6FPHRulp-EutYiQ9KeHpgu-19CIfEI>
>> proposing we integrate the HyperLogLog++ algorithm
>> <http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/40671.pdf>
>> into Beam as a new combiner. The algorithm solves the count-distinct
>> problem <https://en.wikipedia.org/wiki/Count-distinct_problem>, and the
>> intermediate sketch (aggregator) format will be compatible with sketches
>> computed via the HLL_COUNT functions
>> <https://cloud.google.com/bigquery/docs/reference/standard-sql/hll_functions>
>> in Google Cloud BigQuery (because they will be based on the same
>> implementation: ZetaSketch <https://github.com/google/zetasketch>). The
>> tracking JIRA issue is BEAM-7013
>> <https://issues.apache.org/jira/browse/BEAM-7013>.
>>
>> The API design proposed in the doc is subject to change and open to
>> comments. Please feel free to comment if you have any thoughts.
>>
>> Cheers,
>> Robin
