Sounds good! Added this idea to the future work section at the end of the
doc.

On Mon, Jun 24, 2019 at 12:31 PM Rui Wang <[email protected]> wrote:

> Thanks Robin! It would also be interesting if we could offer HLL_COUNT
> functions in BeamSQL based on your proposal!
>
>
> -Rui
>
> On Mon, Jun 24, 2019 at 10:47 AM Robin Qiu <[email protected]> wrote:
>
>> Hi all,
>>
>> I have written a doc
>> <https://docs.google.com/document/d/1U5aXdC9lDSOqT6FPHRulp-EutYiQ9KeHpgu-19CIfEI>
>> proposing we integrate the HyperLogLog++ algorithm
>> <http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/40671.pdf>
>> into Beam as a new combiner. The algorithm solves the count-distinct
>> problem <https://en.wikipedia.org/wiki/Count-distinct_problem>, and the
>> intermediate sketch (aggregator) format will be compatible with sketches
>> computed via the HLL_COUNT functions
>> <https://cloud.google.com/bigquery/docs/reference/standard-sql/hll_functions>
>> in Google Cloud BigQuery (because they will be based on the same
>> implementation: ZetaSketch <https://github.com/google/zetasketch>). The
>> tracking JIRA issue is BEAM-7013
>> <https://issues.apache.org/jira/browse/BEAM-7013>.
>>
>> The API design proposed in the doc is subject to change and open to
>> comments. Please feel free to comment if you have any thoughts.
>>
>> Cheers,
>> Robin
>>
>
