[ 
https://issues.apache.org/jira/browse/BEAM-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809803#comment-16809803
 ] 

Brachi Packter commented on BEAM-2728:
--------------------------------------

I want to save the sketch itself to BigQuery so that I can merge sketches later with the HLL functions: 
[https://cloud.google.com/bigquery/docs/reference/standard-sql/hll_functions]

I am using the sketching extension: 
[https://github.com/apache/beam/tree/master/sdks/java/extensions/sketching]

Here is my code:
{code:java}
.apply("hll-count", Combine.perKey(
    ApproximateDistinct.ApproximateDistinctFn.create(StringUtf8Coder.of())))
.apply("to-table-row", ParDo.of(
    new DoFn<ValueInSingleWindow<KV<GroupByData, HyperLogLogPlus>>, TableRow>() {
  @ProcessElement
  public void processElement(ProcessContext processContext) {
    ValueInSingleWindow<KV<GroupByData, HyperLogLogPlus>> windowed = processContext.element();
    KV<GroupByData, HyperLogLogPlus> keyData = windowed.getValue();
    GroupByData key = keyData.getKey();
    HyperLogLogPlus hllSketch = keyData.getValue();
    TableRow tableRow = new TableRow();
    tableRow.set("country_code", key.countryCode);
    tableRow.set("event", key.event);
    tableRow.set("profile", key.profile);

    // How can I get the HLL?
    tableRow.set("hll", hllSketch.getBytes());
{code}
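One possible direction for the last line of the snippet, offered as a minimal sketch rather than a confirmed answer: BigQueryIO expects BYTES values in a TableRow to be base64-encoded strings, so the serialized sketch would be encoded before it is set on the row. The example uses only the JDK. A plain Map stands in for TableRow, a hard-coded byte array stands in for the result of hllSketch.getBytes() (which, as far as I know, declares IOException), and the names SketchRowExample, encodeSketch, decodeSketch and toRow are hypothetical. It does not address whether BigQuery's HLL_COUNT functions can read the serialization produced by this extension; that would need to be checked separately.

```java
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: a plain Map stands in for TableRow so the example
// runs with the JDK alone.
final class SketchRowExample {

    // Encode the serialized sketch as a base64 string, the form in which
    // BYTES values are passed to BigQuery in a TableRow.
    static String encodeSketch(byte[] sketchBytes) {
        return Base64.getEncoder().encodeToString(sketchBytes);
    }

    // Reverse of encodeSketch, useful for checking the round trip.
    static byte[] decodeSketch(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    // Build the row with the same column names as the snippet above.
    static Map<String, Object> toRow(
            String countryCode, String event, String profile, byte[] sketchBytes) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("country_code", countryCode);
        row.put("event", event);
        row.put("profile", profile);
        row.put("hll", encodeSketch(sketchBytes));
        return row;
    }

    public static void main(String[] args) {
        // Placeholder bytes; in the pipeline these would come from
        // hllSketch.getBytes().
        byte[] placeholderSketch = {1, 2, 3, 4};
        Map<String, Object> row = toRow("US", "click", "p1", placeholderSketch);
        System.out.println(row);
    }
}
```

In the DoFn itself the equivalent line would be along the lines of tableRow.set("hll", Base64.getEncoder().encodeToString(hllSketch.getBytes())), with processElement declared to throw the checked exception.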

> Extension for sketch-based statistics
> -------------------------------------
>
>                 Key: BEAM-2728
>                 URL: https://issues.apache.org/jira/browse/BEAM-2728
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-sketching
>            Reporter: Arnaud Fournier
>            Assignee: Arnaud Fournier
>            Priority: Minor
>          Time Spent: 12h 40m
>  Remaining Estimate: 0h
>
> Goal : Provide an extension library to compute approximate statistics on 
> streams.
> Interest : Probabilistic data structures can create an approximation (sketch) 
> of the current state of a stream without storing every element but rather 
> processing each observation quickly to summarize its current state and find 
> useful statistical insights.
> Implementation is here : 
> https://github.com/ArnaudFnr/beam/tree/sketching/sdks/java/extensions/sketching
> More info : 
> https://docs.google.com/document/d/1Xy6g5RPBYX_HadpIr_2WrUeusiwL0Jo2ACI5PEOP1kc/edit



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
