[ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=310032&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310032
 ]

ASF GitHub Bot logged work on BEAM-7013:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Sep/19 18:50
            Start Date: 10/Sep/19 18:50
    Worklog Time Spent: 10m 
      Work Description: robinyqiu commented on pull request #9519: [BEAM-7013] 
Use a 0-length byte array to represent empty sketch in HllCount
URL: https://github.com/apache/beam/pull/9519#discussion_r322906525
 
 

 ##########
 File path: 
sdks/java/extensions/zetasketch/src/main/java/org/apache/beam/sdk/extensions/zetasketch/HllCountMergePartialFn.java
 ##########
 @@ -54,10 +54,15 @@ private HllCountMergePartialFn() {}
     return null;
   }
 
+  @Nullable
   @Override
   public HyperLogLogPlusPlus<HllT> addInput(
       @Nullable HyperLogLogPlusPlus<HllT> accumulator, byte[] input) {
 
 Review comment:
   > I would lean towards avoiding user errors, since every error avoided is 
something that users don't need to revise their pipeline over, and is an issue 
that is not escalated to us.
   
   Agreed. It turns out we can accept nulls and log a warning that suggests 
replacing them with `byte[0]`. I made that change. PTAL.
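
To make the null handling above concrete, here is a minimal sketch of an `addInput` that tolerates nulls and logs a warning. It is not the code from the PR: the `Accumulator` class is a hypothetical stand-in for `HyperLogLogPlusPlus<HllT>` that only counts merged sketches, and the logger and warning text are assumptions.

```java
import java.util.logging.Logger;

public class NullTolerantMerge {
  private static final Logger LOG = Logger.getLogger(NullTolerantMerge.class.getName());

  /** Hypothetical stand-in for the real sketch accumulator; it only counts merges. */
  public static final class Accumulator {
    public int merged;
  }

  /**
   * Adds one serialized sketch to the accumulator. A null accumulator means
   * "nothing merged yet", mirroring the @Nullable accumulator in the diff.
   */
  public static Accumulator addInput(Accumulator accumulator, byte[] input) {
    if (input == null) {
      // Accept the null instead of failing, but nudge the user toward byte[0].
      LOG.warning("Received a null sketch; treating it as empty. "
          + "Consider replacing nulls with byte[0].");
      return accumulator;
    }
    if (input.length == 0) {
      // A 0-length byte array represents an empty sketch: nothing to merge.
      return accumulator;
    }
    if (accumulator == null) {
      accumulator = new Accumulator();
    }
    accumulator.merged++; // the real code would merge the sketch bytes here
    return accumulator;
  }
}
```

With this shape, null and `byte[0]` inputs both leave the accumulator unchanged; only the null case produces a warning.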
   
   > Also, if users need to filter their input and replace nulls with byte[0], 
is that streamed (resp. folded into another pass over the data) or does it 
result in an extra-pass over the data?
   
   That depends on how the pipeline is implemented. If users do the 
replacement in the same `PTransform` that produces the null (e.g. 
[here](https://github.com/robinyqiu/beam/blob/3b6a628c9ad0fbf63b7c1f7d355dbc8cf5219eb2/sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java#L144)
 in BigQueryIO), it will not result in an extra pass.
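
As a rough illustration of doing the replacement where the null is created, the sketch below uses plain Java with no Beam dependency. The names `nullToEmpty` and `readSketches` are hypothetical, and the list stands in for rows parsed from a source. The point is only that the null check happens in the same mapping step that already touches each element, so no additional traversal is needed.

```java
import java.util.List;
import java.util.stream.Collectors;

public class NullSketchReplacement {

  /** Maps a null sketch to a 0-length byte array; other values pass through. */
  public static byte[] nullToEmpty(byte[] sketch) {
    return sketch == null ? new byte[0] : sketch;
  }

  /**
   * Hypothetical "read" step: the replacement is folded into the single
   * mapping pass that already extracts each sketch from the raw column.
   */
  public static List<byte[]> readSketches(List<byte[]> rawColumn) {
    return rawColumn.stream()
        .map(NullSketchReplacement::nullToEmpty)
        .collect(Collectors.toList());
  }
}
```

In a Beam pipeline the same one-line check would presumably live in whatever function parses each record, before the sketches reach the merge step.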
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 310032)
    Time Spent: 33h 10m  (was: 33h)

> A new count distinct transform based on BigQuery compatible HyperLogLog++ 
> implementation
> ----------------------------------------------------------------------------------------
>
>                 Key: BEAM-7013
>                 URL: https://issues.apache.org/jira/browse/BEAM-7013
>             Project: Beam
>          Issue Type: New Feature
>          Components: extensions-java-sketching, sdk-java-core
>            Reporter: Yueyang Qiu
>            Assignee: Yueyang Qiu
>            Priority: Major
>             Fix For: 2.16.0
>
>          Time Spent: 33h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)
