[ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=309259&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309259
 ]

ASF GitHub Bot logged work on BEAM-7013:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Sep/19 21:19
            Start Date: 09/Sep/19 21:19
    Worklog Time Spent: 10m 
      Work Description: robinyqiu commented on pull request #9519: [BEAM-7013] 
Use a 0-length byte array to represent empty sketch in HllCount
URL: https://github.com/apache/beam/pull/9519#discussion_r322440348
 
 

 ##########
 File path: 
sdks/java/extensions/zetasketch/src/main/java/org/apache/beam/sdk/extensions/zetasketch/HllCountMergePartialFn.java
 ##########
 @@ -54,10 +54,15 @@ private HllCountMergePartialFn() {}
     return null;
   }
 
+  @Nullable
   @Override
   public HyperLogLogPlusPlus<HllT> addInput(
       @Nullable HyperLogLogPlusPlus<HllT> accumulator, byte[] input) {
 
 Review comment:
   The `@Nullable` annotation is on the `accumulator` parameter, which can be 
null; we handle that case properly without throwing an exception.
   
   > tl;dr: why not handle nulls instead of throwing?
   
   There are pros and cons to supporting nulls:
   * The pro is that we save users from exceptions, as you mentioned.
   * The cons are: 1) we would then have two different representations for an 
"empty sketch"; and 2) I feel that if we accept nulls as input, we are 
encouraging users to produce nullable output (and use `NullableCoder`) from 
their upstream transforms, which is more costly in terms of encoding/decoding 
and more error prone.
   
   Currently I slightly prefer not accepting nulls as "empty sketches". What is 
your opinion? One advantage of keeping the implementation as it is: we can 
always change it later to support nulls, and that change will be backwards 
compatible (but we cannot go the other way).
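   
   To make the trade-off concrete, here is a minimal, self-contained sketch of 
the contract described above. It is not the code in this PR: `ToySketch` and 
`EmptySketchExample` are hypothetical stand-ins for the real accumulator and 
combiner, and the "merge" is just a counter. It only illustrates the three 
cases: a null accumulator is tolerated, a 0-length array is the single 
representation of an empty sketch, and a null input is rejected.

```java
import java.util.Objects;

/** Hypothetical illustration only; not the Beam or ZetaSketch API. */
final class EmptySketchExample {

  /** Toy accumulator that only counts how many non-empty inputs were merged. */
  static final class ToySketch {
    int merged = 0;
  }

  /**
   * Mirrors the shape of addInput in the diff above.
   * The accumulator may be null (nothing merged yet) and the result may be null too.
   */
  static ToySketch addInput(ToySketch accumulator, byte[] input) {
    // Null input is rejected: the only accepted "empty sketch" is a 0-length array.
    Objects.requireNonNull(input, "use a 0-length byte array for an empty sketch");
    if (input.length == 0) {
      // Empty sketch: nothing to merge, so the accumulator is returned unchanged
      // (it may still be null at this point).
      return accumulator;
    }
    if (accumulator == null) {
      accumulator = new ToySketch();
    }
    accumulator.merged++; // stand-in for merging the serialized sketch
    return accumulator;
  }

  public static void main(String[] args) {
    ToySketch acc = addInput(null, new byte[0]);
    System.out.println("after empty input, accumulator is null: " + (acc == null));
    acc = addInput(acc, new byte[] {1});
    System.out.println("merged inputs: " + acc.merged);
  }
}
```

   If nulls were accepted later, the `requireNonNull` check could be relaxed 
into a second "return accumulator unchanged" branch without breaking callers 
that already pass 0-length arrays, which is the backwards-compatibility point 
made above.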
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 309259)
    Time Spent: 30h 40m  (was: 30.5h)

> A new count distinct transform based on BigQuery compatible HyperLogLog++ 
> implementation
> ----------------------------------------------------------------------------------------
>
>                 Key: BEAM-7013
>                 URL: https://issues.apache.org/jira/browse/BEAM-7013
>             Project: Beam
>          Issue Type: New Feature
>          Components: extensions-java-sketching, sdk-java-core
>            Reporter: Yueyang Qiu
>            Assignee: Yueyang Qiu
>            Priority: Major
>             Fix For: 2.16.0
>
>          Time Spent: 30h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)
