quenlang opened a new issue #7297: thetaSketch aggregator turns null or "" into 
an unexpected value at ingestion 
URL: https://github.com/apache/incubator-druid/issues/7297
 
 
   @AlexanderSaydakov @gianm 
   I found that the thetaSketch aggregator turns ```null``` or ```""``` into an 
unexpected value at ingestion in our setup, but I'm not sure why it happens.
   
   I defined a thetaSketch aggregator in our metrics like this:
   ```
    {
           "name": "prefix_success_business_no",
           "fieldName": "prefix_success_business_no",
           "type": "thetaSketch"
    }
   ```
   Before ingestion, the value of the prefix_success_business_no column in the 
raw data may be `null`, an empty string, or a normal string such as 
"quenl...@126.com". We found that even when every value of 
prefix_success_business_no in the raw data is null, "", or a mix of the two, 
the distinct count of prefix_success_business_no after ingestion is neither 
null nor zero when I run a thetaSketch aggregator query like this:
   ```
   ...
   {
         "type": "thetaSketch",
         "name": "prefixSuccessBusinessNo",
         "fieldName": "prefix_success_business_no",
         "size": 16384,
         "shouldFinalize": true,
         "isInputThetaSketch": false,
         "errorBoundsStdDev": null
   }
   ...
   ```
   The result looks like this:
   ```
   [ {
     "timestamp" : "2019-03-18T13:30:00.000Z",
     "result" : {
       "prefixSuccessBusinessNo" : 16.0
     }
   } ]
   ```
   Every original value of prefix_success_business_no covered by this query is 
null, but Druid returns a distinct count of 16 for prefix_success_business_no. 
I cannot explain this. Does thetaSketch not fully handle null or ""? 
My Druid version is 0.13.0.
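
For comparison, here is a minimal sketch of how the expected distinct count could be checked against the raw data before ingestion. It assumes the raw rows are available as newline-delimited JSON; the function name `exact_distinct_non_empty` and the sample rows are hypothetical and only for illustration. It treats null, missing, and "" values as "no value", which is the behaviour I expected from the sketch. It does not explain why Druid reports 16.

```python
import json


def exact_distinct_non_empty(lines, column="prefix_success_business_no"):
    """Count distinct values of `column`, skipping null, missing, and "" values."""
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines in the input
        row = json.loads(line)
        value = row.get(column)  # None when the column is missing or null
        if value is None or value == "":
            continue
        seen.add(value)
    return len(seen)


# Hypothetical sample rows, not taken from the real data set.
sample = [
    '{"prefix_success_business_no": null}',
    '{"prefix_success_business_no": ""}',
    '{"other_column": 1}',
    '{"prefix_success_business_no": "user-a"}',
    '{"prefix_success_business_no": "user-a"}',
]

print(exact_distinct_non_empty(sample))
```

If this exact count is 0 for the queried interval while the thetaSketch estimate is not, the difference is introduced somewhere between the raw rows and the query result.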
   
   Druid 0.13.0 also added a new aggregator called HLLSketch. Can you tell me 
more about the differences between HLLSketch and thetaSketch in space, speed, 
and accuracy? Which one should I use in my scenario?
   
   
   Best wishes!
   
   
   
   
   
   
