Fucun Chu created IMPALA-10439:
----------------------------------

             Summary: datasketches-theta
                 Key: IMPALA-10439
                 URL: https://issues.apache.org/jira/browse/IMPALA-10439
             Project: IMPALA
          Issue Type: Epic
          Components: Backend
            Reporter: Fucun Chu


Implement an approximate count(distinct) function based on the Theta sketch from
the Apache DataSketches C++ library.

The Theta sketch provides approximate distinct counting together with set
operations (union, intersection and set difference).
This can be used for retention analysis, e.g. "How many unique users signed up
in week 1 and purchased something in week 2?"
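To illustrate how distinct counting and the set operations fit together, here is a simplified, self-contained K-Minimum-Values style sketch in C++. It is NOT the datasketches-cpp API: the class name `ThetaSketch`, its methods, and the SplitMix64-based hashing are hypothetical choices for illustration. Each item is hashed into [0, 1). The sketch keeps at most k hashes below a threshold theta and estimates the distinct count as (retained hashes) / theta. The retention question above then becomes an intersection of a "week 1 sign-ups" sketch and a "week 2 purchasers" sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <iterator>
#include <set>
#include <string>

// Hypothetical, simplified theta sketch for illustration only.
class ThetaSketch {
 public:
  explicit ThetaSketch(std::size_t k) : k_(k) { assert(k > 0); }

  // Hash the item into [0, 1); keep it only if it falls below theta.
  void Update(const std::string& item) { Insert(HashToUnit(item)); }

  // Estimate = retained hashes / theta. Exact while theta is still 1.0,
  // i.e. while no more than k distinct items have been seen.
  double Estimate() const {
    return static_cast<double>(hashes_.size()) / theta_;
  }

  // Union: adopt the smaller theta, merge the hashes, re-trim to k.
  static ThetaSketch Union(const ThetaSketch& a, const ThetaSketch& b) {
    ThetaSketch out(std::min(a.k_, b.k_));
    out.theta_ = std::min(a.theta_, b.theta_);
    for (double h : a.hashes_) out.Insert(h);
    for (double h : b.hashes_) out.Insert(h);
    return out;
  }

  // Intersection: hashes below the smaller theta that appear in both.
  static ThetaSketch Intersect(const ThetaSketch& a, const ThetaSketch& b) {
    ThetaSketch out(std::min(a.k_, b.k_));
    out.theta_ = std::min(a.theta_, b.theta_);
    for (double h : a.hashes_)
      if (h < out.theta_ && b.hashes_.count(h)) out.hashes_.insert(h);
    return out;
  }

  // Set difference (A not B): hashes of a below the smaller theta, absent from b.
  static ThetaSketch AnotB(const ThetaSketch& a, const ThetaSketch& b) {
    ThetaSketch out(std::min(a.k_, b.k_));
    out.theta_ = std::min(a.theta_, b.theta_);
    for (double h : a.hashes_)
      if (h < out.theta_ && !b.hashes_.count(h)) out.hashes_.insert(h);
    return out;
  }

 private:
  // std::hash followed by the SplitMix64 finalizer; the top 53 bits
  // become a double in [0, 1).
  static double HashToUnit(const std::string& item) {
    std::uint64_t x = static_cast<std::uint64_t>(std::hash<std::string>{}(item));
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return static_cast<double>(x >> 11) / 9007199254740992.0;  // divide by 2^53
  }

  void Insert(double h) {
    if (h >= theta_) return;
    hashes_.insert(h);  // duplicates of the same item collapse here
    if (hashes_.size() > k_) {
      // Lower theta to the (k+1)-th smallest hash and drop it, keeping k.
      auto largest = std::prev(hashes_.end());
      theta_ = *largest;
      hashes_.erase(largest);
    }
  }

  std::size_t k_;
  double theta_ = 1.0;
  std::set<double> hashes_;
};
```

For the retention example, one sketch is updated with the user IDs that signed up in week 1 and another with the IDs that purchased in week 2. `ThetaSketch::Intersect(signups, purchasers).Estimate()` then approximates the number of retained users. Because every sketch keeps all hashes below its theta, intersecting and differencing at the smaller theta stays consistent. The real library differs in details such as its hash function, how theta is represented, and serialization.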


General info about the sketch:
https://datasketches.apache.org/docs/Theta/ThetaSketchFramework.html

C++ implementation to wrap:
https://github.com/apache/datasketches-cpp/tree/master/theta

Using thetaSketch in Druid:
https://druid.apache.org/docs/latest/development/extensions-core/datasketches-theta.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
