[ https://issues.apache.org/jira/browse/IMPALA-10439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fucun Chu updated IMPALA-10439:
-------------------------------
    Epic Name: datasketches-theta  (was: Implement count(distinct) function (DataSketches/Theta))

> Implement count(distinct) function (DataSketches/Theta)
> -------------------------------------------------------
>
>                 Key: IMPALA-10439
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10439
>             Project: IMPALA
>          Issue Type: Epic
>          Components: Backend
>            Reporter: Fucun Chu
>            Priority: Major
>
> Implement the count(distinct) function from the DataSketches library for
> Theta in C++.
> Theta sketch provides approximate distinct counting with set operations
> (union, intersection and set difference).
> This can be used for retention analysis, e.g. "How many unique users signed up
> in week 1, and purchased something in week 2?"
> General info about the sketch:
> https://datasketches.apache.org/docs/Theta/ThetaSketchFramework.html
> C++ implementation to wrap:
> https://github.com/apache/datasketches-cpp/tree/master/theta
> Using thetaSketch in Druid:
> https://druid.apache.org/docs/latest/development/extensions-core/datasketches-theta.html

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
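
To make the retention example in the issue description concrete, here is a minimal, self-contained sketch of the idea behind a theta sketch: keep only the k smallest hash values, treat the eviction threshold as theta, and estimate the distinct count as retained / theta. This is a toy illustration in Python, not the DataSketches C++ code that the epic proposes to wrap. The class `ThetaSketch`, the helper functions `union`, `intersection` and `a_not_b`, the SHA-256 hash, the default `k=4096` and the `user-N` identifiers are all assumptions made for this example.

```python
import hashlib
import heapq


class ThetaSketch:
    """Toy KMV-style theta sketch; illustrative only, not the DataSketches code."""

    def __init__(self, k=4096, theta=1.0, hashes=()):
        self.k = k                  # maximum number of retained hashes (assumed default)
        self.theta = theta          # sampling threshold in (0, 1]
        self.hashes = {h for h in hashes if h < theta}
        self._heap = [-h for h in self.hashes]  # max-heap via negated values
        heapq.heapify(self._heap)
        self._shrink()

    @staticmethod
    def _hash(item):
        # Map an item to a pseudo-uniform float in [0, 1] (hypothetical hash choice).
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2.0 ** 64

    def _shrink(self):
        # Evict the largest hashes until at most k remain;
        # theta drops to the most recently evicted value.
        while len(self.hashes) > self.k:
            largest = -heapq.heappop(self._heap)
            self.hashes.discard(largest)
            self.theta = largest

    def update(self, item):
        h = self._hash(item)
        if h < self.theta and h not in self.hashes:
            self.hashes.add(h)
            heapq.heappush(self._heap, -h)
            self._shrink()

    def estimate(self):
        # Exact while fewer than k distinct items were seen (theta == 1.0).
        return len(self.hashes) / self.theta


def union(a, b):
    theta = min(a.theta, b.theta)
    return ThetaSketch(min(a.k, b.k), theta, a.hashes | b.hashes)


def intersection(a, b):
    theta = min(a.theta, b.theta)
    return ThetaSketch(min(a.k, b.k), theta, a.hashes & b.hashes)


def a_not_b(a, b):
    theta = min(a.theta, b.theta)
    return ThetaSketch(min(a.k, b.k), theta, a.hashes - b.hashes)


# Retention question from the issue: signed up in week 1 AND purchased in week 2.
week1_signups = ThetaSketch()
week2_buyers = ThetaSketch()
for uid in range(10_000):
    week1_signups.update(f"user-{uid}")
for uid in range(5_000, 15_000):
    week2_buyers.update(f"user-{uid}")

retained = intersection(week1_signups, week2_buyers)   # true overlap: 5,000 users
either = union(week1_signups, week2_buyers)            # true total: 15,000 users
churned = a_not_b(week1_signups, week2_buyers)         # true count: 5,000 users
print(round(retained.estimate()), round(either.estimate()), round(churned.estimate()))
```

The set operations work because both inputs are first reduced to the smaller of the two thetas, so every hash below that common threshold is known to both sides. The estimates are approximate; the relative error shrinks as k grows, and intersections and differences are generally noisier than unions because fewer hashes survive.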