[ https://issues.apache.org/jira/browse/PHOENIX-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16148175#comment-16148175 ]
Hadoop QA commented on PHOENIX-418:
-----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12884517/PHOENIX-418-v6-pom.patch
against master branch at commit e6c1f01c5f7d5c9996017714efe90202a95355b2.
ATTACHMENT ID: 12884517

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+0 tests included{color}. The patch appears to be a documentation, build, or dev patch that doesn't require tests.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 62 warning messages.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.

{color:red}-1 core tests{color}. The patch failed these unit tests:
./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.index.MutableIndexFailureIT
./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.GroupByIT
./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.TableSnapshotReadsMapReduceIT

Test results: https://builds.apache.org/job/PreCommit-PHOENIX-Build/1325//testReport/
Javadoc warnings: https://builds.apache.org/job/PreCommit-PHOENIX-Build/1325//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-PHOENIX-Build/1325//console

This message is automatically generated.
> Support approximate COUNT DISTINCT
> ----------------------------------
>
> Key: PHOENIX-418
> URL: https://issues.apache.org/jira/browse/PHOENIX-418
> Project: Phoenix
> Issue Type: New Feature
> Reporter: James Taylor
> Assignee: Ethan Wang
> Priority: Blocker
> Labels: gsoc2016
> Fix For: 4.12.0
>
> Attachments: PHOENIX-418-v1.patch, PHOENIX-418-v2.patch,
> PHOENIX-418-v3.patch, PHOENIX-418-v4.patch, PHOENIX-418-v5.patch,
> PHOENIX-418-v6.patch, PHOENIX-418-v6-pom.patch
>
>
> Support an "approximation" of count distinct to avoid having to hold on to
> all distinct values (which does not scale well when the number of
> distinct values is huge). The Apache Drill folks have had some interesting
> discussions on this
> [here](http://mail-archives.apache.org/mod_mbox/incubator-drill-dev/201306.mbox/%3CJIRA.12650169.1369931282407.88049.1370645900553%40arcas%3E).
> They recommend using [Welford's method](http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance_Online_algorithm).
> I'm open to having a config option that chooses exact versus approximate
> counting. I don't have experience implementing an approximate algorithm, so
> I'm not sure how much state needs to be kept on the server and returned to
> the client (other than realizing it'd be much less than returning all
> distinct values and their counts).
> Update:
> Syntax for approximate count distinct:
> select APPROX_COUNT_DISTINCT(name) from person
> select APPROX_COUNT_DISTINCT(address||name) from person
> The first query is the approximate equivalent of
> SELECT COUNT(DISTINCT name) FROM person. It is implemented using
> HyperLogLog; see the discussion below.
> Source code patch link below, co-authored with [~swapna]:
> https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=commitdiff;h=d6381afc3af976ccdbb874d4458ea17b1e8a1d32

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
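The issue states only that APPROX_COUNT_DISTINCT is "implemented using hyperloglog". Below is a minimal Java sketch of the textbook HyperLogLog algorithm, included to show why it fits the server/client split raised in the description: each server could keep a small fixed-size register array, and the client merges those arrays with an element-wise maximum instead of receiving every distinct value. This is not the code from the Phoenix commit. The class name `HyperLogLogSketch`, the precision parameter `p`, and the FNV-1a plus fmix64 hash are all assumptions chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the textbook HyperLogLog algorithm;
// NOT the implementation from the Phoenix commit.
final class HyperLogLogSketch {
    private final int p;             // precision: hash bits used as the register index
    private final int m;             // number of registers, 2^p
    private final byte[] registers;  // max observed rank per register

    HyperLogLogSketch(int p) {
        // The alpha constant used in estimate() is only valid for m >= 128, hence p >= 7.
        if (p < 7 || p > 18) {
            throw new IllegalArgumentException("p must be in [7, 18]");
        }
        this.p = p;
        this.m = 1 << p;
        this.registers = new byte[m];
    }

    // Assumed hash: 64-bit FNV-1a over UTF-8 bytes, then MurmurHash3's fmix64 finalizer.
    private static long hash64(String value) {
        long h = 0xcbf29ce484222325L;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb93fe53b1a87L;
        h ^= h >>> 33;
        return h;
    }

    void add(String value) {
        long h = hash64(value);
        int index = (int) (h >>> (64 - p)); // top p bits pick the register
        long rest = h << p;                  // remaining 64 - p bits, left-aligned
        // Rank = position of the first 1-bit in the remaining bits, capped if they are all zero.
        int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - p + 1);
        if (rank > registers[index]) {
            registers[index] = (byte) rank;
        }
    }

    // Merging is an element-wise max, so partial sketches can be combined on the client.
    void merge(HyperLogLogSketch other) {
        if (other.p != p) {
            throw new IllegalArgumentException("precision mismatch");
        }
        for (int i = 0; i < m; i++) {
            registers[i] = (byte) Math.max(registers[i], other.registers[i]);
        }
    }

    long estimate() {
        double sum = 0.0;
        int zeroRegisters = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) {
                zeroRegisters++;
            }
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m);
        double raw = alpha * m * m / sum;
        // Small-range correction: use linear counting while empty registers remain.
        if (raw <= 2.5 * m && zeroRegisters > 0) {
            return Math.round(m * Math.log((double) m / zeroRegisters));
        }
        return Math.round(raw);
    }
}
```

With `p = 14` the sketch holds 16,384 one-byte registers (16 KB) no matter how many distinct values it sees, and HyperLogLog's expected relative standard error is about 1.04/√m, roughly 0.8% here. Re-adding a value never changes the estimate, and merging two sketches built from disjoint halves of the data yields exactly the registers of a single sketch built from all of it.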