[ 
https://issues.apache.org/jira/browse/PHOENIX-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16148175#comment-16148175
 ] 

Hadoop QA commented on PHOENIX-418:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12884517/PHOENIX-418-v6-pom.patch
  against master branch at commit e6c1f01c5f7d5c9996017714efe90202a95355b2.
  ATTACHMENT ID: 12884517

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:green}+0 tests included{color}.  The patch appears to be a 
documentation, build,
                        or dev patch that doesn't require tests.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
62 warning messages.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

     {color:red}-1 core tests{color}.  The patch failed these unit tests:
     
./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.index.MutableIndexFailureIT
./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.GroupByIT
./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.TableSnapshotReadsMapReduceIT

Test results: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/1325//testReport/
Javadoc warnings: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/1325//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/1325//console

This message is automatically generated.

> Support approximate COUNT DISTINCT
> ----------------------------------
>
>                 Key: PHOENIX-418
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-418
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: James Taylor
>            Assignee: Ethan Wang
>            Priority: Blocker
>              Labels: gsoc2016
>             Fix For: 4.12.0
>
>         Attachments: PHOENIX-418-v1.patch, PHOENIX-418-v2.patch, 
> PHOENIX-418-v3.patch, PHOENIX-418-v4.patch, PHOENIX-418-v5.patch, 
> PHOENIX-418-v6.patch, PHOENIX-418-v6-pom.patch
>
>
> Support an "approximation" of count distinct to prevent having to hold on to 
> all distinct values (since this will not scale well when the number of 
> distinct values is huge). The Apache Drill folks have had some interesting 
> discussions on this 
> [here](http://mail-archives.apache.org/mod_mbox/incubator-drill-dev/201306.mbox/%3CJIRA.12650169.1369931282407.88049.1370645900553%40arcas%3E).
>  They recommend using  [Welford's 
> method](http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance_Online_algorithm).
>  I'm open to having a config option that uses exact versus approximate. I 
> don't have experience implementing an approximate implementation, so I'm not 
> sure how much state is required to keep on the server and return to the 
> client (other than realizing it'd be much less that returning all distinct 
> values and their counts).
> Update:
> Syntax of using approximate count distinct as:
> select APPROX_COUNT_DISTINCT(name) from person
> select APPROX_COUNT_DISTINCT(address||name) from person
> It is equivalent of  Select COUNT(DISTINCT ID) from person. Implemented using 
> hyperloglog, see discuss below.
> Source code patch link below, co-authorred with [~swapna]
> https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=commitdiff;h=d6381afc3af976ccdbb874d4458ea17b1e8a1d32



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to