[jira] [Commented] (PHOENIX-3225) Distinct Queries are slower than expected at scale.

Ankit Singhal (JIRA) Thu, 01 Sep 2016 05:41:30 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15455259#comment-15455259
 ]


Ankit Singhal commented on PHOENIX-3225:
----------------------------------------

Yes, HLL is the de facto standard for distinct-count estimation in analytics. 
There are multiple implementations of HLL available as libraries that can be 
used directly in a UDF without much effort, but the challenge is to choose one 
that is both performant and distributed under the Apache license. (Implementing 
our own HLL would be a fair amount of work.)
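
For readers unfamiliar with HLL (HyperLogLog), here is a minimal sketch of the 
idea in Java. It is illustrative only: the class name HllSketch and its methods 
are hypothetical, not Phoenix code or the API of any existing library, and the 
hash (FNV-1a plus a mixing step) is an assumption chosen to keep the example 
self-contained. A real UDF would more likely wrap a vetted library.

```java
import java.nio.charset.StandardCharsets;

/** Minimal HyperLogLog sketch. Hypothetical example, not Phoenix code. */
final class HllSketch {
    private final int p;            // number of bits used to pick a register
    private final int m;            // number of registers, 2^p
    private final byte[] registers; // max observed rank per register

    HllSketch(int p) {
        if (p < 4 || p > 16) {
            throw new IllegalArgumentException("p must be in [4, 16]");
        }
        this.p = p;
        this.m = 1 << p;
        this.registers = new byte[m];
    }

    // Assumed hash: 64-bit FNV-1a followed by a 64-bit avalanche mix.
    static long hash64(byte[] data) {
        long h = 0xcbf29ce484222325L;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    void add(String value) {
        long h = hash64(value.getBytes(StandardCharsets.UTF_8));
        int idx = (int) (h >>> (64 - p)); // top p bits select the register
        long rest = h << p;               // remaining bits, left-aligned
        // Rank = position of the first 1 bit in the remaining bits.
        int rank = (rest == 0) ? (64 - p + 1) : Long.numberOfLeadingZeros(rest) + 1;
        if (rank > registers[idx]) {
            registers[idx] = (byte) rank;
        }
    }

    // Sketches built on different servers combine by taking the register-wise max.
    void merge(HllSketch other) {
        if (other.p != p) {
            throw new IllegalArgumentException("precision mismatch");
        }
        for (int i = 0; i < m; i++) {
            if (other.registers[i] > registers[i]) {
                registers[i] = other.registers[i];
            }
        }
    }

    double estimate() {
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) {
                zeros++;
            }
        }
        double alpha;
        switch (m) {
            case 16: alpha = 0.673; break;
            case 32: alpha = 0.697; break;
            case 64: alpha = 0.709; break;
            default: alpha = 0.7213 / (1.0 + 1.079 / m);
        }
        double raw = alpha * m * m / sum;
        // Small-range correction: fall back to linear counting.
        if (raw <= 2.5 * m && zeros > 0) {
            return m * Math.log((double) m / zeros);
        }
        return raw;
    }
}
```

The property that matters for a distributed DISTINCT is merge(): each server 
can build a small fixed-size sketch over its own rows, and the client combines 
them without ever holding the full set of unique values. The result is an 
estimate, not an exact count; with p = 14 the expected relative error is 
roughly 1%.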

> Distinct Queries are slower than expected at scale.
> ---------------------------------------------------
>
>                 Key: PHOENIX-3225
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3225
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Lars Hofhansl
>
> In our large scale tests we found that we can easily sort 400G on a few 100 
> machines, but that a simple DISTINCT would just time out. Perhaps that's 
> expected as we have to keep track of the unique values, but we should 
> investigate.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
