[ 
https://issues.apache.org/jira/browse/HBASE-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777779#comment-13777779
 ] 

Jean-Marc Spaggiari commented on HBASE-9543:
--------------------------------------------

Now, last comment...

What about scalability? If the unique aggregation is run on a column where ALL
the values are different (it might be called by mistake on a UID column), it is
going to load ALL the values into memory for ALL the regions. If you have a 10GB
region and 70% of it is the value, that means you are going to build a 7GB set
in colSet. Multiply that by the number of regions and you are in trouble. Should
there be a property to limit this? You can't really send intermediate results,
because you need to keep them for the comparison. So should there be something
like aggregate.uniq.maximum.values=10000, which would limit the size of the set
to that number of entries and throw an exception if we go over?
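A minimal sketch of the proposed limit, assuming plain Java collections rather than the actual coprocessor code in the attached patches. BoundedUniqueSketch, BoundedUniqueCollector and TooManyUniqueValuesException are hypothetical names, not HBase classes, and the int limit only stands in for the proposed aggregate.uniq.maximum.values property. Applying the same limit again when the client merges per-region sets is an extra assumption, not part of the proposal.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not HBase code: a distinct-value set capped at maxValues.
public class BoundedUniqueSketch {

    // Hypothetical exception raised once the cap is exceeded.
    static class TooManyUniqueValuesException extends RuntimeException {
        TooManyUniqueValuesException(int limit) {
            super("unique aggregation exceeded " + limit + " distinct values");
        }
    }

    // Collects distinct values and fails fast once more than maxValues are seen.
    static final class BoundedUniqueCollector<T> {
        private final int maxValues;
        private final Set<T> values = new HashSet<>();

        BoundedUniqueCollector(int maxValues) {
            this.maxValues = maxValues;
        }

        void add(T value) {
            // Set.add returns true only for a value not already present,
            // so duplicates never count against the limit.
            if (values.add(value) && values.size() > maxValues) {
                throw new TooManyUniqueValuesException(maxValues);
            }
        }

        Set<T> result() {
            return values;
        }
    }

    // Region side: each region deduplicates its own cell values under the limit.
    static <T> Set<T> uniqueInRegion(List<T> cellValues, int maxValues) {
        BoundedUniqueCollector<T> collector = new BoundedUniqueCollector<>(maxValues);
        for (T v : cellValues) {
            collector.add(v);
        }
        return collector.result();
    }

    // Client side (assumption): per-region sets can each be under the limit
    // and still overflow once merged, so the merge enforces it too.
    static <T> Set<T> mergeRegions(List<Set<T>> perRegion, int maxValues) {
        BoundedUniqueCollector<T> collector = new BoundedUniqueCollector<>(maxValues);
        for (Set<T> regionSet : perRegion) {
            for (T v : regionSet) {
                collector.add(v);
            }
        }
        return collector.result();
    }

    public static void main(String[] args) {
        int limit = 3; // stands in for aggregate.uniq.maximum.values
        Set<String> r1 = uniqueInRegion(List.of("a", "b", "a"), limit);
        Set<String> r2 = uniqueInRegion(List.of("b", "c"), limit);
        System.out.println(mergeRegions(List.of(r1, r2), limit).size());
        try {
            mergeRegions(List.of(r1, Set.of("d", "e")), limit);
        } catch (TooManyUniqueValuesException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a limit of 3, merging {a, b} with {b, c} stays within bounds, while merging {a, b} with {d, e} reaches a fourth distinct value and is rejected instead of growing the set without bound.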
                
>  Impl unique aggregation
> ------------------------
>
>                 Key: HBASE-9543
>                 URL: https://issues.apache.org/jira/browse/HBASE-9543
>             Project: HBase
>          Issue Type: New Feature
>          Components: Coprocessors
>            Reporter: Liu Shaohui
>            Assignee: Liu Shaohui
>            Priority: Minor
>         Attachments: HBASE-9543-0.94-v1.diff, HBASE-9543-trunk-v1.diff, 
> HBASE-9543-trunk-v2.diff
>
>
> Impl unique aggregation: return a set of all columns' values in a scan.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
