[
https://issues.apache.org/jira/browse/HBASE-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777779#comment-13777779
]
Jean-Marc Spaggiari commented on HBASE-9543:
--------------------------------------------
Now, last comment...
What about scalability? Suppose the unique is done on a column where ALL the
values are different (for example, called by mistake on a UID column). This is
going to load ALL the values into memory for ALL the regions. If you have a 10GB
region, and 70% of it is the value, that means you are going to build a 7GB set
in colSet. Multiply that by the number of regions and you are in trouble. Should
there be a property to limit this? You can't really send intermediate results
because you need to keep them for the comparison. So should there be something
like aggregate.uniq.maximum.values=10000 which will limit the size of the set
to that number of entries and throw an exception if we go over?
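To make the suggestion concrete, here is a minimal sketch of what such a cap could look like. It is not taken from the attached patches: the class name BoundedUniqueSet is hypothetical, values are modeled as String instead of byte[] for brevity, and aggregate.uniq.maximum.values is only the property name proposed above, not an existing HBase setting.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: a set of unique values that refuses to grow past a limit.
final class BoundedUniqueSet {
    // Property name proposed in this comment; an assumption, not a real HBase key.
    static final String MAX_VALUES_KEY = "aggregate.uniq.maximum.values";

    private final int maxValues;
    private final Set<String> colSet = new HashSet<>();

    BoundedUniqueSet(int maxValues) {
        if (maxValues <= 0) {
            throw new IllegalArgumentException(MAX_VALUES_KEY + " must be positive");
        }
        this.maxValues = maxValues;
    }

    // Returns true if the value was new, false if it was already present.
    // Throws once a new value would push the set past the configured limit.
    boolean add(String value) {
        if (colSet.contains(value)) {
            return false; // duplicates never count against the limit
        }
        if (colSet.size() >= maxValues) {
            throw new IllegalStateException(
                "More than " + maxValues + " unique values; raise " + MAX_VALUES_KEY
                + " or narrow the scan");
        }
        colSet.add(value);
        return true;
    }

    int size() {
        return colSet.size();
    }
}
```

Checking for a duplicate before checking the limit means a scan that keeps seeing already-known values never fails; only the value that would actually grow the set past the limit triggers the exception.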
> Impl unique aggregation
> ------------------------
>
> Key: HBASE-9543
> URL: https://issues.apache.org/jira/browse/HBASE-9543
> Project: HBase
> Issue Type: New Feature
> Components: Coprocessors
> Reporter: Liu Shaohui
> Assignee: Liu Shaohui
> Priority: Minor
> Attachments: HBASE-9543-0.94-v1.diff, HBASE-9543-trunk-v1.diff,
> HBASE-9543-trunk-v2.diff
>
>
> Impl unique aggregation: return a set of all columns' values in a scan.