Re: [jira] Commented: (HADOOP-2006) Aggregate Functions in select statement

Bryan Duxbury Wed, 05 Dec 2007 15:51:22 -0800

If you have a table with something like a billion rows and run an aggregate function on it from the shell, you will end up reading all billion rows through a single machine, essentially aggregating the entire dataset locally. This defeats the purpose of having a massively distributed database like HBase. To do this more efficiently, you'd ideally kick off a MapReduce job that performs the various aggregation functions on the dataset in parallel, harnessing the power of the distributed dataset, and then returns the results to a central location once they are calculated.
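
To make the parallel approach concrete, here is a minimal Python sketch. It is not HBase or Hadoop code: it only simulates the map and reduce phases in a single process. The partition layout and the names `Partial`, `map_partition`, `merge`, and `reduce_partials` are assumptions made for illustration. Each partition computes a small partial aggregate, and only those partials travel to the central location.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Partial:
    """Partial aggregate for one partition: enough state to merge later."""
    count: int
    total: float
    minimum: float
    maximum: float


def map_partition(values):
    """'Map' step: summarize one non-empty partition locally (hypothetical helper)."""
    return Partial(len(values), sum(values), min(values), max(values))


def merge(a, b):
    """Combine two partials; the operation is associative, so merge order does not matter."""
    return Partial(a.count + b.count, a.total + b.total,
                   min(a.minimum, b.minimum), max(a.maximum, b.maximum))


def reduce_partials(partials):
    """'Reduce' step: fold the per-partition partials into one result."""
    return reduce(merge, partials)


# Illustrative data: three partitions standing in for pieces of a large table.
partitions = [[1999, 2001], [1987, 2003, 2005], [1995]]
result = reduce_partials([map_partition(p) for p in partitions])

# The average is derived from the merged sum and count, not by averaging averages.
average = result.total / result.count
```

The point of the design is that the central step sees one small record per partition rather than every row, which is what a shell-side aggregation cannot avoid.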

I think putting this option into the shell is risky, because it will encourage people to think that the shell is a good way to interact with HBase in general, which it isn't. We want people to understand HBase is best consumed in parallel and discourage solutions that aggregate access through a single point. As such, we shouldn't build features that allow people to inadvertently use the wrong access patterns.


On Dec 5, 2007, at 3:38 PM, Edward Yoon (JIRA) wrote:

[ https://issues.apache.org/jira/browse/HADOOP-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548879 ]
Edward Yoon commented on HADOOP-2006:
-------------------------------------

I don't understand your comment.
Could you please explain it in more detail?

Aggregate Functions in select statement
---------------------------------------

                Key: HADOOP-2006
                URL: https://issues.apache.org/jira/browse/HADOOP-2006
            Project: Hadoop
         Issue Type: Sub-task
         Components: contrib/hbase
   Affects Versions: 0.14.1
           Reporter: Edward Yoon
           Assignee: Edward Yoon
           Priority: Minor
            Fix For: 0.16.0
Aggregation functions on collections of data values: average, minimum, maximum, sum, count. Group rows by the value of a column family and apply the aggregate function independently to each group of rows.
 * <Grouping columnfamilies>  ƒ ~function_list~ (Relation)
{code}
select producer, avg(year) from movieLog_table group by producer
{code}
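
As a rough illustration of what the query above is meant to compute, here is a small Python sketch. The sample rows and the helper name `avg_by_group` are invented for illustration and are not part of the HBase shell.

```python
from collections import defaultdict


def avg_by_group(rows, group_key, value_key):
    """Group rows by one column and average another column within each group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[group_key]] += row[value_key]
        counts[row[group_key]] += 1
    return {group: sums[group] / counts[group] for group in sums}


# Made-up rows standing in for movieLog_table.
movie_log = [
    {"producer": "A", "year": 1999},
    {"producer": "A", "year": 2001},
    {"producer": "B", "year": 1987},
]
averages = avg_by_group(movie_log, "producer", "year")
```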
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
