If you have a table with something like a billion rows and run an
aggregate function on it from the shell, you will end up reading all
billion rows through a single machine, essentially aggregating the
entire dataset locally. This defeats the purpose of having a
massively distributed database like HBase. To do this more
efficiently, you'd ideally kick off a MapReduce job that performs
the various aggregation functions on the dataset in parallel,
harnessing the distributed nature of the data, and then returns
the results to a central location once they are calculated.
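To make the contrast concrete, here is a minimal in-process sketch of the parallel approach described above. It is not HBase or Hadoop API code: the partitions stand in for table regions, the thread pool stands in for map tasks running next to the data, and the names `Partial`, `map_partition` and `aggregate` are hypothetical. The key point is that each partition is reduced to a small, mergeable summary (count, sum, min, max), so only those summaries travel to the central location, and the average is derived from sum and count at the end.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from functools import reduce
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Partial:
    """Hypothetical mergeable partial aggregate for one partition."""
    count: int
    total: float
    minimum: float
    maximum: float

    def merge(self, other: "Partial") -> "Partial":
        # Partials combine associatively, so merge order does not matter.
        return Partial(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )


def map_partition(values: Sequence[float]) -> Optional[Partial]:
    # "Map" step: shrinks a whole partition to four numbers.
    # Empty partitions contribute nothing.
    if not values:
        return None
    return Partial(len(values), sum(values), min(values), max(values))


def aggregate(partitions: List[Sequence[float]]) -> dict:
    # Map phase: threads stand in for map tasks; partitions are summarized concurrently.
    with ThreadPoolExecutor() as pool:
        partials = [p for p in pool.map(map_partition, partitions) if p is not None]
    if not partials:
        raise ValueError("no rows to aggregate")
    # Reduce phase: only the small partials reach the central location.
    result = reduce(Partial.merge, partials)
    return {
        "count": result.count,
        "sum": result.total,
        "min": result.minimum,
        "max": result.maximum,
        "avg": result.total / result.count,
    }
```

For example, `aggregate([[1, 5, 3], [10, 2], [7]])` yields a count of 6, a sum of 28, a minimum of 1 and a maximum of 10, while never gathering the raw rows in one place. Average is deliberately not merged directly, since averaging per-partition averages would be wrong when partitions differ in size.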
I think putting this option into the shell is risky, because it will
encourage people to think that the shell is a good way to interact
with HBase in general, which it isn't. We want people to understand
that HBase is best consumed in parallel, and we want to discourage
solutions that funnel access through a single point. As such, we
shouldn't build features that allow people to inadvertently use the
wrong access patterns.
On Dec 5, 2007, at 3:38 PM, Edward Yoon (JIRA) wrote:
[ https://issues.apache.org/jira/browse/HADOOP-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548879 ]
Edward Yoon commented on HADOOP-2006:
-------------------------------------
I don't understand your comment.
Could you please explain it in more detail?
Aggregate Functions in select statement
---------------------------------------
Key: HADOOP-2006
URL: https://issues.apache.org/jira/browse/HADOOP-2006
Project: Hadoop
Issue Type: Sub-task
Components: contrib/hbase
Affects Versions: 0.14.1
Reporter: Edward Yoon
Assignee: Edward Yoon
Priority: Minor
Fix For: 0.16.0
Aggregation functions on collections of data values: average,
minimum, maximum, sum, count.
Group rows by the value of a column family and apply the aggregate
function independently to each group of rows.
* <Grouping columnfamilies> ƒ ~function_list~ (Relation)
{code}
select producer, avg(year) from movieLog_table group by producer
{code}
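The grouped query above maps naturally onto the map, shuffle and reduce phases that a MapReduce job would use instead of a single-machine scan. The following is a minimal in-process sketch under stated assumptions, not HBase or Hadoop API code: rows are modeled as hypothetical dicts of column name to string cell value, each inner list stands in for one region of `movieLog_table`, and `map_region`, `shuffle`, `reduce_groups` and `avg_year_by_producer` are illustrative names.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical row shape: column name -> cell value, stored as strings.
Row = Dict[str, str]


def map_region(rows: Iterable[Row]) -> Iterator[Tuple[str, int]]:
    # Map: emit (grouping key, value) for each row, i.e. producer -> year.
    for row in rows:
        yield row["producer"], int(row["year"])


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: gather every value that shares a grouping key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_groups(groups: Dict[str, List[int]]) -> Dict[str, float]:
    # Reduce: apply the aggregate (avg) independently to each group.
    return {key: sum(years) / len(years) for key, years in groups.items()}


def avg_year_by_producer(regions: List[List[Row]]) -> Dict[str, float]:
    # Each region is mapped separately, as one map task per region would be.
    mapped = chain.from_iterable(map_region(region) for region in regions)
    return reduce_groups(shuffle(mapped))
```

With two hypothetical regions holding producer A in 2001 and 2003 and producer B in 1999, the result is an average year of 2002.0 for A and 1999.0 for B. A real job could also add a combiner that emits per-region (sum, count) pairs to cut shuffle traffic; that refinement is left out here.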
--
This message is automatically generated by JIRA.