No, there's no sorted dimension. This would be a full table scan over 40M rows. The ~10-12 second figure assumes the following:
1) your regions are evenly distributed across a four node cluster
2) the number of unique month * scene combinations is small enough to fit into memory
3) you chunk the scan up on the client side and run the chunks in parallel (with a final merge phase on the client)
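
To make the chunk-and-merge shape concrete, here is a rough sketch in plain Python. It is not HBase client code: the rows are an in-memory list of (eventdate, scene, timespent) tuples, and the names aggregate_chunk, merge_partials and parallel_group_by are made up for illustration. In a real setup each chunk would presumably be a row-key range aggregated server-side, with only the small partial results coming back to the client.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from datetime import date


def aggregate_chunk(rows):
    """Partial aggregate for one chunk: (month, scene) -> [count, sum(timespent)]."""
    partial = defaultdict(lambda: [0, 0])
    for eventdate, scene, timespent in rows:
        acc = partial[(eventdate.month, scene)]
        acc[0] += 1          # count(1)
        acc[1] += timespent  # sum(timespent)
    return partial


def merge_partials(partials):
    """Final client-side merge: add up counts and sums per group key."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (count, total) in partial.items():
            merged[key][0] += count
            merged[key][1] += total
    return dict(merged)


def parallel_group_by(rows, num_chunks=4):
    """Split rows into chunks, aggregate the chunks in parallel, then merge."""
    chunk_size = max(1, -(-len(rows) // num_chunks))  # ceiling division
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    with ThreadPoolExecutor(max_workers=num_chunks) as pool:
        partials = list(pool.map(aggregate_chunk, chunks))
    return merge_partials(partials)


if __name__ == "__main__":
    # Hypothetical sample data standing in for the eventlog table.
    sample = [
        (date(2012, 9, 1), "lobby", 10),
        (date(2012, 9, 2), "lobby", 5),
        (date(2012, 8, 3), "game", 7),
    ]
    for (month, scene), (count, total) in sorted(parallel_group_by(sample).items()):
        print(month, scene, count, total)
```

The merge is cheap only because of assumption 2: each partial result holds one entry per month * scene combination, not one per row.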


On 09/11/2012 10:59 AM, lars hofhansl wrote:
That's when you aggregate along a sorted dimension (prefix of the key), though. 
Right?
Not sure how smart Hive is here, but if it needs to sort the data it will 
probably be slower than SQL Server for such a small data set.



----- Original Message -----
From: James Taylor<jtay...@salesforce.com>
To: user@hbase.apache.org
Cc:
Sent: Monday, September 10, 2012 5:49 PM
Subject: Re: HBase aggregate query

iwannaplay games<funnlearnforkids@...>  writes:
Hi,

I want to run a query like

select month(eventdate),scene,count(1),sum(timespent) from eventlog
group by month(eventdate),scene

in HBase. Through Hive it's taking a lot of time for 40 million
records. Do we have any syntax in HBase to find its result? In SQL
Server it takes around 9 minutes. How long might it take in HBase?

Regards
Prabhjot


Hi,
In our internal testing using server-side coprocessors for aggregation, we've
found HBase can process these types of queries very quickly: ~10-12 seconds
using a four node cluster. You need to chunk up and parallelize the work on the
client side to get this kind of performance, though.
Regards,

James

