Re: HBase aggregate query

2012-09-13 Thread James Taylor
No, there's no sorted dimension. This would be a full table scan over 
40M rows. The ~10-12 second figure assumes the following:

1) your regions are evenly distributed across a four-node cluster
2) the number of unique month * scene combinations is small enough to fit
into memory
3) you chunk the scan up on the client side and run the chunks in parallel 
(with a final merge phase on the client)
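
The chunk-and-merge shape described in point 3 can be sketched roughly as follows. This is only an illustration in plain Python, not the coprocessor code referred to in this thread: the in-memory lists stand in for row-key ranges, and the names aggregate_chunk, merge_partials and parallel_group_by are hypothetical. In a real deployment each chunk would presumably be aggregated server-side, with only the small partial results returned to the client.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def aggregate_chunk(rows):
    """Partial GROUP BY over one chunk: (month, scene) -> [count, sum(timespent)]."""
    partial = defaultdict(lambda: [0, 0])
    for month, scene, timespent in rows:
        acc = partial[(month, scene)]
        acc[0] += 1
        acc[1] += timespent
    return dict(partial)


def merge_partials(partials):
    """Final client-side merge: add up the per-chunk counts and sums."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (count, total) in partial.items():
            merged[key][0] += count
            merged[key][1] += total
    return {key: tuple(value) for key, value in merged.items()}


def parallel_group_by(chunks, workers=4):
    """Aggregate every chunk in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(aggregate_chunk, chunks))
    return merge_partials(partials)


# Hypothetical sample data: each inner list plays the role of one chunk.
chunks = [
    [(9, "intro", 30), (9, "level1", 120)],
    [(9, "intro", 45), (10, "level1", 60)],
]
result = parallel_group_by(chunks)
```

The merge step only stays cheap when the number of distinct (month, scene) keys is small, which is what assumption 2 is about.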



On 09/11/2012 10:59 AM, lars hofhansl wrote:

That's when you aggregate along a sorted dimension (prefix of the key), though. 
Right?
Not sure how smart Hive is here, but if it needs to sort the data it will 
probably be slower than SQL Server for such a small data set.





Re: HBase aggregate query

2012-09-11 Thread James Taylor
iwannaplay games funnlearnforkids@... writes:
 
 Hi,
 
 I want to run a query like
 
 select month(eventdate), scene, count(1), sum(timespent) from eventlog
 group by month(eventdate), scene
 
 in HBase. Through Hive it's taking a lot of time for 40 million
 records. Do we have any syntax in HBase to get this result? In SQL
 Server it takes around 9 minutes. How long might it take in HBase?
 
 Regards
 Prabhjot
 
 

Hi,
In our internal testing using server-side coprocessors for aggregation, we've
found HBase can process these types of queries very quickly: ~10-12 seconds
using a four node cluster. You need to chunk up and parallelize the work on the
client side to get this kind of performance, though.
Regards,

James





Re: HBase aggregate query

2012-09-11 Thread lars hofhansl
That's when you aggregate along a sorted dimension (prefix of the key), though. 
Right?
Not sure how smart Hive is here, but if it needs to sort the data it will 
probably be slower than SQL Server for such a small data set.



- Original Message -
From: James Taylor jtay...@salesforce.com
To: user@hbase.apache.org
Cc: 
Sent: Monday, September 10, 2012 5:49 PM
Subject: Re: HBase aggregate query


Hi,
In our internal testing using server-side coprocessors for aggregation, we've
found HBase can process these types of queries very quickly: ~10-12 seconds
using a four node cluster. You need to chunk up and parallelize the work on the
client side to get this kind of performance, though.
Regards,

James


Re: HBase aggregate query

2012-09-11 Thread Jerry Lam
Hi Prabhjot:

Can you implement this using a counter?
That is, whenever you insert a row with a given month(eventdate) and scene
combination, increment the associated counter by one. Note that if you have
a batch insert of N such rows, you can increment the counter by N.

Then you can simply query the counter whenever you want the aggregated
result.
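
A minimal sketch of this counter idea, assuming an in-memory store in place of an HBase table (the EventCounters class and its method names are hypothetical). In HBase itself the updates would presumably use its atomic increment operations. Since the original query also asks for sum(timespent), the sketch keeps a second counter for that.

```python
from collections import Counter


class EventCounters:
    """Running aggregates per (month, scene), updated at insert time."""

    def __init__(self):
        self.counts = Counter()     # (month, scene) -> number of events
        self.timespent = Counter()  # (month, scene) -> total time spent

    def record_batch(self, events):
        """Fold a batch locally first, so each key is incremented once by N."""
        batch_counts = Counter()
        batch_time = Counter()
        for month, scene, timespent in events:
            batch_counts[(month, scene)] += 1
            batch_time[(month, scene)] += timespent
        self.counts.update(batch_counts)
        self.timespent.update(batch_time)

    def query(self, month, scene):
        """Read the aggregate directly; no scan over the raw events."""
        key = (month, scene)
        return self.counts[key], self.timespent[key]


# Hypothetical sample batch.
counters = EventCounters()
counters.record_batch([(9, "intro", 30), (9, "intro", 45), (10, "level1", 60)])
september_intro = counters.query(9, "intro")
```

The trade-off is extra work on every write in exchange for reads that touch only one counter per combination.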

HTH,

Jerry

On Tue, Sep 11, 2012 at 1:59 PM, lars hofhansl lhofha...@yahoo.com wrote:

 That's when you aggregate along a sorted dimension (prefix of the key),
 though. Right?
 Not sure how smart Hive is here, but if it needs to sort the data it will
 probably be slower than SQL Server for such a small data set.






Re: HBase aggregate query

2012-09-10 Thread Ted Yu
Hi,
Are you able to get the number you want through the Hive log?

Thanks

On Mon, Sep 10, 2012 at 7:03 AM, iwannaplay games 
funnlearnfork...@gmail.com wrote:

 Hi,

 I want to run a query like

 select month(eventdate), scene, count(1), sum(timespent) from eventlog
 group by month(eventdate), scene

 in HBase. Through Hive it's taking a lot of time for 40 million
 records. Do we have any syntax in HBase to get this result? In SQL
 Server it takes around 9 minutes. How long might it take in HBase?

 Regards
 Prabhjot



Re: HBase aggregate query

2012-09-10 Thread Srinivas Mupparapu
HBase only provides CRUD operations by means of the Put/Get/Scan/Delete API,
and there is no built-in SQL interface.

Thanks,
Srinivas M
On Sep 10, 2012 9:03 AM, iwannaplay games funnlearnfork...@gmail.com
wrote:

 Hi,

 I want to run a query like

 select month(eventdate), scene, count(1), sum(timespent) from eventlog
 group by month(eventdate), scene

 in HBase. Through Hive it's taking a lot of time for 40 million
 records. Do we have any syntax in HBase to get this result? In SQL
 Server it takes around 9 minutes. How long might it take in HBase?

 Regards
 Prabhjot



Re: HBase aggregate query

2012-09-10 Thread iwannaplay games
It's taking very long.

On Mon, Sep 10, 2012 at 7:34 PM, Ted Yu yuzhih...@gmail.com wrote:

 Hi,
 Are you able to get the number you want through the Hive log?

 Thanks




Re: HBase aggregate query

2012-09-10 Thread Doug Meil

Hi there, if these are commonly run queries, I'd suggest creating summary
tables of the pre-aggregated results.

http://hbase.apache.org/book.html#mapreduce.example

7.2.4. HBase MapReduce Summary to HBase Example
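
As a rough picture of what such a summary job computes, here is a plain Python sketch of the map, group and reduce steps. It is not the example from the HBase book: the dictionary stands in for the summary table, and map_event, reduce_group and build_summary are hypothetical names.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter


def map_event(event):
    """Map step: emit ((month, scene), (1, timespent)) for one raw event."""
    eventdate, scene, timespent = event
    return (eventdate.month, scene), (1, timespent)


def reduce_group(values):
    """Reduce step: sum the counts and time spent for one key."""
    count = total = 0
    for c, t in values:
        count += c
        total += t
    return count, total


def build_summary(events):
    """Sort by key (the shuffle), then reduce each group into one summary row."""
    mapped = sorted((map_event(e) for e in events), key=itemgetter(0))
    return {
        key: reduce_group(value for _, value in group)
        for key, group in groupby(mapped, key=itemgetter(0))
    }


# Hypothetical sample events: (eventdate, scene, timespent).
events = [
    (date(2012, 9, 1), "intro", 30),
    (date(2012, 9, 2), "intro", 45),
    (date(2012, 10, 5), "level1", 60),
]
summary = build_summary(events)
```

Once the summary rows exist, the original query becomes a lookup by (month, scene) instead of a scan over all 40 million events.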




On 9/10/12 10:03 AM, iwannaplay games funnlearnfork...@gmail.com wrote:

Hi,

I want to run a query like

select month(eventdate), scene, count(1), sum(timespent) from eventlog
group by month(eventdate), scene

in HBase. Through Hive it's taking a lot of time for 40 million
records. Do we have any syntax in HBase to get this result? In SQL
Server it takes around 9 minutes. How long might it take in HBase?

Regards
Prabhjot