Re: HBase aggregate query
No, there's no sorted dimension. This would be a full table scan over 40M rows. That estimate assumes the following:
1) your regions are evenly distributed across a four-node cluster
2) the unique combinations of month * scene are few enough to fit into memory
3) you chunk up the work on the client side and run the chunks in parallel (with a final merge phase on the client)

On 09/11/2012 10:59 AM, lars hofhansl wrote:
> That's when you aggregate along a sorted dimension (prefix of the key), though. Right?
> Not sure how smart Hive is here, but if it needs to sort the data it will probably be slower than SQL Server for such a small data set.
>
> ----- Original Message -----
> From: James Taylor jtay...@salesforce.com
> To: user@hbase.apache.org
> Sent: Monday, September 10, 2012 5:49 PM
> Subject: Re: HBase aggregate query
>
> iwannaplay games funnlearnforkids@... writes:
>> Hi,
>> I want to run a query like
>>
>>   select month(eventdate), scene, count(1), sum(timespent) from eventlog group by month(eventdate), scene
>>
>> in HBase. Through Hive it's taking a lot of time for 40 million records. Is there any syntax in HBase to get this result? In SQL Server it takes around 9 minutes; how long might it take in HBase?
>>
>> Regards
>> Prabhjot
>
> Hi,
> In our internal testing, using server-side coprocessors for aggregation, we've found that HBase can process these types of queries very quickly: ~10-12 seconds on a four-node cluster. You need to chunk up and parallelize the work on the client side to get this kind of performance, though.
>
> Regards,
> James
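The chunk-and-merge approach in points 2) and 3) can be sketched as follows. This is a minimal illustration in Python rather than the HBase Java client: each chunk is a plain list of hypothetical (eventdate, scene, timespent) rows standing in for one region's or key range's scan results, and the function names are made up for the example. Each worker keeps a small map of (month, scene) -> [count, sum], and the client adds the partial maps together.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from datetime import date
from typing import Dict, Iterable, List, Tuple

Row = Tuple[date, str, int]         # hypothetical (eventdate, scene, timespent)
GroupKey = Tuple[int, str]          # (month(eventdate), scene)
Agg = Dict[GroupKey, List[int]]     # group -> [count(1), sum(timespent)]


def aggregate_chunk(rows: Iterable[Row]) -> Agg:
    """Partial aggregation for one chunk; memory grows only with distinct groups."""
    partial: Agg = defaultdict(lambda: [0, 0])
    for eventdate, scene, timespent in rows:
        acc = partial[(eventdate.month, scene)]  # like Hive's month(), ignores year
        acc[0] += 1            # count(1)
        acc[1] += timespent    # sum(timespent)
    return dict(partial)


def merge(partials: Iterable[Agg]) -> Agg:
    """Final client-side merge: partial counts and sums are simply added per group."""
    total: Agg = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (count, spent) in partial.items():
            total[key][0] += count
            total[key][1] += spent
    return dict(total)


def group_by_month_scene(chunks: List[Iterable[Row]], workers: int = 4) -> Agg:
    """Aggregate all chunks in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return merge(pool.map(aggregate_chunk, chunks))
```

Threads are a reasonable fit here because, in a real client, each worker would spend most of its time waiting on scan I/O; the merge stays cheap because count and sum partials combine by plain addition.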
Re: HBase aggregate query
iwannaplay games funnlearnforkids@... writes:
> [...]

Hi,
In our internal testing, using server-side coprocessors for aggregation, we've found that HBase can process these types of queries very quickly: ~10-12 seconds on a four-node cluster. You need to chunk up and parallelize the work on the client side to get this kind of performance, though.

Regards,
James
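Chunking the work usually means giving each worker its own contiguous row-key range to scan. A minimal sketch, assuming fixed-width row keys and start-inclusive/stop-exclusive ranges (as with HBase scans); split_key_range is a hypothetical helper, not an HBase API:

```python
from typing import List, Tuple


def split_key_range(start: bytes, stop: bytes, n: int) -> List[Tuple[bytes, bytes]]:
    """Split [start, stop) into at most n contiguous, non-empty sub-ranges."""
    if len(start) != len(stop):
        raise ValueError("this sketch assumes fixed-width row keys")
    if n < 1 or start >= stop:
        raise ValueError("need n >= 1 and start < stop")
    width = len(start)
    lo = int.from_bytes(start, "big")
    hi = int.from_bytes(stop, "big")
    # For equal-length big-endian keys, numeric order matches the
    # lexicographic byte order HBase uses for row keys.
    bounds = [lo + (hi - lo) * i // n for i in range(n + 1)]
    keys = [b.to_bytes(width, "big") for b in bounds]
    # Drop empty sub-ranges, which appear when the range holds fewer than n keys.
    return [(a, b) for a, b in zip(keys, keys[1:]) if a < b]
```

For example, split_key_range(b"\x00\x00", b"\xff\xff", 4) yields four adjacent ranges, one per worker. Splitting evenly by key value balances the work only if keys are roughly uniformly distributed; aligning chunks with region boundaries is another option, so that each chunk is served by a single region server.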
Re: HBase aggregate query
That's when you aggregate along a sorted dimension (prefix of the key), though. Right?
Not sure how smart Hive is here, but if it needs to sort the data it will probably be slower than SQL Server for such a small data set.
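Aggregating along a sorted dimension avoids both the sort and the in-memory hash map: if the grouping columns form a prefix of the row key, a scan returns each group's rows contiguously, so a single pass that holds only the current group suffices. A sketch, assuming a hypothetical key layout b"<MM>|<scene>|<rest>" (the thread doesn't give the actual schema) and using Python's itertools.groupby, which only merges adjacent rows with equal keys:

```python
from itertools import groupby
from typing import Iterable, Iterator, Tuple

Cell = Tuple[bytes, int]  # (row key, timespent)


def group_prefix(row_key: bytes) -> Tuple[str, str]:
    """Extract (month, scene) from the hypothetical key b"<MM>|<scene>|<rest>"."""
    month, scene, _rest = row_key.decode().split("|", 2)
    return month, scene


def streaming_group_by(sorted_rows: Iterable[Cell]) -> Iterator[Tuple[Tuple[str, str], int, int]]:
    """Yield (group, count, sum(timespent)); input must be sorted by row key."""
    for group, rows in groupby(sorted_rows, key=lambda cell: group_prefix(cell[0])):
        count = 0
        total = 0
        for _key, timespent in rows:
            count += 1
            total += timespent
        yield group, count, total
```

When the key has no such prefix, the query falls back to a full scan with hash aggregation instead.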
Re: HBase aggregate query
Hi Prabhjot,

Can you implement this using a counter? That is, whenever you insert a row with a given month(eventdate) and scene combination, increment the associated counter by one. If you do a batch insert of N such rows, you can increment the counter by N. Then you can simply read the counter whenever you want the aggregated result.

HTH,
Jerry

On Tue, Sep 11, 2012 at 1:59 PM, lars hofhansl lhofha...@yahoo.com wrote:
> [...]
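A minimal sketch of the counter approach, with a plain dict standing in for a hypothetical counter table (in HBase itself each update would be an atomic increment, e.g. via the client's incrementColumnValue). EventCounters is a made-up class; it folds each batch into one increment of N per group, and it also keeps a second counter for timespent so that sum(timespent) from the original query is precomputed too, which goes slightly beyond the count-only suggestion:

```python
from collections import defaultdict
from datetime import date
from typing import DefaultDict, Dict, Iterable, Tuple

GroupKey = Tuple[int, str]  # (month(eventdate), scene)


def _zero() -> Dict[str, int]:
    return {"count": 0, "timespent": 0}


class EventCounters:
    def __init__(self) -> None:
        # Stand-in for the counter table: group -> {"count": n, "timespent": total}
        self._cells: DefaultDict[GroupKey, Dict[str, int]] = defaultdict(_zero)

    def record_batch(self, events: Iterable[Tuple[date, str, int]]) -> None:
        """Fold a batch into one increment per group (by N, not N times by 1)."""
        batch: DefaultDict[GroupKey, Dict[str, int]] = defaultdict(_zero)
        for eventdate, scene, timespent in events:
            delta = batch[(eventdate.month, scene)]
            delta["count"] += 1
            delta["timespent"] += timespent
        for key, delta in batch.items():
            self._increment(key, delta)

    def _increment(self, key: GroupKey, delta: Dict[str, int]) -> None:
        # Stand-in for an atomic server-side increment of both counter columns.
        cell = self._cells[key]
        cell["count"] += delta["count"]
        cell["timespent"] += delta["timespent"]

    def read(self, month: int, scene: str) -> Tuple[int, int]:
        """Return (count, sum of timespent) without scanning the event rows."""
        cell = self._cells.get((month, scene), _zero())
        return cell["count"], cell["timespent"]
```

Counters only reflect rows written after they are in place, so the existing 40 million rows would still need a one-time aggregation to seed them.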
Re: HBase aggregate query
Hi,
Are you able to get the number you want through the Hive log?

Thanks

On Mon, Sep 10, 2012 at 7:03 AM, iwannaplay games funnlearnfork...@gmail.com wrote:
> [...]
Re: HBase aggregate query
HBase only provides CRUD operations through its Put/Get/Delete (and Scan) API; there is no built-in SQL interface.

Thanks,
Srinivas M

On Sep 10, 2012 9:03 AM, iwannaplay games funnlearnfork...@gmail.com wrote:
> [...]
Re: HBase aggregate query
It's taking very long.

On Mon, Sep 10, 2012 at 7:34 PM, Ted Yu yuzhih...@gmail.com wrote:
> Are you able to get the number you want through the Hive log?
> [...]
Re: HBase aggregate query
Hi there,

If these are common queries, I'd suggest creating summary tables of the pre-aggregated results. See section 7.2.4, "HBase MapReduce Summary to HBase Example", in the HBase book:
http://hbase.apache.org/book.html#mapreduce.example

On 9/10/12 10:03 AM, iwannaplay games funnlearnfork...@gmail.com wrote:
> [...]
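The summary-table idea has the shape of a MapReduce job: map each event to a summary row key, group by key, and reduce each group to one pre-aggregated row. The sketch below only mimics that shape in plain Python (the book's example uses HBase's TableMapper/TableReducer classes); the b"<MM>|<scene>" row-key layout and all function names are assumptions:

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Iterator, List, Tuple

Event = Tuple[date, str, int]  # hypothetical (eventdate, scene, timespent)
Partial = Tuple[int, int]      # (count, timespent)


def map_event(event: Event) -> Iterator[Tuple[bytes, Partial]]:
    """Map phase: emit (summary row key, (1, timespent)) for each event."""
    eventdate, scene, timespent = event
    yield f"{eventdate.month:02d}|{scene}".encode(), (1, timespent)


def shuffle(pairs: Iterable[Tuple[bytes, Partial]]) -> Dict[bytes, List[Partial]]:
    """Shuffle phase: collect all mapped values per summary row key."""
    grouped: Dict[bytes, List[Partial]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_group(values: List[Partial]) -> Dict[str, int]:
    """Reduce phase: one pre-aggregated summary row per key."""
    return {"count": sum(c for c, _ in values), "timespent": sum(t for _, t in values)}


def build_summary_table(events: Iterable[Event]) -> Dict[bytes, Dict[str, int]]:
    pairs = (pair for event in events for pair in map_event(event))
    # Sorted by row key, mirroring how the rows would be laid out in an HBase table.
    return {key: reduce_group(vals) for key, vals in sorted(shuffle(pairs).items())}
```

Queries then read a handful of summary rows instead of scanning all 40M events; unlike counters, such a table is only as fresh as the last job run.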