Re: [FastBit-users] Group by

K. John Wu Fri, 17 Aug 2012 14:56:55 -0700

Hi, Michael,

Thanks for your interest in FastBit.  I would encourage you to give
FastBit a try.  As for your specific example,


SELECT IncidentBeginMonth, Priority, count(*) FROM events WHERE
IncidentBeginTime BETWEEN '2011-01-01 00:00:00' AND '2012-12-31
23:59:59'  GROUP BY IncidentBeginMonth, Priority ORDER BY
IncidentBeginMonth, Priority

Here are a couple things to note:

- FastBit does not have explicit support for date and time, you will
need to convert that to something else

- You can specify the "group-by" clause and the "order-by" clause as
is to the command line tool ibis.  It should behave as you expect.

Due to the presence of function 'count(*)' in the select clause, a
natural interpretation is that the user wants to count the number of
entries for each distinct combinations of IndidentBeginMonth and
Priority.  In this case, the group by clause in redundant - it
specifies the same information already implied in the select clause.
The order by clause tell the system in what order to output the
results.  The default is to order the columns as they appeared in the
select clause - your example illustrate this default behavior.
Therefore, the order by clause could be neglected as well.

Hope this helps.

John



On 8/17/12 1:34 PM, Michael Beauregard wrote:
> Hi all,
> 
> I'm building an analytical iOS application that currently uses sqlite
> to slice and dice about 2.5MB (30K rows) of data. I've been able to
> get performance to an acceptable level by relying upon many sql
> indices, one for each slice/dice combination. Moreover, the queries
> would still be too slow unless I ensure each query has its own
> "covering" index. This approach has resulted in an explosion in the db
> size due to the vast indexing required for performance. A 2.5MB
> dataset becomes 45MB when sufficiently indexed. This strategy has
> taken us pretty far, but is at its performance and scalability limits.
> 
> My customer recently informed me that we now need to go well
> beyond (at least 10x) our initial scale requirement. Plus he
> apparently needs to add more ways to slice and dice the data,
> basically leading to further explosion of the already unacceptable sql
> index size. This essentially means that I'm not able to continue with
> my current architecture and I'm looking for other options.
> 
> I stumbled upon FastBit today while researching various options and
> I'm really intrigued by what I see. However, I don't understand how I
> can express the required queries because of FastBit's implicit
> group-by functionality. Perhaps I'm just missing something obvious,
> but it seems like I need explicit control over the group-by clause. 
> 
> For instance, here is a query that determines the number of events
> that occurred in each month, broken down by month and priority of the
> event:
> 
> SELECT IncidentBeginMonth, Priority, count(*) FROM events WHERE
> IncidentBeginTime BETWEEN '2011-01-01 00:00:00' AND '2012-12-31
> 23:59:59'  GROUP BY IncidentBeginMonth, Priority ORDER BY
> IncidentBeginMonth, Priority
> 
> The result is displayed in a stacked bar chart with one bar per month
> and each bar broken down by event priority. This query executes in
> around 200ms on 30K rows on an iPad 3 which is fast enough, but only
> barely.
> 
> I would really appreciate if someone could help me figure out how to
> produce an equivalent query in FastBit. I have already imported my
> data and have been trying various queries using the ibis command, but
> I'm kind of stuck.
> 
> Thanks for taking the time to read and consider my question.
> 
> Michael
> 
> 
> 
> 
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
> 
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

Re: [FastBit-users] Group by

Reply via email to