Re: [FastBit-users] Group by

K. John Wu Fri, 17 Aug 2012 15:19:23 -0700

Hi, Michael,

The command line tool ibis always expects a where clause.  In a pinch,
you can try something like "where 1 = 1".  Your query "SELECT
IncidentBeginMonth, Priority, count(*)" needs to be written as "SELECT
IncidentBeginMonth, Priority, count(*) WHERE 1 = 1".


John



On 8/17/12 3:15 PM, Michael Beauregard wrote:
> Thanks for your quick reply. I plan to convert the IncidentBeginTime
> to unix epoch timestamps in my source data anyway - which should also
> work with FastBit.
> 
> Based on your input, I've simplified my query to be: SELECT
> IncidentBeginMonth, Priority, count(*)
> 
> Executing this results in the following unhelpful error:
> 
> tableSelect:: select(IncidentBeginMonth, Priority, count(*), ) failed
> on table T-era
> 
> I cranked up the logging (-v=10), but I don't see any details that
> give me any clue as to what is wrong.
> 
> Thanks again,
> 
> Michael
> 
> 
> On Fri, Aug 17, 2012 at 2:56 PM, K. John Wu <[email protected]> wrote:
>>
>> Hi, Michael,
>>
>> Thanks for your interest in FastBit.  I would encourage you to give
>> FastBit a try.  As for your specific example,
>>
>> SELECT IncidentBeginMonth, Priority, count(*) FROM events WHERE
>> IncidentBeginTime BETWEEN '2011-01-01 00:00:00' AND '2012-12-31
>> 23:59:59'  GROUP BY IncidentBeginMonth, Priority ORDER BY
>> IncidentBeginMonth, Priority
>>
>> Here are a couple things to note:
>>
>> - FastBit does not have explicit support for date and time, you will
>> need to convert that to something else
>>
>> - You can specify the "group-by" clause and the "order-by" clause as
>> is to the command line tool ibis.  It should behave as you expect.
>>
>> Due to the presence of function 'count(*)' in the select clause, a
>> natural interpretation is that the user wants to count the number of
>> entries for each distinct combinations of IndidentBeginMonth and
>> Priority.  In this case, the group by clause in redundant - it
>> specifies the same information already implied in the select clause.
>> The order by clause tell the system in what order to output the
>> results.  The default is to order the columns as they appeared in the
>> select clause - your example illustrate this default behavior.
>> Therefore, the order by clause could be neglected as well.
>>
>> Hope this helps.
>>
>> John
>>
>>
>>
>> On 8/17/12 1:34 PM, Michael Beauregard wrote:
>>> Hi all,
>>>
>>> I'm building an analytical iOS application that currently uses sqlite
>>> to slice and dice about 2.5MB (30K rows) of data. I've been able to
>>> get performance to an acceptable level by relying upon many sql
>>> indices, one for each slice/dice combination. Moreover, the queries
>>> would still be too slow unless I ensure each query has its own
>>> "covering" index. This approach has resulted in an explosion in the db
>>> size due to the vast indexing required for performance. A 2.5MB
>>> dataset becomes 45MB when sufficiently indexed. This strategy has
>>> taken us pretty far, but is at its performance and scalability limits.
>>>
>>> My customer recently informed me that we now need to go well
>>> beyond (at least 10x) our initial scale requirement. Plus he
>>> apparently needs to add more ways to slice and dice the data,
>>> basically leading to further explosion of the already unacceptable sql
>>> index size. This essentially means that I'm not able to continue with
>>> my current architecture and I'm looking for other options.
>>>
>>> I stumbled upon FastBit today while researching various options and
>>> I'm really intrigued by what I see. However, I don't understand how I
>>> can express the required queries because of FastBit's implicit
>>> group-by functionality. Perhaps I'm just missing something obvious,
>>> but it seems like I need explicit control over the group-by clause.
>>>
>>> For instance, here is a query that determines the number of events
>>> that occurred in each month, broken down by month and priority of the
>>> event:
>>>
>>> SELECT IncidentBeginMonth, Priority, count(*) FROM events WHERE
>>> IncidentBeginTime BETWEEN '2011-01-01 00:00:00' AND '2012-12-31
>>> 23:59:59'  GROUP BY IncidentBeginMonth, Priority ORDER BY
>>> IncidentBeginMonth, Priority
>>>
>>> The result is displayed in a stacked bar chart with one bar per month
>>> and each bar broken down by event priority. This query executes in
>>> around 200ms on 30K rows on an iPad 3 which is fast enough, but only
>>> barely.
>>>
>>> I would really appreciate if someone could help me figure out how to
>>> produce an equivalent query in FastBit. I have already imported my
>>> data and have been trying various queries using the ibis command, but
>>> I'm kind of stuck.
>>>
>>> Thanks for taking the time to read and consider my question.
>>>
>>> Michael
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> FastBit-users mailing list
>>> [email protected]
>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>>>
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

Re: [FastBit-users] Group by

Reply via email to