Re: [FastBit-users] Group by

Michael Beauregard Sun, 19 Aug 2012 21:58:02 -0700

John, Thanks for your help in getting me off the ground.

It turned out that some columns were imported from csv incorrectly and
my initial experiment queries just happened to use the columns that
had import problems. Ultimately, this just caused confusing query
results at a time when I was trying to build my understanding of how
to use FastBit. Now that I have the csv import cleaned up, I am seeing
exactly what I expect for almost everything I need.


I have one query that relies upon the non-standard 'group_concat'
aggregation function as documented at
http://www.sqlite.org/lang_aggfunc.html. It allows me to determine
which rows were used in each aggregated value which is required for
one particular feature of my application. Is it possible to get
similar information from FastBit?

Thanks!

Michael

On Fri, Aug 17, 2012 at 3:19 PM, K. John Wu <[email protected]> wrote:
> Hi, Michael,
>
> The command line tool ibis always expects a where clause.  In a pinch,
> you can try something like "where 1 = 1".  Your query "SELECT
> IncidentBeginMonth, Priority, count(*)" needs to be written as "SELECT
> IncidentBeginMonth, Priority, count(*) WHERE 1 = 1".
>
> John
>
>
>
> On 8/17/12 3:15 PM, Michael Beauregard wrote:
>> Thanks for your quick reply. I plan to convert the IncidentBeginTime
>> to unix epoch timestamps in my source data anyway - which should also
>> work with FastBit.
>>
>> Based on your input, I've simplified my query to be: SELECT
>> IncidentBeginMonth, Priority, count(*)
>>
>> Executing this results in the following unhelpful error:
>>
>> tableSelect:: select(IncidentBeginMonth, Priority, count(*), ) failed
>> on table T-era
>>
>> I cranked up the logging (-v=10), but I don't see any details that
>> give me any clue as to what is wrong.
>>
>> Thanks again,
>>
>> Michael
>>
>>
>> On Fri, Aug 17, 2012 at 2:56 PM, K. John Wu <[email protected]> wrote:
>>>
>>> Hi, Michael,
>>>
>>> Thanks for your interest in FastBit.  I would encourage you to give
>>> FastBit a try.  As for your specific example,
>>>
>>> SELECT IncidentBeginMonth, Priority, count(*) FROM events WHERE
>>> IncidentBeginTime BETWEEN '2011-01-01 00:00:00' AND '2012-12-31
>>> 23:59:59'  GROUP BY IncidentBeginMonth, Priority ORDER BY
>>> IncidentBeginMonth, Priority
>>>
>>> Here are a couple things to note:
>>>
>>> - FastBit does not have explicit support for date and time, you will
>>> need to convert that to something else
>>>
>>> - You can specify the "group-by" clause and the "order-by" clause as
>>> is to the command line tool ibis.  It should behave as you expect.
>>>
>>> Due to the presence of function 'count(*)' in the select clause, a
>>> natural interpretation is that the user wants to count the number of
>>> entries for each distinct combinations of IndidentBeginMonth and
>>> Priority.  In this case, the group by clause in redundant - it
>>> specifies the same information already implied in the select clause.
>>> The order by clause tell the system in what order to output the
>>> results.  The default is to order the columns as they appeared in the
>>> select clause - your example illustrate this default behavior.
>>> Therefore, the order by clause could be neglected as well.
>>>
>>> Hope this helps.
>>>
>>> John
>>>
>>>
>>>
>>> On 8/17/12 1:34 PM, Michael Beauregard wrote:
>>>> Hi all,
>>>>
>>>> I'm building an analytical iOS application that currently uses sqlite
>>>> to slice and dice about 2.5MB (30K rows) of data. I've been able to
>>>> get performance to an acceptable level by relying upon many sql
>>>> indices, one for each slice/dice combination. Moreover, the queries
>>>> would still be too slow unless I ensure each query has its own
>>>> "covering" index. This approach has resulted in an explosion in the db
>>>> size due to the vast indexing required for performance. A 2.5MB
>>>> dataset becomes 45MB when sufficiently indexed. This strategy has
>>>> taken us pretty far, but is at its performance and scalability limits.
>>>>
>>>> My customer recently informed me that we now need to go well
>>>> beyond (at least 10x) our initial scale requirement. Plus he
>>>> apparently needs to add more ways to slice and dice the data,
>>>> basically leading to further explosion of the already unacceptable sql
>>>> index size. This essentially means that I'm not able to continue with
>>>> my current architecture and I'm looking for other options.
>>>>
>>>> I stumbled upon FastBit today while researching various options and
>>>> I'm really intrigued by what I see. However, I don't understand how I
>>>> can express the required queries because of FastBit's implicit
>>>> group-by functionality. Perhaps I'm just missing something obvious,
>>>> but it seems like I need explicit control over the group-by clause.
>>>>
>>>> For instance, here is a query that determines the number of events
>>>> that occurred in each month, broken down by month and priority of the
>>>> event:
>>>>
>>>> SELECT IncidentBeginMonth, Priority, count(*) FROM events WHERE
>>>> IncidentBeginTime BETWEEN '2011-01-01 00:00:00' AND '2012-12-31
>>>> 23:59:59'  GROUP BY IncidentBeginMonth, Priority ORDER BY
>>>> IncidentBeginMonth, Priority
>>>>
>>>> The result is displayed in a stacked bar chart with one bar per month
>>>> and each bar broken down by event priority. This query executes in
>>>> around 200ms on 30K rows on an iPad 3 which is fast enough, but only
>>>> barely.
>>>>
>>>> I would really appreciate if someone could help me figure out how to
>>>> produce an equivalent query in FastBit. I have already imported my
>>>> data and have been trying various queries using the ibis command, but
>>>> I'm kind of stuck.
>>>>
>>>> Thanks for taking the time to read and consider my question.
>>>>
>>>> Michael
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> FastBit-users mailing list
>>>> [email protected]
>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>>>>
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

Re: [FastBit-users] Group by

Reply via email to