Re: [FastBit-users] Group by

Michael Beauregard Mon, 20 Aug 2012 21:34:01 -0700

That's fantastic! I'm currently trying to get FastBit to build as a
static library and run on my iPad before I go any further. I'll try
out your latest change right after that.


Thanks!

On Mon, Aug 20, 2012 at 9:25 PM, K. John Wu <[email protected]> wrote:
> Hi, Michael,
>
> It turned out that there is a relatively straightforward way to
> support group_concat in FastBit.  The new code is checked in as SVN
> revision 543.  Please give it a try when get the chance.
>
> Thanks for the suggestion.
>
> John
>
>
> On 8/19/12 9:57 PM, Michael Beauregard wrote:
>> John, Thanks for your help in getting me off the ground.
>>
>> It turned out that some columns were imported from csv incorrectly and
>> my initial experiment queries just happened to use the columns that
>> had import problems. Ultimately, this just caused confusing query
>> results at a time when I was trying to build my understanding of how
>> to use FastBit. Now that I have the csv import cleaned up, I am seeing
>> exactly what I expect for almost everything I need.
>>
>> I have one query that relies upon the non-standard 'group_concat'
>> aggregation function as documented at
>> http://www.sqlite.org/lang_aggfunc.html. It allows me to determine
>> which rows were used in each aggregated value which is required for
>> one particular feature of my application. Is it possible to get
>> similar information from FastBit?
>>
>> Thanks!
>>
>> Michael
>>
>> On Fri, Aug 17, 2012 at 3:19 PM, K. John Wu <[email protected]> wrote:
>>> Hi, Michael,
>>>
>>> The command line tool ibis always expects a where clause.  In a pinch,
>>> you can try something like "where 1 = 1".  Your query "SELECT
>>> IncidentBeginMonth, Priority, count(*)" needs to be written as "SELECT
>>> IncidentBeginMonth, Priority, count(*) WHERE 1 = 1".
>>>
>>> John
>>>
>>>
>>>
>>> On 8/17/12 3:15 PM, Michael Beauregard wrote:
>>>> Thanks for your quick reply. I plan to convert the IncidentBeginTime
>>>> to unix epoch timestamps in my source data anyway - which should also
>>>> work with FastBit.
>>>>
>>>> Based on your input, I've simplified my query to be: SELECT
>>>> IncidentBeginMonth, Priority, count(*)
>>>>
>>>> Executing this results in the following unhelpful error:
>>>>
>>>> tableSelect:: select(IncidentBeginMonth, Priority, count(*), ) failed
>>>> on table T-era
>>>>
>>>> I cranked up the logging (-v=10), but I don't see any details that
>>>> give me any clue as to what is wrong.
>>>>
>>>> Thanks again,
>>>>
>>>> Michael
>>>>
>>>>
>>>> On Fri, Aug 17, 2012 at 2:56 PM, K. John Wu <[email protected]> wrote:
>>>>>
>>>>> Hi, Michael,
>>>>>
>>>>> Thanks for your interest in FastBit.  I would encourage you to give
>>>>> FastBit a try.  As for your specific example,
>>>>>
>>>>> SELECT IncidentBeginMonth, Priority, count(*) FROM events WHERE
>>>>> IncidentBeginTime BETWEEN '2011-01-01 00:00:00' AND '2012-12-31
>>>>> 23:59:59'  GROUP BY IncidentBeginMonth, Priority ORDER BY
>>>>> IncidentBeginMonth, Priority
>>>>>
>>>>> Here are a couple things to note:
>>>>>
>>>>> - FastBit does not have explicit support for date and time, you will
>>>>> need to convert that to something else
>>>>>
>>>>> - You can specify the "group-by" clause and the "order-by" clause as
>>>>> is to the command line tool ibis.  It should behave as you expect.
>>>>>
>>>>> Due to the presence of function 'count(*)' in the select clause, a
>>>>> natural interpretation is that the user wants to count the number of
>>>>> entries for each distinct combinations of IndidentBeginMonth and
>>>>> Priority.  In this case, the group by clause in redundant - it
>>>>> specifies the same information already implied in the select clause.
>>>>> The order by clause tell the system in what order to output the
>>>>> results.  The default is to order the columns as they appeared in the
>>>>> select clause - your example illustrate this default behavior.
>>>>> Therefore, the order by clause could be neglected as well.
>>>>>
>>>>> Hope this helps.
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>>
>>>>> On 8/17/12 1:34 PM, Michael Beauregard wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I'm building an analytical iOS application that currently uses sqlite
>>>>>> to slice and dice about 2.5MB (30K rows) of data. I've been able to
>>>>>> get performance to an acceptable level by relying upon many sql
>>>>>> indices, one for each slice/dice combination. Moreover, the queries
>>>>>> would still be too slow unless I ensure each query has its own
>>>>>> "covering" index. This approach has resulted in an explosion in the db
>>>>>> size due to the vast indexing required for performance. A 2.5MB
>>>>>> dataset becomes 45MB when sufficiently indexed. This strategy has
>>>>>> taken us pretty far, but is at its performance and scalability limits.
>>>>>>
>>>>>> My customer recently informed me that we now need to go well
>>>>>> beyond (at least 10x) our initial scale requirement. Plus he
>>>>>> apparently needs to add more ways to slice and dice the data,
>>>>>> basically leading to further explosion of the already unacceptable sql
>>>>>> index size. This essentially means that I'm not able to continue with
>>>>>> my current architecture and I'm looking for other options.
>>>>>>
>>>>>> I stumbled upon FastBit today while researching various options and
>>>>>> I'm really intrigued by what I see. However, I don't understand how I
>>>>>> can express the required queries because of FastBit's implicit
>>>>>> group-by functionality. Perhaps I'm just missing something obvious,
>>>>>> but it seems like I need explicit control over the group-by clause.
>>>>>>
>>>>>> For instance, here is a query that determines the number of events
>>>>>> that occurred in each month, broken down by month and priority of the
>>>>>> event:
>>>>>>
>>>>>> SELECT IncidentBeginMonth, Priority, count(*) FROM events WHERE
>>>>>> IncidentBeginTime BETWEEN '2011-01-01 00:00:00' AND '2012-12-31
>>>>>> 23:59:59'  GROUP BY IncidentBeginMonth, Priority ORDER BY
>>>>>> IncidentBeginMonth, Priority
>>>>>>
>>>>>> The result is displayed in a stacked bar chart with one bar per month
>>>>>> and each bar broken down by event priority. This query executes in
>>>>>> around 200ms on 30K rows on an iPad 3 which is fast enough, but only
>>>>>> barely.
>>>>>>
>>>>>> I would really appreciate if someone could help me figure out how to
>>>>>> produce an equivalent query in FastBit. I have already imported my
>>>>>> data and have been trying various queries using the ibis command, but
>>>>>> I'm kind of stuck.
>>>>>>
>>>>>> Thanks for taking the time to read and consider my question.
>>>>>>
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> FastBit-users mailing list
>>>>>> [email protected]
>>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>>>>>>
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

Re: [FastBit-users] Group by

Reply via email to