Re: [FastBit-users] Group by

K. John Wu Mon, 20 Aug 2012 06:50:03 -0700

Hi, Michael,

FastBit has an older set of querying functions based on the class
ibis::query.  This class represents the query results using a class
called ibis::bundle, which has the information about which rows are in
a particular aggregate of group by.  Unfortunately, the newer querying
functions are not using this class anymore, but directly production
ibis::table objects.  In particular, the function
ibis::bundle::getRIDs return the RID of all rows in a bundle (which
another name for an aggregate in a group by).  The same function is
also available through the ibis::query class as the function
getRIDsInBundle.  The the block in examples/ibis.cpp starting at line
3690 has an example of using this functionality if you are interested.


I just looked at the web page about group_concat, it requires a
variable name as its argument. Do you have an explicit ID for each row
of you data?  If that is the case, an easy alternative to get the
information returned by group_concat would be to issue a query (for
each aggregate) with equality conditions on the group-by keys and
select on the specific column.  Alternatively, sqlite might have an
implicit representation of row IDs, in which case, we would like to
hear how they are doing it and might want to replicate the functionality.

John


On 8/19/12 9:57 PM, Michael Beauregard wrote:
> John, Thanks for your help in getting me off the ground.
> 
> It turned out that some columns were imported from csv incorrectly and
> my initial experiment queries just happened to use the columns that
> had import problems. Ultimately, this just caused confusing query
> results at a time when I was trying to build my understanding of how
> to use FastBit. Now that I have the csv import cleaned up, I am seeing
> exactly what I expect for almost everything I need.
> 
> I have one query that relies upon the non-standard 'group_concat'
> aggregation function as documented at
> http://www.sqlite.org/lang_aggfunc.html. It allows me to determine
> which rows were used in each aggregated value which is required for
> one particular feature of my application. Is it possible to get
> similar information from FastBit?
> 
> Thanks!
> 
> Michael
> 
> On Fri, Aug 17, 2012 at 3:19 PM, K. John Wu <[email protected]> wrote:
>> Hi, Michael,
>>
>> The command line tool ibis always expects a where clause.  In a pinch,
>> you can try something like "where 1 = 1".  Your query "SELECT
>> IncidentBeginMonth, Priority, count(*)" needs to be written as "SELECT
>> IncidentBeginMonth, Priority, count(*) WHERE 1 = 1".
>>
>> John
>>
>>
>>
>> On 8/17/12 3:15 PM, Michael Beauregard wrote:
>>> Thanks for your quick reply. I plan to convert the IncidentBeginTime
>>> to unix epoch timestamps in my source data anyway - which should also
>>> work with FastBit.
>>>
>>> Based on your input, I've simplified my query to be: SELECT
>>> IncidentBeginMonth, Priority, count(*)
>>>
>>> Executing this results in the following unhelpful error:
>>>
>>> tableSelect:: select(IncidentBeginMonth, Priority, count(*), ) failed
>>> on table T-era
>>>
>>> I cranked up the logging (-v=10), but I don't see any details that
>>> give me any clue as to what is wrong.
>>>
>>> Thanks again,
>>>
>>> Michael
>>>
>>>
>>> On Fri, Aug 17, 2012 at 2:56 PM, K. John Wu <[email protected]> wrote:
>>>>
>>>> Hi, Michael,
>>>>
>>>> Thanks for your interest in FastBit.  I would encourage you to give
>>>> FastBit a try.  As for your specific example,
>>>>
>>>> SELECT IncidentBeginMonth, Priority, count(*) FROM events WHERE
>>>> IncidentBeginTime BETWEEN '2011-01-01 00:00:00' AND '2012-12-31
>>>> 23:59:59'  GROUP BY IncidentBeginMonth, Priority ORDER BY
>>>> IncidentBeginMonth, Priority
>>>>
>>>> Here are a couple things to note:
>>>>
>>>> - FastBit does not have explicit support for date and time, you will
>>>> need to convert that to something else
>>>>
>>>> - You can specify the "group-by" clause and the "order-by" clause as
>>>> is to the command line tool ibis.  It should behave as you expect.
>>>>
>>>> Due to the presence of function 'count(*)' in the select clause, a
>>>> natural interpretation is that the user wants to count the number of
>>>> entries for each distinct combinations of IndidentBeginMonth and
>>>> Priority.  In this case, the group by clause in redundant - it
>>>> specifies the same information already implied in the select clause.
>>>> The order by clause tell the system in what order to output the
>>>> results.  The default is to order the columns as they appeared in the
>>>> select clause - your example illustrate this default behavior.
>>>> Therefore, the order by clause could be neglected as well.
>>>>
>>>> Hope this helps.
>>>>
>>>> John
>>>>
>>>>
>>>>
>>>> On 8/17/12 1:34 PM, Michael Beauregard wrote:
>>>>> Hi all,
>>>>>
>>>>> I'm building an analytical iOS application that currently uses sqlite
>>>>> to slice and dice about 2.5MB (30K rows) of data. I've been able to
>>>>> get performance to an acceptable level by relying upon many sql
>>>>> indices, one for each slice/dice combination. Moreover, the queries
>>>>> would still be too slow unless I ensure each query has its own
>>>>> "covering" index. This approach has resulted in an explosion in the db
>>>>> size due to the vast indexing required for performance. A 2.5MB
>>>>> dataset becomes 45MB when sufficiently indexed. This strategy has
>>>>> taken us pretty far, but is at its performance and scalability limits.
>>>>>
>>>>> My customer recently informed me that we now need to go well
>>>>> beyond (at least 10x) our initial scale requirement. Plus he
>>>>> apparently needs to add more ways to slice and dice the data,
>>>>> basically leading to further explosion of the already unacceptable sql
>>>>> index size. This essentially means that I'm not able to continue with
>>>>> my current architecture and I'm looking for other options.
>>>>>
>>>>> I stumbled upon FastBit today while researching various options and
>>>>> I'm really intrigued by what I see. However, I don't understand how I
>>>>> can express the required queries because of FastBit's implicit
>>>>> group-by functionality. Perhaps I'm just missing something obvious,
>>>>> but it seems like I need explicit control over the group-by clause.
>>>>>
>>>>> For instance, here is a query that determines the number of events
>>>>> that occurred in each month, broken down by month and priority of the
>>>>> event:
>>>>>
>>>>> SELECT IncidentBeginMonth, Priority, count(*) FROM events WHERE
>>>>> IncidentBeginTime BETWEEN '2011-01-01 00:00:00' AND '2012-12-31
>>>>> 23:59:59'  GROUP BY IncidentBeginMonth, Priority ORDER BY
>>>>> IncidentBeginMonth, Priority
>>>>>
>>>>> The result is displayed in a stacked bar chart with one bar per month
>>>>> and each bar broken down by event priority. This query executes in
>>>>> around 200ms on 30K rows on an iPad 3 which is fast enough, but only
>>>>> barely.
>>>>>
>>>>> I would really appreciate if someone could help me figure out how to
>>>>> produce an equivalent query in FastBit. I have already imported my
>>>>> data and have been trying various queries using the ibis command, but
>>>>> I'm kind of stuck.
>>>>>
>>>>> Thanks for taking the time to read and consider my question.
>>>>>
>>>>> Michael
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> FastBit-users mailing list
>>>>> [email protected]
>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>>>>>
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

Re: [FastBit-users] Group by

Reply via email to