Hi, Petr, The having clause is essentially telling DBMS to build an in-memory data table based on the traditional part of the query and then perform further evaluation on the in-memory part. The DBMS might have fancier implementations of this that requires less memory, but the logic is the same.
Regarding the memory usage concern, there are two aspects to it: the memory require to generate the intermediate data table and the intermediate data table itself. The intermediate data table is generated through a set of aggregation operations. These aggregation operations can be divided into separable and non-separable types. The operator median and count-distinct are two prime examples of non-separable ones. To evaluate these operations, one has to accumulate all the data records from different data partitions together into one place - this is very expensive in terms of memory requirement. Most of the other operations are either separable or can be transformed into separable. For example, SUM can be computed in parts, which is a good example of a separable aggregation operation. The operation AVG is not a separable operation on its own, however, it can be transformed to be SUM(....) / count(*), which can be evaluated in parts, i.e., they are separable. The existing FastBit software already does this type of conversion. The separable functions can be performed in parts and therefore require less memory. Therefore, the memory constraints during the evaluation of the aggregations should be manageable as long as you can avoid non-separable operations. The second factor in the memory concern is that the intermediate data table might be large. In your specific example, the intermediate table has only a single row. Therefore, the memory required for this table could not be a problem. In general, you can control the intermediate table size by controlling the granularity of the group-by operations. For example Select e0id2, sum(e0id1), count(*) where e0id3 > 1000000 will have as many rows as the number of distinct values of variable e0id2. It is straightforward to reduce this number by a factor of 10 by doing the following Select floor(e0id2/10), sum(e0i21), count(*) where e0id3 > 1000000 You can reduce this further if it makes sense to your application Select floor(log(e0id2)/log(2)), sum(e0i21), count(*) where e0id3 > 1000000 Hope this helps. John On 10/7/12 5:38 AM, Petr Velan wrote: > Hi John, > > so what I need to perform this kind of queries is something like > HAVING clause? Or I can just take the resulting table and run the > filter, right? But in such case I'm worried about memory requirements. > To be a little more specific, I can do > > result1 = Select sum(e0id1) as bytes, count(*) as count from testdata > where bytes = 40 > result2 = Select bytes, count from result1 where bytes = 40 > > But the result1 table needs to fit into the memory, right? So my > question is, is there any other way to perform this kind of query > without higher memory requirements? > > Thanks, > Petr > > On 6 October 2012 06:43, K. John Wu <[email protected]> wrote: >> Hi, Petr, >> >> The behavior of FastBit in this case is not graceful, however, the >> query is not well formed. Let me explain. >> >> Your query can be rewritten as >> >> Select sum(e0id1) as bytes, count(*) from testdata where bytes = 40 >> >> The basic problem is that bytes is the result of the query, but it is >> also used as part of the where clause. There is no way to evaluate >> the where clause because it depends on the result of the query. Oh, >> well, hope it makes sense. >> >> Will see what I can make the error message more obvious.. >> >> John >> >> >> On 10/5/12 1:10 AM, Petr Velan wrote: >>> Hi John, >>> >>> I decided to take advantage of named columns in select clause. I tried to >>> use >>> >>> table->select("sum(e0id1) as bytes, count(*) as fl", "bytes = 40"); >>> >>> but the result is segmentation fault. Is it possible to use named >>> columns in the where clause? >>> Please find the test data and short program attached. >>> >>> Best regards, >>> Petr >>> >>> >>> >>> _______________________________________________ >>> FastBit-users mailing list >>> [email protected] >>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users >>> _______________________________________________ FastBit-users mailing list [email protected] https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
