John
many thanks for your reply. I have sorted results and the search
time has dropped by about 50%. I will read your papers to better
understand the fastbit internals and perhaps make better use of the
library. In any case, I can confirm that thanks to your work I have
been able to dramatically improve performance with respect to SQL
databases. I encourage you to continue the development and further
improve fastbit.

Regards Luca


On Oct 6, 2009, at 1:25 AM, K. John Wu wrote:

> Dear Dr. Deri,
>
> Thanks for your interest in our software.  I would say taking 30
> seconds to read a subset from 10 million records is a little too long.
>  Here is a longer explanation.  Hope it helps.
>
> John
>
> PS: Longer explanation.
>
> The bitmap indexes are very good for counting the records that satisfy
> user-specified conditions and locating the positions of these records.
>  However, actually retrieving these records does take time -- the
> bitmap indexes cannot help with the reading directly.  Compared with
> alternative schemes for retrieving these records, FastBit is
> competitive in most cases.  There is a published comparison on this
> (you can find the paper at
> <http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4318091> and
> <http://lbl.gov/%7Ekwu/ps/LBNL-62756.html>).  The specific comparison
> is shown in Figures 8 & 9 and discussed in Section 6.4.
>
> If you are primarily retrieving data according to the value of one or
> two variables, we recommend that you reorder the data according to
> those variables.  Otherwise, you are likely retrieving data records
> from random locations in each data file, in which case all the pages
> of the data files are read into memory (which is the best one can do).
>
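
To illustrate why scattered hits end up reading the whole file, here is a minimal sketch that counts how many distinct pages must be fetched for a given set of row positions. It is plain C and does not use FastBit; the function name `pages_touched`, the 4-byte value size, and the 4096-byte page size are assumptions made for the example, not details taken from the library.

```c
#include <stddef.h>

/*
 * Count the distinct pages a reader must fetch to retrieve the given rows
 * from a single fixed-width column file.
 * Assumes `rows` is sorted in ascending order and that page_size is a
 * multiple of value_size, so no value straddles two pages.
 */
size_t pages_touched(const size_t *rows, size_t n,
                     size_t value_size, size_t page_size)
{
    size_t count = 0;
    size_t last_page = 0;
    for (size_t k = 0; k < n; k++) {
        /* page that holds the first byte of row rows[k] */
        size_t page = rows[k] * value_size / page_size;
        if (k == 0 || page != last_page) {
            count++;
            last_page = page;
        }
    }
    return count;
}
```

With 10,000 hits in a 10-million-row column of 4-byte values, rows 0 to 9999 (the situation after reordering) touch 10 pages, while every 1000th row touches 9765 pages, which is essentially every page of the roughly 9766-page file.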
> Short of reordering, another way to reduce the time of data retrieval
> is to retrieve fewer columns.  If you are going through command-line
> tools like ibis, the output records are ordered.  If you can work with
> the records in the same order as they are in the original data files,
> then you can retrieve the values directly with one of the
> ibis::part::selectTypes functions (where Type is one of the concrete
> types, such as Int, Short, or Float).
>
> The bottom line is this: assuming that your records are actually
> scattered throughout the data files, the reading time should
> dominate the time of retrieval and printing.  For 10 million records
> where each record has two int columns, the total data file size
> should be about 80 MB.  Assuming your disk system can support 10 MB/s
> reading speed, it would take about 8 seconds to complete the
> retrieval.  10 million records should take a negligible amount of time
> to sort, but may take a very significant amount of time to print to
> screen.  If you are outputting to the screen, I would suggest that you
> output to a file instead (e.g., with ibis -output, or by redirecting
> the screen output to a file).
>
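
The arithmetic behind the 8-second estimate can be written out as a small helper. This is only a back-of-the-envelope sketch: the function name `estimate_read_seconds` is made up for the example, it assumes 4-byte int values, and it takes 1 MB as 1,000,000 bytes to match the round numbers above.

```c
/*
 * Estimate the seconds needed to read every column file in full.
 * Uses 1 MB = 1,000,000 bytes, matching the round numbers in the estimate.
 */
double estimate_read_seconds(double rows, double columns,
                             double bytes_per_value, double mb_per_second)
{
    /* total size of all column files, in bytes */
    double total_bytes = rows * columns * bytes_per_value;
    return total_bytes / (mb_per_second * 1e6);
}
```

Calling `estimate_read_seconds(1e7, 2, 4, 10)` reproduces the figures in the message: 80,000,000 bytes read at 10 MB/s, or 8 seconds. Retrieving one column instead of two halves the estimate.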
>
>
> On 10/5/2009 2:40 PM, Luca Deri wrote:
>> Dear all
>> I have been using fastbit since the initial release in the field of
>> network monitoring. While I'm impressed by fastbit performance for
>> counting records that match some criteria, when I actually request to
>> read the matching records the performance is not great.
>>
>> For instance I can count matching records in a matter of msec whereas
>> retrieving the actual data takes 30 sec (when using 10 Million
>> records) or more (data was already indexed) using ibis or a similar
>> tool. I was wondering if I made some mistakes when building the
>> fastbit archives. For this reason I have built a simple program
>> (enclosed below) that I have used to create dummy data values to
>> query. Changing the input parameters, I see that the indexing speed
>> changes significantly, but the query speed is still the same.
>>
>> Question: is the obtained performance what you also expect, or did I
>> make some mistakes while building the fastbit archives?
>>
>> Thanks in advance, Luca
>>
>>
>> ----
>>
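>> #include <capi.h>
>> #include <ctype.h>
>> #include <stdio.h>     /* printf */
>> #include <string.h>
>> #include <stdlib.h>
>> #include <sys/time.h>  /* struct timeval, gettimeofday */
>> #include <sys/types.h> /* u_int32_t */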
>>
>> /* ****************************************************** */
>>
>> void timeval_diff(struct timeval *begin, struct timeval *end, struct
>> timeval *result) {
>>   if(end->tv_sec >= begin->tv_sec) {
>>     result->tv_sec = end->tv_sec-begin->tv_sec;
>>
>>     if((end->tv_usec - begin->tv_usec) < 0) {
>>       /* borrow one second when the microsecond part underflows */
>>       result->tv_usec = 1000000 + end->tv_usec - begin->tv_usec;
>>       result->tv_sec--;
>>     } else
>>       result->tv_usec = end->tv_usec-begin->tv_usec;
>>   } else
>>     result->tv_sec = 0, result->tv_usec = 0;
>> }
>>
>> void append(char *dir, int num, int total) {
>>   int *a_vals, *b_vals, i;
>>   struct timeval begin, end, diff;
>>   u_int32_t v, tot;
>>
>>   a_vals = (int*)malloc(sizeof(int)*num);
>>   b_vals = (int*)malloc(sizeof(int)*num);
>>
>>   if(a_vals && b_vals) {
>>   for (i = 0; i < num; i++)
>>     a_vals[i] = i, b_vals[i] = i;
>>
>>   gettimeofday (& begin, NULL);
>>   for(i=0; i<total; i++) {
>>     fastbit_add_values("a", "int", a_vals, num, 0);
>>     fastbit_add_values("b", "int", b_vals, num, 0);
>>
>>     fastbit_flush_buffer(dir);
>>   }
>>   gettimeofday (& end, NULL);
>>   timeval_diff (& begin, & end, & diff);
>>
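>>   v = diff.tv_sec*1000 + diff.tv_usec/1000;
>>   if(v == 0) v = 1; /* avoid dividing by zero on very fast runs */
>>   tot = i*num;
>>
>>   printf("written %u records on disk [%.2f sec][%.2f insert/sec]\n",
>>       tot, (float)v/1000, (float)tot*1000/v);
>>   }
>>
>>   free(a_vals); /* free(NULL) is a no-op, so this is safe if malloc failed */
>>   free(b_vals);
>> }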
>>
>> int main(int argc, char *argv[]) {
>>   if(argc != 3) {
>>
>>     printf("fastbit_test <records x shot> <num shots>\n");
>>     return(0);
>>   }
>>
>>   append("fb", atoi(argv[1]), atoi(argv[2]));
>>
>>   return(0);
>> }
>>
>> _______________________________________________
>> FastBit-users mailing list
>> [email protected]
>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
