Hi, Min,

Thanks for your interest in FastBit.  FastBit logically works with
data tables, where each table can be divided into multiple partitions.
Each data partition is a horizontal segment of a data table and is
stored in a directory on a file system.  FastBit builds an index for
each column of a data partition when instructed to do so or when
needed.  When building an index, the projection of the column (all the
rows of that column in the partition) is read into memory, and the
index is built in memory.  This is the major limitation on how large a
data partition can be.  A rule of thumb we use is that the projection
of a column should be no more than 1/10th of the physical memory
available on the computer.  For example, on a system with 4GB of
memory, we limit the size of each projection to 400MB.  If most of the
columns are 4-byte integer values, each data partition should have no
more than 100 million rows.
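
As a rough illustration of that rule of thumb, here is a small Python
sketch.  It is not part of FastBit: the function name and its
parameters are made up for this example, and GB is taken as a decimal
unit (10^9 bytes) so the numbers match the ones above.

```python
def max_rows_per_partition(physical_memory_bytes, bytes_per_value, divisor=10):
    """Largest row count whose single-column projection fits in the budget.

    Hypothetical helper for illustration only; FastBit does not provide it.
    The budget is 1/divisor of physical memory (1/10th by the rule of thumb).
    """
    budget_bytes = physical_memory_bytes // divisor
    return budget_bytes // bytes_per_value


GB = 10**9  # decimal gigabyte, an assumption made to match the example

# 4GB of memory, 4-byte integer columns: 400MB budget per projection
rows_int32 = max_rows_per_partition(4 * GB, 4)
# The same machine with 8-byte columns (e.g. doubles) allows half as many rows
rows_float64 = max_rows_per_partition(4 * GB, 8)

print(rows_int32, rows_float64)
```

If a partition mixes column types, the widest column is the one that
determines the limit in this sketch.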

When using the indexes for queries, FastBit reads only the bitmaps 
that are needed.  In most cases, we never read more than half of all 
the bitmaps into memory (this is usually measured in bytes rather than 
in the number of bitmaps, but there are exceptions).  For those cases 
that would require more than half of the bitmaps, we can evaluate the 
complement of the query condition instead (and take the complement of 
the answer afterwards).
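
The complement trick can be sketched with a toy index in Python.  This
is only an illustration under simplifying assumptions, not FastBit's
code or API: it uses one uncompressed bitmap (a Python integer) per
distinct value, and it compares the number of bitmaps rather than their
size in bytes.  All names are made up for the example.

```python
def build_index(column):
    """Toy equality-encoded index: one bitmap (a Python int) per distinct value."""
    index = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def evaluate_le(index, n_rows, threshold):
    """Answer 'value <= threshold'; return (result bitmap, bitmaps read)."""
    all_rows = (1 << n_rows) - 1
    direct = [v for v in index if v <= threshold]
    complement = [v for v in index if v > threshold]
    # Read whichever side of the condition touches fewer bitmaps.
    use_complement = len(complement) < len(direct)
    keys = complement if use_complement else direct
    result = 0
    for v in keys:
        result |= index[v]
    if use_complement:
        # We evaluated 'value > threshold', so negate to get the real answer.
        result = all_rows & ~result
    return result, len(keys)


column = [3, 9, 0, 7, 8, 1, 9, 5, 2, 6, 4, 8]
index = build_index(column)
hits, bitmaps_read = evaluate_le(index, len(column), 7)
rows = [r for r in range(len(column)) if (hits >> r) & 1]
print(rows, bitmaps_read)
```

Here the condition "value <= 7" covers 8 of the 10 distinct values, so
the sketch reads only the 2 bitmaps for the values 8 and 9 and negates
the result.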

When the raw data is needed, FastBit again reads one whole column into 
memory at a time.  In the error message shown by Mian Lu, I presume 
the 600MB or so of memory required is for some column projections.

Hope this helps.

John



On 10/25/2009 4:05 AM, Min Zhou wrote:
> Hi guys,
> 
> I am a newbie of fastbit. I do not know whether bitmap index is loaded 
> by fastbit into memory. if so, then the size of the bitmap index will be 
> limited by the memory size of that machine runs fastbit. Can fastbit 
> overcome that problem? If it can't, how many records at most can fastbit 
> build bitmap indices on them?
> 
> 
> Thanks,
> Min
> -- 
> My research interests are distributed systems, parallel computing and 
> bytecode based virtual machine.
> 
> My profile:
> http://www.linkedin.com/in/coderplay
> My blog:
> http://coderplay.javaeye.com
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
