Thanks for your interest in FastBit.  My responses follow your
questions below.

John


On 10/20/14 8:20 PM, wang jilong wrote:
> Hi John,
> 
> I have a couple questions on FastBit memory usages and policy. My question is 
> based on an example "Table" person (person_id, age, gender, class, 
> mailing_addr). Indexed columns are age, gender, class.
> 
> 
> 1. If a query where-clause is [gender="M"], will only the bitmap index for 
> "M" be loaded into memory (if they are not loaded into memory yet) to 
> evaluate the query?
>    The other choice, though I guess not likely,  is to load all bitmap index 
> for gender, including [gender="M"] and [gender="F"].

If nothing has been loaded into memory yet, FastBit will attempt to
load the bitmap for "M" only.  This could be done through memory map
or read.
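
As a rough illustration of the idea (this is not FastBit's code or file
layout), the Python sketch below keeps one bitmap per distinct value
and reads back only the bytes for the requested value.  The names
build_index, load_bitmap and rows_of, and the offset-table layout, are
made up for this example.

```python
import io

def build_index(values):
    """Serialize one uncompressed bitmap per distinct value.

    Returns (blob, offsets), where offsets maps value -> (start, length).
    Bit i of a bitmap is set when row i holds that value.
    """
    nbytes = (len(values) + 7) // 8
    bitmaps = {}
    for row, v in enumerate(values):
        bm = bitmaps.setdefault(v, bytearray(nbytes))
        bm[row // 8] |= 1 << (row % 8)
    blob, offsets = bytearray(), {}
    for v in sorted(bitmaps):
        offsets[v] = (len(blob), nbytes)
        blob += bitmaps[v]
    return bytes(blob), offsets

def load_bitmap(f, offsets, value):
    """Seek to and read only the bitmap for `value`."""
    start, length = offsets[value]
    f.seek(start)
    return f.read(length)

def rows_of(bitmap):
    """Row numbers whose bit is set."""
    return [i for i in range(len(bitmap) * 8)
            if (bitmap[i // 8] >> (i % 8)) & 1]

gender = ["M", "F", "M", "M", "F"]
blob, offsets = build_index(gender)
index_file = io.BytesIO(blob)   # stands in for an index file on disk
m_bitmap = load_bitmap(index_file, offsets, "M")
print(rows_of(m_bitmap))
```

Only the bytes of the "M" bitmap are read here; the "F" bitmap stays
in the (simulated) file until some query asks for it.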

> 
> 2. If a query where-clause is [gender="M" and age=20], will only the bitmap 
> index for gender+age be loaded into memory to evaluate the query?
>    The other choice, though I guess not likely,  is to load all bitmap index, 
> include "gender", "age" and "class".

In this particular example, I believe only the bitmaps will be used
(rather than the raw data), assuming the columns named gender and age
are both indexed.
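
To make the bitmap-only evaluation concrete, here is a small Python
sketch that answers [gender="M" and age=20] by ANDing one bitmap from
each index.  It uses plain Python integers as uncompressed bitmaps;
the helper names and sample data are assumptions for illustration and
say nothing about how FastBit stores or compresses its bitmaps.

```python
def equality_bitmaps(values):
    """Map each distinct value to an int whose bit i is set when row i has it."""
    bitmaps = {}
    for row, v in enumerate(values):
        bitmaps[v] = bitmaps.get(v, 0) | (1 << row)
    return bitmaps

def set_rows(bitmap):
    """List the row numbers whose bit is set."""
    rows, row = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows

gender = ["M", "F", "M", "M", "F", "M"]
age = [20, 20, 31, 20, 45, 20]

gender_index = equality_bitmaps(gender)
age_index = equality_bitmaps(age)

# Only two bitmaps are touched: gender="M" and age=20.
hits = gender_index["M"] & age_index[20]
print(set_rows(hits))
```

Neither the raw columns nor any other bitmap is consulted to find the
matching rows.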

In a more general case, the decision on whether to use the index or
the raw data is based on the expected cost, using the data size as the
proxy.
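
A toy version of such a size-based choice might look like the Python
sketch below.  The function choose_plan and the numbers are
hypothetical; the real heuristics are more involved, and this only
shows what "data size as the proxy" can mean.

```python
def choose_plan(index_bytes_needed, raw_column_bytes):
    """Pick the cheaper access path, using bytes to read as the cost estimate."""
    return "index" if index_bytes_needed <= raw_column_bytes else "scan"

# Hypothetical sizes for a 1,000,000-row table with a 4-byte age column.
raw_bytes = 1_000_000 * 4
one_bitmap = 1_000_000 // 8     # one uncompressed bitmap, in bytes

equality_plan = choose_plan(one_bitmap, raw_bytes)        # age=20: one bitmap
range_plan = choose_plan(60 * one_bitmap, raw_bytes)      # wide range: 60 bitmaps
print(equality_plan, range_plan)
```

With these made-up sizes, the equality condition reads far fewer bytes
through the index, while a range spanning 60 uncompressed bitmaps
would read more than the raw column.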


> 3. If the query is: select person_id, mailing_addr where age=30.
>    If n rows meets the query-criteria, how does it read the data from 
> data-file, in term of memory usage:
>    a. does it read the whole file into RAM, then pick up these n rows? For a 
> large data file, it will overflow the physical memory easily.
>    b. does it read n rows (person_id, mailing_addr only) into RAM, thus the 
> max memory usage is what is large enough to hold these n rows' data?
>    c. if there are a large number of rows that meet the criteria, does it 
> provide a way to read 1 row at a time, with memory usage only for 1 row?
>       (one possible application is to output these rows into a file, with 
> minimum memory required)

The decision on how to read is largely based on what is expected to be
more time-efficient.  The actual decision is of course a set of
heuristics and there is no guarantee the decision actually taken is
indeed the most time-efficient.


> 4. When indexes are built for a CSV file, is the index built one column at a 
> time, or one row at a time? If it is one column at a time, the memory needed 
> to build the current column can be re-used to build the next column.

The index is built one column at a time.

> 
> 5. When one column index is built for a CSV file, does it need memory to 
> store all indexes for the column before they are all saved to the index file?

Yes, all bitmaps and associated metadata for an index of a column must
be able to fit in memory.  Of course, after the index is written to
file, the memory can be reused.
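
The build-write-release pattern can be sketched in Python as follows.
The text file format, the ".idx" naming and the function names are
invented for this example and are not what FastBit writes; the point
is only that one column's index is held in memory at a time.

```python
import os
import tempfile

def build_column_index(values):
    """Build every bitmap for one column (bit i set when row i has the value)."""
    bitmaps = {}
    for row, v in enumerate(values):
        bitmaps[v] = bitmaps.get(v, 0) | (1 << row)
    return bitmaps

def index_table(columns, out_dir):
    """Build and write one index file per column, one column at a time."""
    written = []
    for name, values in columns.items():
        bitmaps = build_column_index(values)   # whole index held in memory
        path = os.path.join(out_dir, name + ".idx")
        with open(path, "w") as f:
            for v in sorted(bitmaps):
                f.write(f"{v}\t{bitmaps[v]:b}\n")
        written.append(path)
        del bitmaps                            # memory is free for the next column
    return written

columns = {"age": [20, 30, 20], "gender": ["M", "F", "M"]}
with tempfile.TemporaryDirectory() as d:
    paths = index_table(columns, d)
    names = [os.path.basename(p) for p in paths]
    with open(paths[1]) as f:
        gender_lines = f.read().splitlines()
print(names)
```

Peak memory in this sketch is set by the largest single column index,
not by the sum over all columns.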

If your data table is too large for that, we recommend partitioning
the data.
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
