Well, here is an example of what you are suggesting:

http://wiki.postgresql.org/wiki/Bitmap_Indexes

Another alternative might be the way C-Store uses bitmaps
<http://paperhub.s3.amazonaws.com/14d147739ca381a610b8eea771ab0c84.pdf>.
It is also tightly integrated with the rest of the database storage
substrate.

John



On 12/26/13, 3:17 PM, matt wrote:
> Thanks for the response. This paper:
> 
> http://crd-legacy.lbl.gov/~kewu/ps/LBNL-62756.pdf
> 
> seems to suggest that there is some kind of relationship between a
> B-tree structure that stores bitmaps for each distinct value and a
> different structure that stores bitmaps in a flat file...
> 
> It seems that if fragmentation and non-sequential I/O are taken into
> account, it may be possible to improve upon the designs proposed by
> O'Neil and store the bitmaps in a B-tree in such a way that it
> performs close enough to FastBit. Are you aware of any such efforts?
> As the paper suggests, since RDBMSs already have very elaborate B-tree
> implementations, some would be tempted to integrate FastBit into a
> B-tree...
> 
> regards
> matt
> 
> 
> ----------------------------------------------------------------------
> *From:* K. John Wu <[email protected]>
> *To:* matt <[email protected]>
> *Sent:* Thursday, December 26, 2013 1:13 PM
> *Subject:* Re: fastbit
> 
> Hi, Matt,
> 
> Thanks for your interest in our software.  This is the mailing list
> for FastBit questions.  Please scroll down for my replies to your
> questions, and feel free to post further questions.
> 
> John
> 
> On 12/26/13, 7:37 AM, matt wrote:
>> Hi,
>>
>> We are looking at the possibility of using FastBit for our network
>> data analysis. Is there an active community that we can post some
>> questions to?
>>
>> While trying to figure out the file organization, we came across this
>> documentation:
>>
>> ---------------------------
>> http://crd-legacy.lbl.gov/~kewu/fastbit/doc/dataLoading.html
>>
>>
>>    Files in a Data Partition
>>
>> In a directory containing a data partition, there are files for each
>> column and the metadata file named |-part.txt|. For example, after
>> building the indexes in the directory |tmp| generated by the above
>> command, we have the following files:
>>
>> -rw-r--r-- 1 kwu Users  402 Aug  3 20:35 -part.txt
>> -rw-r--r-- 1 kwu Users  400 Aug  3 20:35 a
>> -rw-r--r-- 1 kwu Users 3520 Aug  4 23:14 a.idx
>> -rw-r--r-- 1 kwu Users  400 Aug  3 20:35 b
>> -rw-r--r-- 1 kwu Users 3520 Aug  4 23:14 b.idx
>> -rw-r--r-- 1 kwu Users  200 Aug  3 20:35 c
>> -rw-r--r-- 1 kwu Users 3520 Aug  4 23:14 c.idx
>>
>>
>>
>> ---------------------------
>>
>>
>> It seems that the bitmaps for a given column 'a' are stored in the
>> file 'a.idx'. For a low-cardinality column that has only two possible
>> values (true/false), are both bitmaps stored in a.idx?
> 
> Yes, the index file for a column named 'a' is in the file 'a.idx'.
> Sometimes there are additional files with different extensions.
> 
>>
>> If so, it seems that, depending on the order of insertion, there
>> would be:
>> -- fragmentation in the bitmaps
>> -- bitmaps not laid out sequentially on disk, since the file is not
>> append-only as a whole
>> -- with higher cardinality, fragmentation similar to that of a B-tree
>>
>> Is the above analysis wrong? Are the bitmaps for multiple attribute
>> values stored in the same file? That could make the data layout less
>> sequential.
>>
> 
> Your understanding is correct.  The bitmap indexes we build are
> generally known as secondary indexes -- they do not restructure the
> base data records.  As you can imagine, reordering the data records
> could reduce the index sizes.  However, reordering the records to
> reduce the index size of one column might increase the index sizes of
> other columns, so there is a balancing act to be performed.  FastBit
> does not attempt to address this issue.
> 
> As you've noticed, there is one index file for each column of a data
> partition.  All bitmaps corresponding to different values of the
> column are packed into that one index file.  Therefore, this index
> file can be large, depending on various factors such as how many
> distinct values there are and how well the bitmaps can be compressed.
> 
> 
> 
>> If there is a community where such questions are better posted,
>> please let us know.
>>
>> regards
>> matt
> 
> 
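Two points from John's earlier reply (all bitmaps for one column are packed into a single index file, and reordering rows to shrink one column's index can grow another's) can be illustrated with a small Python sketch. This is not FastBit's code or its .idx file format: the names build_bitmaps, run_count, index_cost, reorder_by, pack_index and unpack_bitmap, as well as the header layout, are hypothetical. Column values are assumed to be small integers, and the number of runs in each bitmap is used only as a rough stand-in for its run-length-compressed size.

```python
import struct
from itertools import groupby

def build_bitmaps(column):
    """Equality-encoded bitmap index: one bit list per distinct value."""
    bitmaps = {}
    for row, value in enumerate(column):
        bitmaps.setdefault(value, [0] * len(column))[row] = 1
    return bitmaps

def run_count(bits):
    """Number of runs of identical bits (rough proxy for compressed size)."""
    return sum(1 for _ in groupby(bits))

def index_cost(column):
    """Total run count over all bitmaps of one column."""
    return sum(run_count(bits) for bits in build_bitmaps(column).values())

def reorder_by(key_column, *columns):
    """Stable-sort all columns by key_column; returns the reordered columns."""
    order = sorted(range(len(key_column)), key=lambda r: key_column[r])
    return [[col[r] for r in order] for col in (key_column, *columns)]

def to_bytes(bits):
    """Pack a bit list into little-endian bytes (bit i -> bit i of the int)."""
    n = 0
    for i, bit in enumerate(bits):
        n |= bit << i
    return n.to_bytes((len(bits) + 7) // 8, "little")

# Hypothetical single-file layout (NOT FastBit's): uint32 count, then one
# (int32 value, uint32 offset, uint32 size) entry per bitmap, then the
# bytes of every bitmap stored back to back.
ENTRY = struct.Struct("<iII")

def pack_index(bitmaps):
    """Pack all bitmaps of one column into a single byte string."""
    entries, payload = b"", b""
    for value in sorted(bitmaps):
        data = to_bytes(bitmaps[value])
        entries += ENTRY.pack(value, len(payload), len(data))
        payload += data
    return struct.pack("<I", len(bitmaps)) + entries + payload

def unpack_bitmap(blob, value, nrows):
    """Look up one value's bitmap in a packed index; None if absent."""
    (count,) = struct.unpack_from("<I", blob, 0)
    base = 4 + count * ENTRY.size
    for k in range(count):
        v, off, size = ENTRY.unpack_from(blob, 4 + k * ENTRY.size)
        if v == value:
            n = int.from_bytes(blob[base + off:base + off + size], "little")
            return [(n >> i) & 1 for i in range(nrows)]
    return None

if __name__ == "__main__":
    a = [0, 1, 0, 1, 0, 1, 0, 1]
    b = [0, 0, 1, 1, 0, 0, 1, 1]
    print("before:", index_cost(a), index_cost(b))
    a_sorted, b_sorted = reorder_by(a, b)
    print("sorted by a:", index_cost(a_sorted), index_cost(b_sorted))
    blob = pack_index(build_bitmaps(b))
    print(len(blob), unpack_bitmap(blob, 1, len(b)))
```

On this toy data, sorting the rows by a cuts the total run count of a's bitmaps from 16 to 4, while b's rises from 8 to 16: the balancing act described in the reply. The packed blob also shows why one file per column grows with the number of distinct values, since each value adds a header entry and a bitmap.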
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
