Well, here is an example of what you are suggesting http://wiki.postgresql.org/wiki/Bitmap_Indexes
The other alternative might be how c-store uses bitmaps <http://paperhub.s3.amazonaws.com/14d147739ca381a610b8eea771ab0c84.pdf>. It is also tightly integrated with the rest of the database storage substrate. John On 12/26/13, 3:17 PM, matt wrote: > Thanks for the response, this paper: > > http://crd-legacy.lbl.gov/~kewu/ps/LBNL-62756.pdf > > seems to suggest that there is some kind of relationship between a > Btree structure which stores > bitmaps for each distinct value, and a different structure that stores > bitmaps in a flat file ... > > it seems that if the considerations of fragmentation and > non-sequential i/o are factored in it may > be possible to improve upon the designs provided by O'Niel, to store > the bitmaps in a Btree in > such a way that it performs close enough to fastbit ? are you aware > of any such efforts ? As > suggested in the paper since RDBMS already have a very elaborate Btree > implementation, > some would be tempted to integrate fastbit into a Btree ... > > regards > matt > > > ---------------------------------------------------------------------- > *From:* K. John Wu <[email protected]> > *To:* matt <[email protected]> > *Sent:* Thursday, December 26, 2013 1:13 PM > *Subject:* Re: fastbit > > Hi, Matt, > > Thanks for your interest in our software. This is the mailing list > for FastBit questions. Please scroll down for my replies to your > questions and feel free post your further questions. > > John > > On 12/26/13, 7:37 AM, matt wrote: >> Hi, >> >> we are looking at the possibility of utilizing fastbit for our network >> data analyses. Is there an active community that we can post some >> questions to? >> >> while trying to figure out the file organization we encountered some >> documentation: >> >> --------------------------- >> http://crd-legacy.lbl.gov/~kewu/fastbit/doc/dataLoading.html > <http://crd-legacy.lbl.gov/%7Ekewu/fastbit/doc/dataLoading.html> >> >> >> Files in a Data Partition >> >> In a directory containing a data partition, there are files for each >> column and the metadata file named |-part.txt|. For example, after >> building the indexes in the directory |tmp| generated by the above >> commend, we have the following files, >> >> -rw-r--r-- 1 kwu Users 402 Aug 3 20:35 -part.txt >> -rw-r--r-- 1 kwu Users 400 Aug 3 20:35 a >> -rw-r--r-- 1 kwu Users 3520 Aug 4 23:14 a.idx >> -rw-r--r-- 1 kwu Users 400 Aug 3 20:35 b >> -rw-r--r-- 1 kwu Users 3520 Aug 4 23:14 b.idx >> -rw-r--r-- 1 kwu Users 200 Aug 3 20:35 c >> -rw-r--r-- 1 kwu Users 3520 Aug 4 23:14 c.idx >> >> >> >> --------------------------- >> >> >> it seems that the bitmaps for a given column 'a' are present in file >> 'a.idx'. For low >> cardinality attribute column which has only 2 possible values >> (true/false), are both >> the bitmaps stored in a.idx ? > > Yes, the index file for a column named 'a' is in the file 'a.idx'. > Some times there are additional files with different extensions. > >> >> If so then it seems that based on the order of insertion there would be: >> -- fragmentation in the bitmaps >> -- bitmaps may not be sequentially laid out on disk, since it is not >> append only on >> the whole. >> -- with a larger cardinality this might seem to have a fragmentation >> similar to a >> Btree. >> >> is the above analyses wrong ? Are bitmaps for multiple attribute >> values present in >> the same file ? This could reduce sequential layout of data >> > > Your understanding is correct. The bitmap indexes we built is > generally known as secondary indexes -- they do not restructure the > base data records. As you can imagine, reorder the data records might > be able to reduce the index sizes. However, by reordering to reducing > the index sizes of one column, you might increase the index sizes for > other columns. Therefore, there is some balancing acts to be > performed. FastBit does not attempt tot address this issue. > > There is one index file for on column of a data partition as you've > noticed. All bitmaps corresponding to different values of the column > are packed into that one index file. Therefore, this index file can > be large depending on carious factors such as how many different > values there are, and how easily can the bitmaps compressed. > > > >> if there is a community where such questions are better posted please >> let us know >> >> regards >> matt > > _______________________________________________ FastBit-users mailing list [email protected] https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
