Hi, Matt, Most of the methods you've mentioned can produce smaller indexes in some cases, however, they are generally more complex than WAH. It is possible that as computations become relatively cheap compared memory accesses, reducing size could reduce the overall execution time. We have been trying some of these methods off and on for a while now, but nothing definitive so far. If you are looking for a research topic for a thesis, this might be an interesting one. If you are looking for us to produce another compression method to replace WAH, it might be a long while;-)
John On 3/13/14, 2:40 PM, matt wrote: > I had a small question regarding the WAH compression scheme. > > There is mention of some alternate schemes such as BBC, PLWAH, > CONCISE, COMPAX, > > some of the literature claims that they have better performance and or > compression. > Is that generally true ? or is it true only depending upon the order > of input data ? > Is it possible to integrate these schemes into fastbit ? > > regards > > > ---------------------------------------------------------------------- > *From:* K. John Wu <[email protected]> > *To:* matt <[email protected]>; FastBit Users > <[email protected]> > *Sent:* Thursday, December 26, 2013 3:46 PM > *Subject:* Re: fastbit > > Well, here is an example of what you are suggesting > > http://wiki.postgresql.org/wiki/Bitmap_Indexes > > The other alternative might be how c-store uses bitmaps > <http://paperhub.s3.amazonaws.com/14d147739ca381a610b8eea771ab0c84.pdf>. > It is also tightly integrated with the rest of the database storage > substrate. > > John > > > > On 12/26/13, 3:17 PM, matt wrote: >> Thanks for the response, this paper: >> >> http://crd-legacy.lbl.gov/~kewu/ps/LBNL-62756.pdf > <http://crd-legacy.lbl.gov/%7Ekewu/ps/LBNL-62756.pdf> >> >> seems to suggest that there is some kind of relationship between a >> Btree structure which stores >> bitmaps for each distinct value, and a different structure that stores >> bitmaps in a flat file ... >> >> it seems that if the considerations of fragmentation and >> non-sequential i/o are factored in it may >> be possible to improve upon the designs provided by O'Niel, to store >> the bitmaps in a Btree in >> such a way that it performs close enough to fastbit ? are you aware >> of any such efforts ? As >> suggested in the paper since RDBMS already have a very elaborate Btree >> implementation, >> some would be tempted to integrate fastbit into a Btree ... >> >> regards >> matt >> >> >> ---------------------------------------------------------------------- >> *From:* K. John Wu <[email protected] <mailto:[email protected]>> >> *To:* matt <[email protected] <mailto:[email protected]>> >> *Sent:* Thursday, December 26, 2013 1:13 PM >> *Subject:* Re: fastbit >> >> Hi, Matt, >> >> Thanks for your interest in our software. This is the mailing list >> for FastBit questions. Please scroll down for my replies to your >> questions and feel free post your further questions. >> >> John >> >> On 12/26/13, 7:37 AM, matt wrote: >>> Hi, >>> >>> we are looking at the possibility of utilizing fastbit for our network >>> data analyses. Is there an active community that we can post some >>> questions to? >>> >>> while trying to figure out the file organization we encountered some >>> documentation: >>> >>> --------------------------- >>> http://crd-legacy.lbl.gov/~kewu/fastbit/doc/dataLoading.html > <http://crd-legacy.lbl.gov/%7Ekewu/fastbit/doc/dataLoading.html> >> <http://crd-legacy.lbl.gov/%7Ekewu/fastbit/doc/dataLoading.html> > >>> >>> >>> Files in a Data Partition >>> >>> In a directory containing a data partition, there are files for each >>> column and the metadata file named |-part.txt|. For example, after >>> building the indexes in the directory |tmp| generated by the above >>> commend, we have the following files, >>> >>> -rw-r--r-- 1 kwu Users 402 Aug 3 20:35 -part.txt >>> -rw-r--r-- 1 kwu Users 400 Aug 3 20:35 a >>> -rw-r--r-- 1 kwu Users 3520 Aug 4 23:14 a.idx >>> -rw-r--r-- 1 kwu Users 400 Aug 3 20:35 b >>> -rw-r--r-- 1 kwu Users 3520 Aug 4 23:14 b.idx >>> -rw-r--r-- 1 kwu Users 200 Aug 3 20:35 c >>> -rw-r--r-- 1 kwu Users 3520 Aug 4 23:14 c.idx >>> >>> >>> >>> --------------------------- >>> >>> >>> it seems that the bitmaps for a given column 'a' are present in file >>> 'a.idx'. For low >>> cardinality attribute column which has only 2 possible values >>> (true/false), are both >>> the bitmaps stored in a.idx ? >> >> Yes, the index file for a column named 'a' is in the file 'a.idx'. >> Some times there are additional files with different extensions. >> >>> >>> If so then it seems that based on the order of insertion there > would be: >>> -- fragmentation in the bitmaps >>> -- bitmaps may not be sequentially laid out on disk, since it is not >>> append only on >>> the whole. >>> -- with a larger cardinality this might seem to have a fragmentation >>> similar to a >>> Btree. >>> >>> is the above analyses wrong ? Are bitmaps for multiple attribute >>> values present in >>> the same file ? This could reduce sequential layout of data >>> >> >> Your understanding is correct. The bitmap indexes we built is >> generally known as secondary indexes -- they do not restructure the >> base data records. As you can imagine, reorder the data records might >> be able to reduce the index sizes. However, by reordering to reducing >> the index sizes of one column, you might increase the index sizes for >> other columns. Therefore, there is some balancing acts to be >> performed. FastBit does not attempt tot address this issue. >> >> There is one index file for on column of a data partition as you've >> noticed. All bitmaps corresponding to different values of the column >> are packed into that one index file. Therefore, this index file can >> be large depending on carious factors such as how many different >> values there are, and how easily can the bitmaps compressed. >> >> >> >>> if there is a community where such questions are better posted please >>> let us know >>> >>> regards >>> matt >> >> > > _______________________________________________ FastBit-users mailing list [email protected] https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
