Hi, I'm working on a page-based bitmap index that uses sparse files to RLE-compress regions of the bitmap. I essentially create a linked list on disk in a very large sparse file, then allocate space in the file for the data and for each index. Because nodes in the list are at fixed locations, it is as if there are pointers directly into each node. Adding space between nodes is therefore O(1), as is traversing the list, since it can be entered at predefined points and reached with only one or two seeks. This is cheap on SSD.

Runs of zeroes one page or longer can be represented using only sparse space: no bits are stored on disk for runs of zeroes >= 1 page. Runs of ones require either one or two pages (pages are 4K). In addition, if a page goes from having no data to being full, or from being full to having no data, a hole can usually be punched in the filesystem so that the SSD can better manage space.
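To make the sparse-run idea concrete, here is a minimal sketch in Python (file name and page count are made up for illustration; actual hole punching on an existing page is filesystem-dependent and is normally done with fallocate(FALLOC_FL_PUNCH_HOLE), which this sketch does not show):

```python
import os

PAGE = 4096  # 4K pages, as in the index described above

# Create a sparse file: a run of zero pages costs no data blocks,
# because we seek over the run instead of writing zeroes.
path = "bitmap.dat"
with open(path, "wb") as f:
    f.write(b"\xff" * PAGE)           # page 0: a run of ones, stored explicitly
    f.seek(1024 * PAGE, os.SEEK_CUR)  # pages 1..1024: run of zeroes, left as a hole
    f.write(b"\xff" * PAGE)           # page 1025: another explicit page

st = os.stat(path)
print(st.st_size)  # logical size covers all 1026 pages
# On filesystems with sparse-file support, st.st_blocks * 512 will be far
# smaller than st.st_size, since the hole occupies no allocated blocks.
```

Reading back through the hole returns zeroes, which is exactly the RLE semantics: the zero run exists logically but costs nothing on disk.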
Rather than use binning, there is a B+tree index that keeps track of the distinct values in the index. If there is only one distinct value, the RID is stored directly in the B+tree so that creating the bitmap is not necessary; if there are more than one and at most 512 distinct values, a list of RIDs is stored on the first index page, and bitmap pages are created on the fly until all the slots fill and the real index is materialized. Do you think binning is a better approach than this? I only mention it because I would like to use WAH for compression of regions that are not sparse, and I plan to use data compression via what I think is a unique method, one that could benefit FastBit. Also, I haven't looked closely at FastBit in a while. Is there an easy way to use a FastBit WAH index as a bit vector, rather than having to store the data into FastBit partitions?

For me, each column is stored as pages. Each page has a CRC32, and on index pages there is a mask of the page, which marks sections that are either empty or contain some data. This can make page comparisons really fast.

The compression works by taking a 1MB extent and compressing it down with PFOR, FPC, RLE, or other forms of compression. Say the 1MB extent shrinks to 50K: the 50K is written at the beginning of the extent and the rest of the space is reserved as sparse space. You can seek into any extent and decompress the data without having to search forward through an RLE-compressed structure. Since FastBit does not compress data (at least I don't think it does), such a mechanism might be useful to you. The same technique can be used for WAH, though I do not know if it would benefit: reserve a 1MB chunk of sparse space on disk, WAH-compress 1MB of data, then write the smaller amount onto the disk. You can still seek into that 1MB extent and get the data, but the size on disk is significantly smaller.

Anyway, if you think this is a good idea, please let me know.

Regards,
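In case a sketch of the extent scheme is useful, here is roughly what I mean in Python, using zlib as a stand-in for PFOR/FPC/WAH (the extent size, tiny length header, and file name are all just illustrative assumptions, not the real on-disk format):

```python
import zlib

EXTENT = 1024 * 1024  # 1MB extents, as described above

def write_extent(f, index, raw):
    """Compress one extent and write it at its fixed file offset.

    The compressed bytes land at the start of the extent; the remainder
    is never written, so the filesystem keeps it as sparse space.
    """
    comp = zlib.compress(raw)
    assert len(comp) + 8 <= EXTENT  # must fit (a real system would fall back to raw)
    f.seek(index * EXTENT)
    f.write(len(comp).to_bytes(8, "little"))  # tiny header: compressed length
    f.write(comp)

def read_extent(f, index):
    """Seek straight to an extent and decompress it -- no forward scan."""
    f.seek(index * EXTENT)
    clen = int.from_bytes(f.read(8), "little")
    return zlib.decompress(f.read(clen))

path = "column.dat"
with open(path, "w+b") as f:
    data = bytes(EXTENT)          # 1MB of zeroes: compresses very well
    write_extent(f, 0, data)
    write_extent(f, 3, data)      # extents 1-2 stay entirely sparse
    assert read_extent(f, 3) == data
```

The point is that every extent keeps its fixed 1MB logical address, so random access is a single seek plus a decompress, while the allocated size on disk is only the compressed portion of each extent.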
_______________________________________________ FastBit-users mailing list [email protected] https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
