As Jon indicated, it will take a major overhaul for FastBit to make effective use of compressed data. Having the storage layer (such as ZFS) handle compression separately might be a reasonable way to get the best of both worlds.
In general, there is a flurry of active research on compressing base data in database systems; the best-known example might be Vertica. There is a free version called C-Store, but development work on it has ended.

John

On 4/25/12 9:58 AM, Jon Strabala wrote:
> Hi Petr [and John],
>
> On Wed, Apr 25, 2012 at 1:18 AM, Thorgrin <[email protected]> wrote:
>
> > Hi John,
> >
> > I've just stumbled upon an LZO data compression library
> > (http://www.oberhumer.com/opensource/lzo/), which is used, for
> > example, in nfdump to cut down the space needed to store data. I
> > wonder whether you ever considered adding some real-time
> > compression/decompression to FastBit, since you work with a lot of
> > data. I know that the indexes use compression, but the data does not.
> >
> > How difficult would it be to have columns optionally compressed when
> > writing to disk (maybe indicated by some file extension) and
> > decompressed when needed to answer a query (which might not even be
> > necessary if we look at the indexes first)?
>
> I have thought of doing this.
>
> Of course, compressing the entire data column makes no sense, as you
> would need to decompress it up to the point where the data value exists.
>
> I think you would have to compress by blocks. An example (the size is
> just picked out of the air):
>
>     bytes 0 to 10240 (exclusive) would be block 0
>     bytes 10240 to 20480 (exclusive) would be block 1
>     *
>     *
>
> Then each individual data block could be compressed, and decompressed
> only when the block is referenced for data.
>
> The mmap and/or file system offset code (to look up data) would then
> need to be modified to support compression, plus you would want an LRU
> and/or a HOT in-memory category cache.
>
> An alternative (and much easier method) would be to store the DATA
> on a compressed file system and the index on a non-compressed
> file system.
> This would require two (2) directories, one for the index and
> one for the data, but IMHO this would be an easier change. My thought
> here is to use Solaris 10, SE11, or an illumos-based distro like
> OpenIndiana and utilize the ZFS file system (allowing compression
> on the "data" directory, but not on the "index" directory).
>
> > I do not know if anyone needs such a feature, but I can see some
> > benefits in it for our use case.
>
> FYI, I have put standard FastBit on a compressed ZFS directory (both
> data and index in the default hierarchy) and it is still performant.
> I think this is due to much less I/O, e.g. a factor of 2-3x less data
> read, since the data is compressed (by block) transparently to the
> application. ZFS on an illumos kernel supports the following
> compression settings:
>
>     on | off | lzjb | gzip | gzip-[1-9] | zle
>
> An LZO implementation could be added, but I find that lzjb works very
> well. My point is that this "takes care of" block-level compression
> transparently and also does caching with any free memory (automatically
> released if the system or application needs it). Hence my earlier
> comment about adding a flag to FastBit to split up index and data on
> different file paths (and thus ZFS file systems).
>
> I was also thinking of digging into the ZFS code itself and making a
> "low level" set of hooks for data storage in FastBit. However, this
> assumes that I get some free time.
>
> > Regards,
> > Petr
> > _______________________________________________
> > FastBit-users mailing list
> > [email protected]
> > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>
> Regards,
>
> Jon Strabala, CTO
> Quantum Systems Integrators, Inc.
> 950 South Coast Drive, Suite 120
> Costa Mesa, CA 92626
>
> [email protected]
> http://www.QuantumSI.com
> phone 714 428 1133
> fax 714 428 1131
> mobile 714 240 3083
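
For anyone following the thread, here is a minimal sketch of the block scheme Jon describes in the quoted message: compress each fixed-size block on its own, map a byte offset to a block number, and keep recently decompressed blocks in an LRU cache. It is written in Python purely for illustration and is not FastBit code. The names `compress_column` and `BlockCompressedColumn` are hypothetical, zlib stands in for LZO or lzjb, and the 64-block cache size is an arbitrary assumption; only the 10240-byte block size comes from the thread.

```python
import struct
import zlib
from functools import lru_cache

BLOCK_SIZE = 10240  # uncompressed bytes per block (the arbitrary size from the thread)


def compress_column(raw, block_size=BLOCK_SIZE):
    """Split a raw column into fixed-size blocks and compress each independently."""
    return [zlib.compress(raw[i:i + block_size])
            for i in range(0, len(raw), block_size)]


class BlockCompressedColumn:
    """Hypothetical reader that decompresses only the blocks a lookup touches."""

    def __init__(self, blocks, block_size=BLOCK_SIZE, cache_blocks=64):
        self._blocks = blocks
        self._block_size = block_size
        self.decompress_count = 0  # counts real decompressions, for illustration
        # LRU cache of decompressed blocks (cache size is an assumption).
        self._get_block = lru_cache(maxsize=cache_blocks)(self._load_block)

    def _load_block(self, index):
        self.decompress_count += 1
        return zlib.decompress(self._blocks[index])

    def read(self, offset, length):
        """Return up to `length` bytes starting at uncompressed byte `offset`."""
        out = bytearray()
        end = offset + length
        while offset < end:
            # Translate the uncompressed offset into (block number, position in block).
            index, start = divmod(offset, self._block_size)
            if index >= len(self._blocks):
                break  # request runs past the end of the column
            chunk = self._get_block(index)[start:start + (end - offset)]
            if not chunk:
                break  # last block is shorter than the request
            out += chunk
            offset += len(chunk)
        return bytes(out)


# Example: 10,000 little-endian int32 values; read back value number 2560,
# which starts at byte 10240, the first byte of block 1.
raw = struct.pack("<10000i", *range(10000))
column = BlockCompressedColumn(compress_column(raw))
(value,) = struct.unpack("<i", column.read(2560 * 4, 4))
print(value, column.decompress_count)
```

In this sketch the lookup decompresses a single block and leaves the others untouched, which is the point of compressing by block rather than compressing the whole column. A real implementation would also need an on-disk table of compressed block offsets, which is omitted here.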
