John,

> Having the storage layer (like ZFS) to compress separately
> might be a reasonable way to get the benefit of both worlds.

Intuitively, I think the index would need to be in one directory (on a
non-compressed file system) and the data in a different directory (on a
ZFS-compressed file system). But as of now both data and index get
tucked away in the same directory. I imagine I could symlink the data
files elsewhere (to a ZFS-compressed file system; hardlinks would not
work across file systems). This would be fine for a test but kind of a
pain in a production system.

How much work would it be to add an option to force the index and data
to be split into different directories?

- Jon

On Fri, Apr 27, 2012 at 5:12 PM, K. John Wu <[email protected]> wrote:
> As Jon indicated, it will take a major overhaul in order for FastBit
> to make effective use of compressed data. Having the storage layer
> (like ZFS) to compress separately might be a reasonable way to get the
> benefit of both worlds.
>
> In general, there is a flurry of active research efforts on
> compressing base data in a database system, the best known example
> might be vertica. There is a free version called c-store, but the
> development work on that has terminated.
>
> John
>
>
> On 4/25/12 9:58 AM, Jon Strabala wrote:
> > Hi Petr [and John],
> >
> > On Wed, Apr 25, 2012 at 1:18 AM, Thorgrin <[email protected] <mailto:[email protected]>> wrote:
> >
> >     Hi John,
> >
> >     I've just stumbled upon a LZO data compression library
> >     (http://www.oberhumer.com/opensource/lzo/), which is used for
> >     example in nfdump, to cut down the space needed to store data.
> >     I wonder whether you ever considered adding some realtime
> >     compression/decompression to FastBit, since you work with a lot
> >     of data. I know that the indexes are compressed, but the data
> >     is not.
> >
> >     How difficult would it be to have columns optionally compressed
> >     when writing to disk (maybe indicated by some file extension)
> >     and decompressed when needed to answer a query (which might not
> >     even be necessary if we look at the indexes first)?
> >
> >
> > I have thought of doing this.
> >
> > Of course compressing the entire data column makes no sense, as you
> > would need to decompress it up to the point where the data value
> > exists.
> >
> > I think you would have to compress by blocks, for example (the size
> > is just picked out of the air):
> >
> >   bytes 0 to 10240 (exclusive) would be block 0
> >   bytes 10240 to 20480 (exclusive) would be block 1
> >
> > Then each individual data block could be compressed, and only
> > decompressed on need if the block is referenced for data.
> >
> > And then the MMAP and/or file system offset code (to look up data)
> > would need to be modified to support compression, plus you would
> > want an LRU and/or also a HOT in-memory category cache.
> >
> > An alternative (and much easier method) would be to store the DATA
> > on a compressed file system and the index on a non-compressed
> > file system. This would require two (2) directories, one for index
> > and one for data, but IMHO this would be an easier change. My thought
> > here is to use Solaris 10, SE11, or an illumos based distro like
> > OpenIndiana and utilize the ZFS file system (allowing compression
> > on the "data" directory, but not allowing compression on the "index"
> > directory).
> >
> >     I do not know if anyone needs such a feature, but I can see some
> >     benefits in it for our usecase.
> >
> >
> > FYI, I have put standard fastbit on a compressed ZFS directory (both
> > data and index in the default hierarchy) and it is still performant.
> > I think this is due to much less I/O, e.g. a factor of 2-3x less
> > data read, since it is compressed (by block) transparently to the
> > application. ZFS on an illumos kernel supports the following:
> >
> >   on | off | lzjb | gzip | gzip-[1-9] | zle
> >
> >
> > A LZO implementation could be added, but I find that lzjb works very
> > well. My point is that this "takes care of" block level compression
> > transparently and also does caching with any free memory
> > (automatically released if the system or application needs it). Thus
> > my comment earlier about adding a flag to fastbit to split up index
> > and data on different file paths (and thus ZFS file systems).
> >
> > I was also thinking of digging into the ZFS code itself and making a
> > "low level" set of hooks for data storage in fastbit. However, this
> > assumes that I get some free time.
> >
> >
> >     Regards,
> >     Petr
> >     _______________________________________________
> >     FastBit-users mailing list
> >     [email protected] <mailto:[email protected]>
> >     https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
> >
> >
> > Regards,
> >
> > Jon Strabala, CTO
> > Quantum Systems Integrators, Inc.
> > 950 South Coast Drive, Suite 120
> > Costa Mesa, CA 92626
> >
> > [email protected]
> > http://www.QuantumSI.com <http://www.quantumsi.com/>
> > phone 714 428 1133
> > fax 714 428 1131
> > mobile 714 240 3083
>
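
For reference, the block-wise scheme described in the quoted thread could
be sketched roughly as below. This is only an illustration in Python, not
FastBit code: zlib from the standard library stands in for LZO or lzjb,
the 10240-byte block size is the figure picked out of the air above, and
the names compress_column, BlockStore, and read are hypothetical.

```python
import zlib
from functools import lru_cache

BLOCK_SIZE = 10240  # bytes per uncompressed block (arbitrary, as in the thread)


def compress_column(raw, block_size=BLOCK_SIZE):
    """Split a raw column into fixed-size blocks; compress each one independently."""
    return [
        zlib.compress(raw[start:start + block_size])
        for start in range(0, len(raw), block_size)
    ]


class BlockStore:
    """Hypothetical reader that decompresses only the blocks a request touches."""

    def __init__(self, blocks, block_size=BLOCK_SIZE, cache_blocks=64):
        self._blocks = blocks
        self._block_size = block_size
        self.decompressions = 0  # counts real decompressions (cache misses)
        # LRU cache of recently decompressed blocks, keyed by block number.
        self._load = lru_cache(maxsize=cache_blocks)(self._decompress)

    def _decompress(self, block_no):
        self.decompressions += 1
        return zlib.decompress(self._blocks[block_no])

    def read(self, offset, length):
        """Return up to `length` bytes starting at uncompressed byte `offset`."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        if length <= 0:
            return b""
        first = offset // self._block_size
        last = min((offset + length - 1) // self._block_size,
                   len(self._blocks) - 1)
        data = b"".join(self._load(n) for n in range(first, last + 1))
        skip = offset - first * self._block_size
        return data[skip:skip + length]


# Usage: a 51200-byte column becomes 5 blocks; this read spans blocks 0 and 1.
raw = bytes(range(256)) * 200
store = BlockStore(compress_column(raw))
chunk = store.read(10230, 20)
```

Only the two blocks covering the requested range are decompressed, and a
repeated read is served from the LRU cache. A compressed ZFS dataset gives
a similar effect below the application, which is why the two-directory
option looks like the smaller change.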
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
