As Jon indicated, it would take a major overhaul for FastBit to make
effective use of compressed data.  Having the storage layer (such as
ZFS) do the compression separately might be a reasonable way to get
the best of both worlds.

In general, there is a flurry of active research on compressing base
data in database systems; the best-known example might be Vertica.
There is a free version called C-Store, but development work on it
has ended.

John


On 4/25/12 9:58 AM, Jon Strabala wrote:
> Hi Petr [and John],
> 
> On Wed, Apr 25, 2012 at 1:18 AM, Thorgrin <[email protected]> wrote:
> 
>     Hi John,
> 
>     I've just stumbled upon the LZO data compression library
>     (http://www.oberhumer.com/opensource/lzo/), which is used, for
>     example, in nfdump to cut down the space needed to store data. I
>     wonder whether you have ever considered adding some realtime
>     compression/decompression to FastBit, since you work with a lot of
>     data. I know that the indexes use compression, but the data does not.
> 
>     How difficult would it be to have columns optionally compressed when
>     writing to disk (maybe indicated by some file extension) and
>     decompressed when needed to answer a query (which might not even be
>     necessary if we look at the indexes first)?
> 
> 
> I have thought of doing this.
> 
> Of course, compressing the entire data column as one stream makes no
> sense, as you would need to decompress everything up to the point
> where the wanted data value is stored.
> 
> I think you would have to compress by blocks, for example (the size
> is just picked out of the air):
> 
>     bytes        0 to 10240 (exclusive) would be block 0
>     bytes 10240 to 20480 (exclusive) would be block 1
>     ...
> 
> Then each individual data block could be compressed, and decompressed
> only when a query references data in that block.
> 
> And then the mmap and/or file system offset code (used to look up
> data) would need to be modified to support compression, plus you
> would want an LRU cache and/or a "hot" in-memory cache.
> 
> An alternative (and much easier) method would be to store the data
> on a compressed file system and the index on a non-compressed
> file system.  This would require two (2) directories, one for index
> and one for data, but IMHO this would be an easier change.  My thought
> here is to use Solaris 10, SE11, or an illumos-based distro like
> OpenIndiana and utilize the ZFS file system (allowing compression
> on the "data" directory, but not allowing compression on the "index"
> directory).
> 
>     I do not know if anyone needs such a feature, but I can see some
>     benefits in it for our use case.
> 
> 
> FYI, I have put standard FastBit on a compressed ZFS directory (both
> data and index in the default hierarchy) and it is still performant.
> I think this is due to much less I/O, e.g. a factor of 2-3x less when
> reading data, since the data is compressed (block by block)
> transparently to the application.  ZFS on an illumos kernel supports
> the following compression settings:
> 
>     on | off | lzjb | gzip | gzip-[1-9] | zle
> 
>  
> An LZO implementation could be added, but I find that lzjb works very
> well.  My point is that this "takes care of" block-level compression
> transparently and also does caching with any free memory
> (automatically released if the system or application needs it).  Thus
> my comment earlier about adding a flag to FastBit to split up index
> and data on different file paths (and thus ZFS file systems).
> 
> I was also thinking of digging into the ZFS code itself and making a
> "low level" set of hooks for data storage in FastBit.  However, this
> assumes that I get some free time.
> 
> 
>     Regards,
>     Petr
>     _______________________________________________
>     FastBit-users mailing list
>     [email protected]
>     https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
> 
> 
> Regards,
>  
> Jon Strabala, CTO
> Quantum Systems Integrators, Inc.
> 950 South Coast Drive, Suite 120
> Costa Mesa, CA 92626
> 
> [email protected]
> http://www.QuantumSI.com <http://www.quantumsi.com/>
> phone  714 428 1133
> fax    714 428 1131
> mobile 714 240 3083
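
The block-wise scheme Jon describes in the quoted message (fixed-size
blocks, each compressed on its own, decompressed only when referenced,
with an LRU cache in front) can be sketched roughly as follows.  This
is only an illustration under stated assumptions, not FastBit code:
the names compress_column and BlockCompressedColumn are hypothetical,
zlib stands in for LZO or lzjb because it ships with Python, and the
10240-byte block size is the arbitrary figure from the example.

```python
import struct
import zlib
from functools import lru_cache

BLOCK_SIZE = 10240  # uncompressed bytes per block (arbitrary, as in the example)


def compress_column(raw, block_size=BLOCK_SIZE):
    """Split a raw column into fixed-size blocks and compress each one alone."""
    return [zlib.compress(raw[i:i + block_size])
            for i in range(0, len(raw), block_size)]


class BlockCompressedColumn:
    """Hypothetical reader: maps an uncompressed offset to a compressed block."""

    def __init__(self, blocks, block_size=BLOCK_SIZE, cache_blocks=64):
        self.blocks = blocks
        self.block_size = block_size
        self.decompressions = 0  # counts real decompressions (cache misses)
        # LRU cache of recently used decompressed blocks.
        self._load = lru_cache(maxsize=cache_blocks)(self._decompress)

    def _decompress(self, block_no):
        self.decompressions += 1
        return zlib.decompress(self.blocks[block_no])

    def read(self, offset, length):
        """Return up to `length` bytes starting at uncompressed byte `offset`."""
        out = bytearray()
        end = offset + length
        while offset < end:
            block_no, within = divmod(offset, self.block_size)
            if block_no >= len(self.blocks):
                break  # request runs past the end of the column
            chunk = self._load(block_no)[within:within + (end - offset)]
            if not chunk:
                break  # short final block, nothing left to read
            out += chunk
            offset += len(chunk)
        return bytes(out)


# 5000 little-endian 32-bit integers: 20000 bytes, so two blocks.
raw = struct.pack("<5000i", *range(5000))
col = BlockCompressedColumn(compress_column(raw))

# Row 3000 lives at byte 12000, i.e. in block 1; only that block is inflated.
value = struct.unpack("<i", col.read(4 * 3000, 4))[0]
```

Only the blocks a query touches are decompressed, and repeated reads
of a hot block are served from the cache.  A real implementation would
also need an on-disk table of compressed block offsets, which this
sketch sidesteps by keeping the blocks in a Python list.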