Re: [FastBit-users] Data compression

Jon Strabala Wed, 25 Apr 2012 17:08:01 -0700

Petr

More information on ZFS compression with a real database can be found in
the links:



http://don.blogs.smugmug.com/2008/10/13/zfs-mysqlinnodb-compression-update/
discussion "ZFS & MySQL/InnoDB Compression Update"

http://don.blogs.smugmug.com/2008/10/10/success-with-opensolaris-zfs-mysql-in-production/
"Success with OpenSolaris + ZFS + MySQL in production!"

http://palominodb.com/
"Craigslist has created patterns for workarounds to address issues such as
presplitting data to avoid chunks migrating to various shards or using ZFS
with LZO compression."


More information on using LZO in ZFS can be found in the links below (adding
a new compression type is a small code module, but a kernel rebuild is needed):


http://denisy.dyndns.org/lzo_vs_lzjb/
discussion "mission to port LZO as an alternative to ZFS's LZJB on FreeBSD"

http://groups.google.com/group/zfs-fuse/browse_thread/thread/c1c50faa9c65b980/c6a01a69a386822c?#c6a01a69a386822c
discussion "tests and tuning on ZFS fuse"

http://groups.google.com/group/zfs-fuse/browse_thread/thread/919b21c449525f52
discussion "ZFS with LZO patch"


And one last comment: lz4 also looks nice, see:

http://code.google.com/p/lz4/


On Wed, Apr 25, 2012 at 9:58 AM, Jon Strabala <[email protected]> wrote:

> Hi Petr [and John],
>
> On Wed, Apr 25, 2012 at 1:18 AM, Thorgrin <[email protected]> wrote:
>
>> Hi John,
>>
>> I've just stumbled upon a LZO data compression library
>> (http://www.oberhumer.com/opensource/lzo/), which is used for example
>> in nfdump, to cut down the space needed to store data. I wonder
>> whether you ever considered adding some realtime
>> compression/decompression to FastBit, since you work with a lot of
>> data. I know that the indexes use compression, but the data does not.
>>
>> How difficult would it be to have columns optionally compressed when
>> writing to disk (maybe indicated by some file extension) and
>> decompressed when needed to answer a query (which might even not be
>> necessary if we look at the indexes first)?
>>
>>
> I have thought of doing this.
>
> Of course, compressing the entire data column as one stream makes no sense,
> as you would need to decompress everything up to the position of the value
> you want.
>
> I think you would have to compress by blocks. An example (the size is just
> picked out of the air):
>
> bytes        0 to 10240 (exclusive) would be block 0
> bytes 10240 to 20480 (exclusive) would be block 1
>   *
>
>   *
>
> Then each individual data block could be compressed, and only
> decompressed on demand when a query references data in that block.
>
> The mmap and/or file system offset code (used to look up data)
> would then need to be modified to support compression, plus you would
> want an LRU cache and/or a "hot" in-memory cache category.
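A minimal sketch of the block scheme described above, in Python. It is only an illustration, not FastBit code: `compress_column` and `BlockColumn` are hypothetical names, `zlib` stands in for LZO (which is not in the Python standard library), and the blocks are held in a list rather than in a file with an offset table.

```python
import zlib
from functools import lru_cache

BLOCK_SIZE = 10240  # uncompressed bytes per block (the arbitrary size used above)


def compress_column(raw, block_size=BLOCK_SIZE):
    """Split a raw column into fixed-size blocks and compress each independently."""
    return [
        zlib.compress(raw[start:start + block_size])
        for start in range(0, len(raw), block_size)
    ]


class BlockColumn:
    """Read-only view over a block-compressed column with an LRU block cache."""

    def __init__(self, blocks, block_size=BLOCK_SIZE, cache_blocks=64):
        self._blocks = blocks
        self._block_size = block_size
        # Keep recently used decompressed blocks so repeated lookups are cheap.
        self._load = lru_cache(maxsize=cache_blocks)(self._decompress)

    def _decompress(self, index):
        return zlib.decompress(self._blocks[index])

    def read(self, offset, length):
        """Return up to `length` bytes starting at uncompressed byte `offset`."""
        out = bytearray()
        while length > 0:
            # Map the uncompressed offset to (block number, offset inside block).
            index, within = divmod(offset, self._block_size)
            if index >= len(self._blocks):
                break  # request runs past the end of the column
            chunk = self._load(index)[within:within + length]
            if not chunk:
                break
            out += chunk
            offset += len(chunk)
            length -= len(chunk)
        return bytes(out)
```

Only the blocks that a read touches are decompressed, which is the point of compressing by block rather than by whole column. A real implementation would also need an on-disk table of compressed block offsets.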
>
> An alternative (and much easier method) would be to store the DATA
> on a compressed file system and the index on a non-compressed
> file system.  This would require two (2) directories, one for index and
> one for data, but IMHO this would be an easier change.  My thought
> here is to use Solaris 10, SE11, or an illumos based distro like
> OpenIndiana and utilize the ZFS file system (allowing compression
> on the "data" directory, but not allowing compression on the "index"
> directory).
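As a rough sketch of that two-directory layout, the helper below builds the `zfs create` commands for one compressed "data" file system and one uncompressed "index" file system. The pool name `tank`, the `fastbit` parent dataset, and the function names are assumptions made up for illustration; the commands are only printed unless `dry_run=False` is passed.

```python
import subprocess

# Compression settings listed in this thread for ZFS on an illumos kernel.
ALGORITHMS = {"on", "off", "lzjb", "gzip", "zle"} | {f"gzip-{n}" for n in range(1, 10)}


def zfs_layout_commands(pool="tank", algorithm="lzjb"):
    """Build argv lists for a compressed data and an uncompressed index file system."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unsupported compression setting: {algorithm}")
    parent = f"{pool}/fastbit"  # hypothetical dataset name
    return [
        ["zfs", "create", parent],
        ["zfs", "create", "-o", f"compression={algorithm}", f"{parent}/data"],
        ["zfs", "create", "-o", "compression=off", f"{parent}/index"],
    ]


def apply(commands, dry_run=True):
    """Print the commands, or run them when dry_run is False (needs privileges)."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)
```

Calling `apply(zfs_layout_commands())` prints the three commands without touching the system. FastBit itself would still need the option discussed here to keep index and data files in separate directories.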
>
>> I do not know if anyone needs such a feature, but I can see some
>> benefits in it for our use case.
>
>
> FYI, I have put standard FastBit on a compressed ZFS directory (both data
> and index in the default hierarchy) and it is still performant. I think
> this is due to much less IO, e.g. a factor of 2-3x less data to read,
> since it is compressed (and done by block) transparently to the
> application.  ZFS on an illumos kernel supports the following settings:
>
> on | off | lzjb | gzip | gzip-[1-9] | zle
>
>
> An LZO implementation could be added, but I find that lzjb works very well.
> My point is that this "takes care of" block-level compression
> transparently and also does caching with any free memory (automatically
> released if the system or application needs it).  Hence my earlier comment
> about adding a flag to FastBit to split up index and data on different
> file paths (and thus ZFS file systems).
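To get a feel for how much I/O block-level compression might save on a given column, one can compress it block by block and compare sizes. The sketch below is an estimate only: `block_compression_ratio` is a hypothetical helper, `zlib` at level 1 is used as a stand-in roughly comparable to `gzip-1` (not lzjb), and the 128 KiB block size mirrors the default ZFS recordsize.

```python
import zlib


def block_compression_ratio(raw, block_size=131072, level=1):
    """Return uncompressed size / compressed size with per-block compression."""
    if not raw:
        return 1.0
    compressed = sum(
        # Each block is compressed on its own, as a block-based file system would.
        len(zlib.compress(raw[start:start + block_size], level))
        for start in range(0, len(raw), block_size)
    )
    return len(raw) / compressed
```

For example, `block_compression_ratio(open(path, "rb").read())` on a raw column file gives a rough upper bound on the read reduction; the ratio actually achieved by ZFS depends on the algorithm and the data.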
>
> I was also thinking of digging into the ZFS code itself and making a
> "low level" set of hooks for data storage in FastBit.  However, this
> assumes that I get some free time.
>
>
>> Regards,
>> Petr
>> _______________________________________________
>> FastBit-users mailing list
>> [email protected]
>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>>
>
> Regards,
>
> Jon Strabala, CTO
> Quantum Systems Integrators, Inc.
> 950 South Coast Drive, Suite 120
> Costa Mesa, CA 92626
>
> [email protected]
> http://www.QuantumSI.com <http://www.quantumsi.com/>
> phone  714 428 1133
> fax    714 428 1131
> mobile 714 240 3083
>

