On May 14, 2013, at 12:10 AM, Shane Ambler <free...@shaneware.biz> wrote:

> When it comes to disk compression I think people overlook the fact that
> it can have an impact on more than one level.

Compression has effects at multiple levels:

1) CPU resources to compress (and decompress) the data
2) Disk space used
3) I/O to/from disks
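
        (For concreteness, and assuming ZFS since that is the context here: 
compression is set per dataset, and ZFS reports the achieved ratio, which 
touches all three levels above. "tank/data" is just a placeholder dataset 
name.)

        # zfs set compression=lzjb tank/data
        # zfs get compression,compressratio tank/data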

> The size of disks these days means that compression doesn't make a big
> difference to storage capacity for most people and 4k blocks mean little
> change in final disk space used.

        The 4K block issue is *huge* if the majority of your data is in files 
smaller than 4K. It also matters for files just past a block boundary: on UFS, 
a 5K file will not occupy 8K on disk. I am not a UFS on FreeBSD expert, but UFS 
on Solaris uses a default block size of 4K with a fragment size of 1K, so files 
are stored on disk with 1K resolution (so to speak) and that 5K file consumes 
only 5K. By going to a 4K minimum block size you force all data up to the next 
4K boundary, so the same 5K file consumes 8K.
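
        (A quick way to see the allocation difference on your own filesystem; 
the numbers in the comments are what I would expect, not something I have 
measured here:)

        $ dd if=/dev/random of=testfile bs=1k count=5   # 5K of incompressible data
        $ du -k testfile   # UFS w/ 1K fragments: ~5; 4K-minimum allocation: ~8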

        Now, if the majority of your data is in large files (1MB or more), then 
the 4K minimum block size probably gets lost in the noise.

        The other factor is the actual compressibility of the data. Most media 
files (JPEG, MPEG, GIF, PNG, MP3, AAC, etc.) are already compressed, and trying 
to compress them again is not likely to garner any real reduction in size. In 
my experience with the default compression algorithm (lzjb), even uncompressed 
audio files (.AIFF or .WAV) do not compress enough to make the CPU overhead 
worthwhile.
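
        (For a rough per-file check, gzip at its fastest setting works as a 
stand-in; lzjb is faster and compresses somewhat less, so treat the result as 
an optimistic bound. "song.wav" is a placeholder file name.)

        $ stat -f %z song.wav           # original size in bytes
        $ gzip -1 -c song.wav | wc -c   # size after quick compression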

> One thing people seem to miss is the fact that compressed files are
> going to reduce the amount of data sent through the bottleneck that is
> the wire between motherboard and drive. While a 3k file compressed to 1k
> still uses a 4k block on disk it does (should) reduce the true data
> transferred to disk. Given a 9.1 source tree using 865M, if it
> compresses to 400M then it is going to reduce the time to read the
> entire tree during compilation. This would impact a 32 thread build more
> than a 4 thread build.

        If the data does not compress well, then you get hit with the CPU 
overhead of compression with no bandwidth or space benefit in return. How 
compressible is the source tree? [Not a loaded question; I haven't tried to 
compress it, though the sketch below would answer it.]
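
        (One way to answer that empirically, assuming a scratch pool named 
"tank" with enough free space: copy the tree into a dataset with lzjb enabled 
and ask ZFS what ratio it achieved.)

        # zfs create -o compression=lzjb tank/srctest
        # cp -R /usr/src /tank/srctest/
        # zfs get compressratio tank/srctest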

> While it is said that compression adds little overhead, time wise,

        Compression most certainly DOES add overhead in terms of time, depending 
on the speed of your CPU and how busy your system is. My home server is an HP 
ProLiant MicroServer with a dual-core AMD Neo N36L running at 1.3 GHz. Turning 
on compression hurts performance *if* I am getting less than a 1.2:1 compression 
ratio (5-drive RAIDz2 of 1TB enterprise disks). Above that, the I/O bandwidth 
reduction due to the compression makes up for the lost CPU cycles. I have 
managed servers where each case prevailed… CPU-limited ones where compression 
hurt performance, and I/O-limited ones where it helped.
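
        (A crude way to find out which side of the break-even point a workload 
falls on: write the same data to two scratch datasets, one compressed and one 
not, and time it. "tank" and /some/data are placeholders; run each copy twice 
so the source is cached and you are timing writes rather than reads.)

        # zfs create -o compression=off tank/bench-off
        # zfs create -o compression=lzjb tank/bench-on
        $ /usr/bin/time cp -R /some/data /tank/bench-off/
        $ /usr/bin/time cp -R /some/data /tank/bench-on/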

> it is
> going to take time to compress the data which is going to increase
> latency. Going from a 6ms platter disk latency to a 0.2ms SSD latency
> gives a noticeable improvement to responsiveness. Adding compression is
> going to bring that back up - possibly higher than 6ms.

        Interesting point. I am not sure of the data flow through the code to 
know if compression has a defined latency component, or is just throughput 
limited by CPU cycles to do the compression.

> Together these two factors may level out the total time to read a file.
> 
> One question there is whether the zfs cache uses compressed file data
> therefore keeping the latency while eliminating the bandwidth.

        Data cached in the ZFS ARC or L2ARC is uncompressed. Data sent via zfs 
send / zfs receive is uncompressed; there had been talk of an option to send / 
receive compressed data, but I do not think it has gone anywhere.
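
        The usual workaround is to compress the stream yourself in transit, 
along these lines (snapshot, host, and dataset names are placeholders):

        # zfs send tank/data@snap | gzip | ssh backuphost "gunzip | zfs receive backup/data"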

> Personally I have compression turned off (desktop). My thought is that
> the latency added for compression would negate the bandwidth savings.
> 
> For a file server I would consider turning it on as network overhead is
> going to hide the latency.

        Once again, it all depends on the compressibility of the data, the 
available CPU resources, the speed of the CPU resources, and the I/O bandwidth 
to/from the drives.

        Note also that RAIDz (and RAIDz2, RAIDz3) has its own computational 
overhead, so compression may be a bigger advantage there than with a mirror, 
as the RAID code will have less data to process once the data has been 
compressed.

--
Paul Kraus
Deputy Technical Director, LoneStarCon 3
Sound Coordinator, Schenectady Light Opera Company
