> Discussions on LZ4 compression algorithm & open source implementations
>
> https://lz4.org/ 
>
> LZ4 1.10 Lossless Compression Algorithm Released
> LZ4 1.10 compression algorithm released with multithreading, dramatically 
> improving compression speeds by up to 8 times.
> LZ4, a widely used cross-platform open-source compression algorithm known for 
> its speed .. has just reached version 1.10.
>
> By Bobby Borisov July 22, 2024  
> https://linuxiac.com/lz4-1-10-lossless-compression-algorithm-released/
>
>
> This latest update introduces significant enhancements, particularly
> multithreading support, which leverages modern multi-core processors to
> accelerate compression and decompression tasks in high-throughput
> environments ...


Media coverage of the above announcement ...

"Latest update for 'extremely fast' compression algorithm LZ4 sprints past old 
versions"

New release does something you might have thought it already did

By Liam Proven Tue 30 Jul 2024  
https://www.theregister.com/2024/07/30/lz4_gets_much_faster/


The new version of the high-speed compression algorithm LZ4 gets a big speed 
boost – nearly an order of magnitude.

LZ4 is one of the faster compression algorithms available on Linux, but the 
newly released LZ4 version 1.10 significantly raises the bar on its own 
forerunners. On some hardware, LZ4 1.10 compresses data between five and 
nearly ten times faster than previous releases by using multiple CPU cores 
in parallel.

As the release notes explain:

Multithreading is less critical for decompression, as modern NVMe drives can 
still be saturated with a single decompression thread. Nonetheless, the new 
version enhances performance by overlapping I/O operations with decompression 
processes.

Tested on an x64 Linux platform, decompressing a 5 GB text file locally 
takes 5 seconds with v1.9.4; this is reduced to 3 seconds in v1.10.0, 
corresponding to a >60% performance improvement.
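By way of illustration, here is a minimal sketch in C of that overlap 
technique – a generic double-buffering scheme, not the code that lz4 v1.10 
actually ships – using the public frame API from lz4frame.h plus one helper 
thread, so that the next chunk is read from disk while the current one is 
decompressed. File names, buffer sizes, and the build line are hypothetical:

    /* Overlapping file reads with LZ4 frame decompression: a sketch of
     * the general double-buffering idea, not lz4's own implementation.
     * Build (illustrative): cc overlap.c -llz4 -lpthread */
    #include <lz4frame.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define CHUNK (4 << 20)                /* 4 MiB read granularity */

    struct read_job { FILE *f; char *buf; size_t len; };

    static void *do_read(void *p)          /* runs on the helper thread */
    {
        struct read_job *j = p;
        j->len = fread(j->buf, 1, CHUNK, j->f);
        return NULL;
    }

    int main(void)
    {
        FILE *in  = fopen("data.lz4", "rb");   /* hypothetical names */
        FILE *out = fopen("data.out", "wb");
        if (!in || !out) return 1;

        LZ4F_dctx *dctx;
        if (LZ4F_isError(LZ4F_createDecompressionContext(&dctx,
                                                         LZ4F_VERSION)))
            return 1;

        char *bufs[2] = { malloc(CHUNK), malloc(CHUNK) };
        char *dst = malloc((size_t)CHUNK * 4); /* room for expansion */

        struct read_job job = { in, bufs[0], 0 };
        do_read(&job);                         /* prime the first chunk */
        int cur = 0;

        while (job.len > 0) {
            /* Kick off the read of the NEXT chunk on the spare buffer... */
            struct read_job next = { in, bufs[cur ^ 1], 0 };
            pthread_t tid;
            pthread_create(&tid, NULL, do_read, &next);

            /* ...while this thread decompresses the CURRENT chunk. */
            const char *src = bufs[cur];
            size_t remaining = job.len;
            while (remaining > 0) {
                size_t dstSize = (size_t)CHUNK * 4, srcSize = remaining;
                size_t r = LZ4F_decompress(dctx, dst, &dstSize,
                                           src, &srcSize, NULL);
                if (LZ4F_isError(r)) return 1;
                fwrite(dst, 1, dstSize, out);
                src += srcSize;
                remaining -= srcSize;
            }

            pthread_join(tid, NULL);           /* wait for the prefetch */
            job = next;
            cur ^= 1;
        }

        LZ4F_freeDecompressionContext(dctx);
        fclose(in); fclose(out);
        free(bufs[0]); free(bufs[1]); free(dst);
        return 0;
    }

Even this crude two-buffer version hides most of the read latency behind 
the decompression work, which is the effect the release notes describe.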

There are multiple compression algorithms in Linux and other FOSS OSes, such as 
the recently infamous xz. There's no single "best": they are all optimized for 
different uses – some for big files, some for certain types of data, some for 
the smallest possible compressed file size, some for the smallest memory 
usage, and so on.

The LZ4 algorithm is one of the speed-optimized ones. Its self-description on 
GitHub is "Extremely Fast Compression algorithm."
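Its API matches the billing: here is a minimal round trip through the two 
core entry points of the block format, LZ4_compress_default() and 
LZ4_decompress_safe(), both declared in lz4.h (buffer sizes here are 
illustrative):

    /* Minimal LZ4 block-API round trip.
     * Build (illustrative): cc roundtrip.c -llz4 */
    #include <lz4.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        const char src[] = "Extremely fast compression, extremely fast "
                           "compression, extremely fast compression.";
        const int srcSize = (int)sizeof src;

        /* LZ4_compressBound() gives the worst-case compressed size;
           256 bytes comfortably covers this tiny input. */
        char compressed[256];
        int cSize = LZ4_compress_default(src, compressed, srcSize,
                                         (int)sizeof compressed);
        if (cSize <= 0) return 1;

        char restored[sizeof src];
        int dSize = LZ4_decompress_safe(compressed, restored, cSize,
                                        (int)sizeof restored);
        if (dSize != srcSize || memcmp(src, restored, srcSize) != 0)
            return 1;

        printf("%d bytes -> %d bytes and back\n", srcSize, cSize);
        return 0;
    }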

It's been around for a while. As far as the FOSS desk can tell, The Register 
first mentioned it in 2012, and it was incorporated into the Linux kernel in 
version 3.11 the following year. Since kernel 3.19 it has been used to 
compress the SquashFS images found on many Linux boot media.

The curious can read a short but dense explanation of how LZ4 works from author 
Yann Collet, who works for Facebook and is also the creator of Zstd and xxHash. 
The US sitcom Silicon Valley fictionalized his work via a character called 
Richard Hendricks.

For a speed-focused compression scheme that's over a decade old, such a big 
performance jump is unexpected. 

It does this by spreading compression across multiple CPU cores, as previously 
done in the lz4mt C++ implementation. (The author of that variant, Takayuki 
Matsuoka, contributed multiple changes to the new LZ4 release.)
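In outline, the trick is to cut the input into independent chunks and hand 
each to its own core. The sketch below shows that idea in bare-bones form 
with POSIX threads – an illustration of the lz4mt-style approach, not the 
actual v1.10 worker pool, which also pipelines I/O and reassembles output 
in order:

    /* Bare-bones parallel chunk compression: one thread per chunk.
     * Illustrates the lz4mt-style idea, not lz4 v1.10's implementation.
     * Build (illustrative): cc parchunks.c -llz4 -lpthread */
    #include <lz4.h>
    #include <pthread.h>
    #include <stdlib.h>
    #include <string.h>

    #define NCHUNKS 8                 /* e.g. one chunk per core */

    struct chunk {
        const char *src; int srcSize; /* input slice */
        char *dst;       int dstSize; /* compressed output */
    };

    static void *compress_chunk(void *p)
    {
        struct chunk *c = p;
        int cap = LZ4_compressBound(c->srcSize);
        c->dst = malloc(cap);
        /* Chunks are compressed independently, so matches can't cross
           chunk boundaries: a small ratio loss buys full parallelism. */
        c->dstSize = LZ4_compress_default(c->src, c->dst,
                                          c->srcSize, cap);
        return NULL;
    }

    static int compress_parallel(const char *data, int size)
    {
        struct chunk chunks[NCHUNKS];
        pthread_t tids[NCHUNKS];
        int per = size / NCHUNKS, total = 0;

        for (int i = 0; i < NCHUNKS; i++) {
            chunks[i].src = data + i * per;
            chunks[i].srcSize =
                (i == NCHUNKS - 1) ? size - i * per : per;
            pthread_create(&tids[i], NULL, compress_chunk, &chunks[i]);
        }
        for (int i = 0; i < NCHUNKS; i++) {
            pthread_join(tids[i], NULL);
            /* A real tool would now write each chunk out, in order,
               prefixed with its compressed size. */
            total += chunks[i].dstSize;
            free(chunks[i].dst);
        }
        return total;                 /* total compressed bytes */
    }

    int main(void)
    {
        int n = 8 << 20;              /* 8 MiB of synthetic input */
        char *data = malloc(n);
        memset(data, 'x', n);
        int ok = compress_parallel(data, n) > 0;
        free(data);
        return ok ? 0 : 1;
    }

The v1.10 CLI exposes this via a thread-count option (a zstd-style -T#, if 
our reading of the release notes is right – check lz4 --help on your build).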

LZ4 could already do over half a gigabyte per second on each core, but now, if 
you have lots of cores to throw at it, it can do substantially more. The table 
in the announcement shows an AMD 7840HS – an octo-core chip – getting seven 
to eight times faster, and an Intel Core i7-9700K, also with eight cores, 
getting nearly six times as quick.

For us, this release illustrates several important points. 

First, writing efficient code to exploit the parallelism of multiple 
processor cores is very hard. The parallelized lz4mt implementation appeared 
ten years ago, and it's remarkable that it has taken a whole decade for this 
change to make it into what is, after all, a speed-focused algorithm.

That, in turn, is why more parts of modern OSes and apps can't and don't make 
effective use of multiple CPU cores… and that's why the number of cores in 
desktop CPUs is increasing much more slowly than in server CPUs. 

More cores can't make a single-threaded process run any more quickly, and in 
general, most common apps tend to only use a small number of threads. There's 
still no way to automatically parallelize algorithms – only very smart humans 
can do that.


As we noted earlier this year when discussing code bloat, the late, great 
Gene Amdahl formalized Amdahl's Law, which shows that the speedup from 
parallelizing code is limited by its serial portion: in practice, the gains 
usually top out at the equivalent of about 20 processors.
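In symbols (a standard statement of the law, with p as the parallelizable 
fraction of the work and N the number of processors):

    S(N) = \frac{1}{(1 - p) + p/N},
    \qquad
    \lim_{N \to \infty} S(N) = \frac{1}{1 - p}

So a program that is 95 percent parallelizable (p = 0.95) can never run more 
than 1/0.05 = 20 times faster, no matter how many cores are thrown at it – 
which is where the roughly-20-processors figure comes from.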

We also highly recommend "The Future of Microprocessors" talk by Arm 
co-creator Sophie Wilson, in which she notes that the silicon-chip industry 
is unique in successfully selling high-volume products whose purchasers 
can't use most of what they buy. In fact, if it were possible to turn on all 
of a modern processor die at once, it would burn itself out in seconds.

In the meantime, though, LZ4 1.10 means you can occasionally put a bit more 
of that silicon to work.

Alongside LZ4, another thing that made it into the Linux kernel in version 
3.11, humorously nicknamed Linux for Workgroups, was zswap, which compresses 
pages on their way to swap and keeps them in a RAM pool rather than writing 
them straight out to disk.

As we described a couple of years ago, turning on zswap can really help the 
performance of any Linux box that uses swap heavily. 
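As a pointer rather than a full how-to: zswap is governed by kernel module 
parameters, documented in Documentation/admin-guide/mm/zswap.rst. One way to 
enable it with LZ4 at boot (assuming your kernel was built with the lz4 
crypto module) is via the kernel command line:

    zswap.enabled=1 zswap.compressor=lz4 zswap.max_pool_percent=20

The same parameters are exposed at runtime under 
/sys/module/zswap/parameters/.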

When version 1.10 of LZ4 makes it into the kernel, that will get faster still, 
but in the meantime, you can easily turn it on and enjoy the result today. ®
