[ https://issues.apache.org/jira/browse/HADOOP-9785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luke Lu resolved HADOOP-9785.
-----------------------------
Resolution: Duplicate
Fix Version/s: 2.3.0 (was: 2.0.4-alpha, 3.0.0)
> LZ4 code may need upgrade (lz4.c embedded in libHadoop is r43 18 months ago,
> while latest version is r98)
> ---------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-9785
> URL: https://issues.apache.org/jira/browse/HADOOP-9785
> Project: Hadoop Common
> Issue Type: Improvement
> Components: io, native
> Affects Versions: 3.0.0, 2.0.4-alpha
> Environment: [german@localhost lz4-read-only]$ lscpu
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 4
> On-line CPU(s) list: 0-3
> Thread(s) per core: 1
> Core(s) per socket: 4
> Socket(s): 1
> NUMA node(s): 1
> Vendor ID: GenuineIntel
> CPU family: 6
> Model: 23
> Stepping: 10
> CPU MHz: 2667.000
> BogoMIPS: 5319.82
> Virtualization: VT-x
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 2048K
> NUMA node0 CPU(s): 0-3
> [german@localhost lz4-read-only]$ uname -r
> 2.6.32-358.14.1.el6.x86_64
> Reporter: German Florez-Larrahondo
> Priority: Minor
> Fix For: 2.3.0
>
>
> While analyzing the compression performance of different Hadoop codecs, I noticed
> that the LZ4 code was taken from revision 43 of
> https://code.google.com/p/lz4/. The latest version is r98, and moving to r98
> may bring additional performance benefits.
> We may want to involve the original LZ4 author, Yann Collet, in these discussions,
> since the current LZ4 code includes additional algorithms and parameters.
> To start the investigation, I ran preliminary experiments with the Silesia
> corpus, and the latest release appears to improve both compression and
> decompression throughput compared with r43 (I haven't done enough analysis to
> draw statistically sound conclusions, but the results look promising).
> Here is the raw output of LZ4 r43 on a SUBSET of the Silesia corpus
> (http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia):
> File: silesia/dickens
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Compressed 10192446 bytes into 6433123 bytes ==> 63.12%
> Done in 0.07 s ==> 138.86 MB/s
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Successfully decoded 10192446 bytes
> Done in 0.02 s ==> 486.01 MB/s
> File: silesia/mozilla
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Compressed 51220480 bytes into 26379814 bytes ==> 51.50%
> Done in 0.25 s ==> 195.39 MB/s
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Successfully decoded 51220480 bytes
> Done in 0.12 s ==> 407.06 MB/s
> File: silesia/mr
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Compressed 9970564 bytes into 5669268 bytes ==> 56.86%
> Done in 0.04 s ==> 237.72 MB/s
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Successfully decoded 9970564 bytes
> Done in 0.02 s ==> 475.43 MB/s
> File: silesia/nci
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Compressed 33553445 bytes into 5880292 bytes ==> 17.53%
> Done in 0.08 s ==> 399.99 MB/s
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Successfully decoded 33553445 bytes
> Done in 0.06 s ==> 533.32 MB/s
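The ratio and MB/s figures printed by the r43 CLI can be recomputed from the byte counts and times alone. The Python sketch below does this for the four files above. The helper names are hypothetical, and the divisor of 2^20 bytes per "MB" is inferred from the fact that it reproduces the printed numbers; it is not taken from the LZ4 sources. Because the two-decimal times reproduce the throughputs exactly, the r43 timings appear to have roughly 10 ms resolution, so for a 0.02 s decompression run a single tick would be about half of the measured time.

```python
# Figures transcribed from the r43 CLI output above:
# (file, original bytes, compressed bytes, compress seconds, decompress seconds)
R43_RUNS = [
    ("dickens", 10192446, 6433123, 0.07, 0.02),
    ("mozilla", 51220480, 26379814, 0.25, 0.12),
    ("mr", 9970564, 5669268, 0.04, 0.02),
    ("nci", 33553445, 5880292, 0.08, 0.06),
]

# Assumption (inferred, not documented): the CLI's "MB" means 2**20 bytes.
MIB = 1024 * 1024


def ratio_pct(original: int, compressed: int) -> float:
    """Compressed size as a percentage of the original size."""
    return 100.0 * compressed / original


def throughput_mib_s(original: int, seconds: float) -> float:
    """Uncompressed bytes processed per second, expressed in MiB/s."""
    return original / MIB / seconds


def summarize(runs):
    """Return (file, ratio, compress MB/s, decompress MB/s) as printed strings."""
    rows = []
    for name, orig, comp, t_comp, t_decomp in runs:
        rows.append((
            name,
            f"{ratio_pct(orig, comp):.2f}%",
            f"{throughput_mib_s(orig, t_comp):.2f}",
            f"{throughput_mib_s(orig, t_decomp):.2f}",
        ))
    return rows


if __name__ == "__main__":
    for row in summarize(R43_RUNS):
        print("{:<8} ratio={:>7} comp={:>7} MB/s decomp={:>7} MB/s".format(*row))
```

Every recomputed value matches the corresponding line of the r43 output, which is what supports the MiB and timing-resolution inferences above.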
> And here is the raw output of LZ4 from the latest release, r98:
> File: silesia/dickens
> *** Full LZ4 speed analyzer , by Yann Collet (Jul 29 2013) ***
> Loading silesia/dickens...
> 1-LZ4_compress : 10192446 -> 6434313 (63.13%), 172.3 MB/s
> LZ4_decompress_fast : 10192446 -> 676.0 MB/s
> File: silesia/mozilla
> *** Full LZ4 speed analyzer , by Yann Collet (Jul 29 2013) ***
> Loading silesia/mozilla...
> 1-LZ4_compress : 51220480 -> 26382113 (51.51%), 281.7 MB/s
> LZ4_decompress_fast : 51220480 -> 1003.1 MB/s
> File: silesia/mr
> *** Full LZ4 speed analyzer , by Yann Collet (Jul 29 2013) ***
> Loading silesia/mr...
> 1-LZ4_compress : 9970564 -> 5669255 (56.86%), 268.3 MB/s
> LZ4_decompress_fast : 9970564 -> 788.7 MB/s
> File: silesia/nci
> *** Full LZ4 speed analyzer , by Yann Collet (Jul 29 2013) ***
> Loading silesia/nci...
> 1-LZ4_compress : 33553445 -> 5883923 (17.54%), 584.9 MB/s
> LZ4_decompress_fast : 33553445 -> 1208.3 MB/s
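To put the two sets of numbers side by side, the Python sketch below computes r98-over-r43 throughput ratios from the figures quoted above. Treat it as indicative only: the r43 numbers come from the compression CLI, while the r98 numbers come from the "Full LZ4 speed analyzer" (LZ4_compress / LZ4_decompress_fast), and the two tools may time things differently. The table layout and the `speedups` helper are hypothetical.

```python
# Throughput in MB/s, transcribed from the r43 and r98 outputs above.
# Columns: r43 compress, r43 decompress, r98 compress, r98 decompress.
# Note: r43 comes from the compression CLI, r98 from the speed analyzer,
# so the ratios compare two different measurement tools.
THROUGHPUT = {
    "dickens": (138.86, 486.01, 172.3, 676.0),
    "mozilla": (195.39, 407.06, 281.7, 1003.1),
    "mr":      (237.72, 475.43, 268.3, 788.7),
    "nci":     (399.99, 533.32, 584.9, 1208.3),
}


def speedups(table):
    """Return {file: (compress speedup, decompress speedup)} of r98 over r43."""
    out = {}
    for name, (c43, d43, c98, d98) in table.items():
        out[name] = (round(c98 / c43, 2), round(d98 / d43, 2))
    return out


if __name__ == "__main__":
    for name, (comp, decomp) in speedups(THROUGHPUT).items():
        print(f"{name:<8} compress x{comp:.2f}  decompress x{decomp:.2f}")
```

On these four files the ratios range from about 1.1x to 1.5x for compression and about 1.4x to 2.5x for decompression, while the compression ratios are essentially unchanged; with only four files and two different tools, this is not enough to attribute the gain to r98 alone.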
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira