[ https://issues.apache.org/jira/browse/HADOOP-9785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luke Lu resolved HADOOP-9785.
-----------------------------
       Resolution: Duplicate
    Fix Version/s:     (was: 2.0.4-alpha)
                       (was: 3.0.0)
                   2.3.0

> LZ4 code may need upgrade (lz4.c embedded in libHadoop is r43 18 months ago, while latest version is r98)
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-9785
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9785
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: io, native
>   Affects Versions: 3.0.0, 2.0.4-alpha
>         Environment: [german@localhost lz4-read-only]$ lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                4
> On-line CPU(s) list:   0-3
> Thread(s) per core:    1
> Core(s) per socket:    4
> Socket(s):             1
> NUMA node(s):          1
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 23
> Stepping:              10
> CPU MHz:               2667.000
> BogoMIPS:              5319.82
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              2048K
> NUMA node0 CPU(s):     0-3
> [german@localhost lz4-read-only]$ uname -r
> 2.6.32-358.14.1.el6.x86_64
>            Reporter: German Florez-Larrahondo
>            Priority: Minor
>             Fix For: 2.3.0
>
>
> While analyzing the compression performance of different Hadoop codecs, I
> noticed that the LZ4 code was taken from revision 43 of
> https://code.google.com/p/lz4/. The latest version is r98, and there may be
> extra performance benefits we can gain from using r98.
> We may want to involve the original LZ4 author, Yann Collet, in these
> discussions, as the current LZ4 code includes additional algorithms and
> parameters.
> To start the investigation, I ran preliminary experiments with the Silesia
> corpus, and there seems to be an improvement in throughput for both
> compression and decompression in the latest release compared with r43 (I
> haven't done enough analysis to conclude anything statistically, but it
> looks good).
> Here is raw output using LZ4 from r43 with a SUBSET of the silesia corpus
> (http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia)
> File: silesia/dickens
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Compressed 10192446 bytes into 6433123 bytes ==> 63.12%
> Done in 0.07 s ==> 138.86 MB/s
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Successfully decoded 10192446 bytes
> Done in 0.02 s ==> 486.01 MB/s
> File: silesia/mozilla
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Compressed 51220480 bytes into 26379814 bytes ==> 51.50%
> Done in 0.25 s ==> 195.39 MB/s
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Successfully decoded 51220480 bytes
> Done in 0.12 s ==> 407.06 MB/s
> File: silesia/mr
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Compressed 9970564 bytes into 5669268 bytes ==> 56.86%
> Done in 0.04 s ==> 237.72 MB/s
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Successfully decoded 9970564 bytes
> Done in 0.02 s ==> 475.43 MB/s
> File: silesia/nci
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Compressed 33553445 bytes into 5880292 bytes ==> 17.53%
> Done in 0.08 s ==> 399.99 MB/s
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Successfully decoded 33553445 bytes
> Done in 0.06 s ==> 533.32 MB/s
>
> And here raw output of LZ4 from the latest release r98
> File: silesia/dickens
> *** Full LZ4 speed analyzer , by Yann Collet (Jul 29 2013) ***
> Loading silesia/dickens...
> 1-LZ4_compress : 10192446 -> 6434313 (63.13%), 172.3 MB/s
> LZ4_decompress_fast : 10192446 -> 676.0 MB/s
> File: silesia/mozilla
> *** Full LZ4 speed analyzer , by Yann Collet (Jul 29 2013) ***
> Loading silesia/mozilla...
> 1-LZ4_compress : 51220480 -> 26382113 (51.51%), 281.7 MB/s
> LZ4_decompress_fast : 51220480 -> 1003.1 MB/s
> File: silesia/mr
> *** Full LZ4 speed analyzer , by Yann Collet (Jul 29 2013) ***
> Loading silesia/mr...
> 1-LZ4_compress : 9970564 -> 5669255 (56.86%), 268.3 MB/s
> LZ4_decompress_fast : 9970564 -> 788.7 MB/s
> File: silesia/nci
> *** Full LZ4 speed analyzer , by Yann Collet (Jul 29 2013) ***
> Loading silesia/nci...
> 1-LZ4_compress : 33553445 -> 5883923 (17.54%), 584.9 MB
> LZ4_decompress_fast : 33553445 -> 1208.3 MB/s
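
The throughput figures quoted in the report can be turned into rough r98-over-r43 ratios. The following is a minimal sketch, not part of the original report: the names RESULTS and speedups are hypothetical, the numbers are copied from the raw output, and the r98 compression figure for nci (printed as "584.9 MB") is assumed to be in MB/s like the others. The two runs used different tools (the compression CLI for r43, the speed analyzer for r98), so the ratios are only indicative.

```python
# Throughput in MB/s, copied from the raw output quoted in the report.
# Tuple layout: (r43 compress, r98 compress, r43 decompress, r98 decompress)
RESULTS = {
    "dickens": (138.86, 172.3, 486.01, 676.0),
    "mozilla": (195.39, 281.7, 407.06, 1003.1),
    "mr": (237.72, 268.3, 475.43, 788.7),
    # Assumption: the source prints "584.9 MB" here; treated as MB/s.
    "nci": (399.99, 584.9, 533.32, 1208.3),
}


def speedups(results):
    """Return {file: (compress ratio, decompress ratio)} of r98 over r43."""
    ratios = {}
    for name, (c43, c98, d43, d98) in results.items():
        ratios[name] = (c98 / c43, d98 / d43)
    return ratios


if __name__ == "__main__":
    for name, (comp, decomp) in speedups(RESULTS).items():
        print(f"{name:8s} compress x{comp:.2f}  decompress x{decomp:.2f}")
```

With only one run per file and timings as short as 0.02 s on the r43 side, these ratios support the reporter's "looks good" impression but nothing stronger.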