[ 
https://issues.apache.org/jira/browse/HADOOP-9785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Lu resolved HADOOP-9785.
-----------------------------

       Resolution: Duplicate
    Fix Version/s:     (was: 2.0.4-alpha)
                       (was: 3.0.0)
                   2.3.0
    
> LZ4 code may need upgrade (lz4.c embedded in libHadoop is r43 18 months ago, 
> while latest version is r98)
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-9785
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9785
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: io, native
>    Affects Versions: 3.0.0, 2.0.4-alpha
>         Environment: [german@localhost lz4-read-only]$ lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                4
> On-line CPU(s) list:   0-3
> Thread(s) per core:    1
> Core(s) per socket:    4
> Socket(s):             1
> NUMA node(s):          1
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 23
> Stepping:              10
> CPU MHz:               2667.000
> BogoMIPS:              5319.82
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              2048K
> NUMA node0 CPU(s):     0-3
> [german@localhost lz4-read-only]$ uname -r
> 2.6.32-358.14.1.el6.x86_64
>            Reporter: German Florez-Larrahondo
>            Priority: Minor
>             Fix For: 2.3.0
>
>
> While analyzing the compression performance of different Hadoop codecs, I
> noticed that the LZ4 code was taken from revision 43 of
> https://code.google.com/p/lz4/. The latest version is r98, and we may gain
> extra performance by upgrading to it.
> We may want to involve the original LZ4 author, Yann Collet, in these
> discussions, as the current LZ4 code includes additional algorithms and
> parameters.
> To start the investigation, I ran preliminary experiments with the Silesia
> corpus. Compression and decompression throughput both appear to improve in
> the latest release compared with r43 (I haven't done enough analysis to
> conclude anything statistically, but it looks good).
> Here is the raw output using LZ4 from r43 with a SUBSET of the Silesia corpus
> (http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia):
> File: silesia/dickens
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Compressed 10192446 bytes into 6433123 bytes ==> 63.12%
> Done in 0.07 s ==> 138.86 MB/s
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Successfully decoded 10192446 bytes
> Done in 0.02 s ==> 486.01 MB/s
> File: silesia/mozilla
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Compressed 51220480 bytes into 26379814 bytes ==> 51.50%
> Done in 0.25 s ==> 195.39 MB/s
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Successfully decoded 51220480 bytes
> Done in 0.12 s ==> 407.06 MB/s
> File: silesia/mr
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Compressed 9970564 bytes into 5669268 bytes ==> 56.86%
> Done in 0.04 s ==> 237.72 MB/s
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Successfully decoded 9970564 bytes
> Done in 0.02 s ==> 475.43 MB/s
> File: silesia/nci
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Compressed 33553445 bytes into 5880292 bytes ==> 17.53%
> Done in 0.08 s ==> 399.99 MB/s
> *** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
> Successfully decoded 33553445 bytes
> Done in 0.06 s ==> 533.32 MB/s
> And here is the raw output of LZ4 from the latest release, r98:
> File: silesia/dickens
> *** Full LZ4 speed analyzer , by Yann Collet (Jul 29 2013) ***
> Loading silesia/dickens...
> 1-LZ4_compress        :  10192446 ->   6434313 (63.13%),  172.3 MB/s
> LZ4_decompress_fast   :  10192446 ->   676.0 MB/s
> File: silesia/mozilla
> *** Full LZ4 speed analyzer , by Yann Collet (Jul 29 2013) ***
> Loading silesia/mozilla...
> 1-LZ4_compress        :  51220480 ->  26382113 (51.51%),  281.7 MB/s
> LZ4_decompress_fast   :  51220480 ->  1003.1 MB/s
> File: silesia/mr
> *** Full LZ4 speed analyzer , by Yann Collet (Jul 29 2013) ***
> Loading silesia/mr...
> 1-LZ4_compress        :   9970564 ->   5669255 (56.86%),  268.3 MB/s
> LZ4_decompress_fast   :   9970564 ->   788.7 MB/s
> File: silesia/nci
> *** Full LZ4 speed analyzer , by Yann Collet (Jul 29 2013) ***
> Loading silesia/nci...
> 1-LZ4_compress        :  33553445 ->   5883923 (17.54%),  584.9 MB/s
> LZ4_decompress_fast   :  33553445 ->  1208.3 MB/s
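
The throughput figures quoted above can be compared file by file. The following is a minimal sketch, not part of the original report: it copies the MB/s values from the two raw outputs and divides the r98 figure by the r43 figure. The names `THROUGHPUT_MBPS` and `speedups` are hypothetical, and the nci compression figure is assumed to be in MB/s. Note that, according to the banners in the output, the r43 numbers come from the compression CLI while the r98 numbers come from the speed analyzer, so the ratios are only a rough indication.

```python
# Throughput figures (MB/s) copied from the raw output quoted in the issue.
# Each entry: file -> (r43 compress, r43 decompress, r98 compress, r98 decompress)
# Assumption: the nci r98 compression figure (584.9) is in MB/s like the others.
THROUGHPUT_MBPS = {
    "dickens": (138.86, 486.01, 172.3, 676.0),
    "mozilla": (195.39, 407.06, 281.7, 1003.1),
    "mr": (237.72, 475.43, 268.3, 788.7),
    "nci": (399.99, 533.32, 584.9, 1208.3),
}


def speedups(table):
    """Return {file: (compress_ratio, decompress_ratio)} of r98 over r43."""
    return {
        name: (c98 / c43, d98 / d43)
        for name, (c43, d43, c98, d98) in table.items()
    }


if __name__ == "__main__":
    for name, (comp, decomp) in speedups(THROUGHPUT_MBPS).items():
        print(f"{name:8s} compress x{comp:.2f}  decompress x{decomp:.2f}")
```

A ratio above 1 means r98 reported higher throughput than r43 for that file. Repeated runs with the same tool on both revisions would be needed before drawing a statistical conclusion.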

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
