[ https://issues.apache.org/jira/browse/ACCUMULO-351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758529#comment-13758529 ]

John Vines commented on ACCUMULO-351:
-------------------------------------

As a quick test to see if this is still worthwhile, I made a 574M RFile with 
the following code:
{code}
import java.util.Random;

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile;
import org.apache.accumulo.core.file.rfile.RFile;
import org.apache.accumulo.core.file.rfile.RFile.Writer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Class/main wrapper added only to make the snippet runnable as-is.
public class RFileSizeTest {
  public static void main(String[] args) throws Exception {
    // Write an RFile to the local filesystem with compression disabled ("none").
    CachableBlockFile.Writer _cbw = new CachableBlockFile.Writer(
        FileSystem.getLocal(new Configuration()).create(new Path("/tmp/bigTest.rf"), false, 4096, (short) -1, 1 << 26),
        "none", new Configuration());
    Writer writer = new RFile.Writer(_cbw, 100 * 1024, 128 * 1024);

    Random r = new Random();
    byte[] colfb = new byte[128];
    byte[] colqb = new byte[128];
    byte[] value = new byte[128];

    Value val = new Value();
    writer.startDefaultLocalityGroup();
    for (int i = 0; i < 1000000; i++) {
      // 128-character padded row plus random 128-byte column family, qualifier, and value.
      r.nextBytes(colfb);
      r.nextBytes(colqb);
      Key k = new Key(String.format("%128d", i), new String(colfb), new String(colqb));

      r.nextBytes(value);
      val.set(value);
      writer.append(k, val);
    }

    writer.close();
  }
}
{code}

So this is an uncompressed RFile.
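
For reference, the second argument to CachableBlockFile.Writer above is the compression 
algorithm name, so a compressed RFile could presumably be produced directly by swapping 
"none" for one of the other supported algorithm strings (e.g. "gz" or "lzo"; the exact 
set depends on the codecs available). A minimal sketch, reusing the imports and setup 
from the block above, with a made-up output path:
{code}
// Sketch only: same constructor as above, but asking the block file layer to
// gzip-compress the data. The output path here is just for illustration.
CachableBlockFile.Writer gzCbw = new CachableBlockFile.Writer(
    FileSystem.getLocal(new Configuration()).create(new Path("/tmp/bigTest-gz.rf"), false, 4096, (short) -1, 1 << 26),
    "gz", new Configuration());
Writer gzWriter = new RFile.Writer(gzCbw, 100 * 1024, 128 * 1024);
{code}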

I then tried a few different compression algorithms on it for an easy comparison:

Gzip - 265M compressed (2.166 ratio), compression time 50.79s, decompression time 4.57s
lz4 fast compression - 435M compressed (1.319 ratio), compression time 1.98s, decompression time 0.41s
lz4 high compression - 352M compressed (1.630 ratio), compression time 29.66s, decompression time 0.32s
lzo default compression - 398M compressed (1.442 ratio), compression time 2.24s, decompression time 1.36s
lzo fast compression - 400M compressed (1.435 ratio), compression time 2.12s, decompression time 0.21s
Snappy - 418M compressed (1.373 ratio), compression time 4.06s, decompression time 2.18s
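
(Presumably these were measured with the standalone command-line tools. For anyone 
wanting to reproduce a similar timing from Java, here is a rough sketch using Hadoop's 
CompressionCodec API; the class name and paths are just illustrative, GzipCodec ships 
with Hadoop, and Lz4Codec/SnappyCodec availability depends on the Hadoop version and 
native libraries.)
{code}
// Rough, hypothetical timing sketch using Hadoop's CompressionCodec API.
import java.io.FileInputStream;
import java.io.FileOutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class CompressTiming {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // GzipCodec ships with Hadoop; swap in Lz4Codec/SnappyCodec if available.
    CompressionCodec codec = ReflectionUtils.newInstance(GzipCodec.class, conf);

    long start = System.currentTimeMillis();
    FileInputStream in = new FileInputStream("/tmp/bigTest.rf");
    CompressionOutputStream out = codec.createOutputStream(new FileOutputStream("/tmp/bigTest.rf.gz"));
    IOUtils.copyBytes(in, out, conf); // copies and closes both streams
    System.out.println("compression time: " + (System.currentTimeMillis() - start) / 1000.0 + "s");
  }
}
{code}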

Compared to the others, lz4 has the lowest compression ratio for starters. At its 
fastest setting it compresses a negligible amount faster than lzo but takes almost 
double the time to decompress, though those times are small enough that the 
measurement may not be accurate. All in all, I'd say the difference is negligible 
enough that I'm not going to bother, but it would be a good exercise for a 
first-time contributor.
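
For context on what the end result would look like: once an LZ4 algorithm is wired in, 
enabling it on a table should presumably just be a matter of setting 
table.file.compress.type. A hypothetical sketch from the Java client API (the table 
name and the eventual "lz4" algorithm string are assumptions):
{code}
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.conf.Property;

public class EnableLz4 {
  // Hypothetical: assumes LZ4 support has been added under the algorithm name "lz4".
  static void enableLz4(Connector connector, String table) throws Exception {
    connector.tableOperations().setProperty(table,
        Property.TABLE_FILE_COMPRESSION_TYPE.getKey(), "lz4");
  }
}
{code}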
                
> Add support for LZ4 compression
> -------------------------------
>
>                 Key: ACCUMULO-351
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-351
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: John Vines
>            Assignee: John Vines
>             Fix For: 1.6.0
>
>
> LZ4 is like LZO, but with better decompression rates, and it's BSD licensed, 
> which means we can incorporate it in svn. Information about it can be found at 
> http://code.google.com/p/lz4/ . Additionally, there exists a JNI library 
> for it (and snappy, for ACCUMULO-139) at 
> https://github.com/decster/jnicompressions . I did not find the license for 
> that, but it's a potential option.
