[
https://issues.apache.org/jira/browse/ACCUMULO-351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758529#comment-13758529
]
John Vines commented on ACCUMULO-351:
-------------------------------------
As a quick test to see whether this is still worthwhile, I made a 574M RFile
with the following code:
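{code}
import java.util.Random;

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile;
import org.apache.accumulo.core.file.rfile.RFile;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BigRFileTest {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Local file: overwrite=false, 4 KB buffer, replication (short) -1, 64 MB (1 << 26) block size.
    CachableBlockFile.Writer cbw = new CachableBlockFile.Writer(
        FileSystem.getLocal(conf).create(new Path("/tmp/bigTest.rf"), false, 4096, (short) -1, 1 << 26),
        "none", conf); // "none": blocks are written uncompressed
    // 100 KB data blocks, 128 KB index blocks.
    RFile.Writer writer = new RFile.Writer(cbw, 100 * 1024, 128 * 1024);
    Random r = new Random();
    byte[] colfb = new byte[128];
    byte[] colqb = new byte[128];
    byte[] value = new byte[128];
    Value val = new Value();
    writer.startDefaultLocalityGroup();
    for (int i = 0; i < 1000000; i++) {
      r.nextBytes(colfb);
      r.nextBytes(colqb);
      // Random bytes decoded with the platform default charset.
      String colf = new String(colfb);
      String colq = new String(colqb);
      // Row: i right-aligned in a 128-character, space-padded string,
      // so rows are appended in ascending order.
      Key k = new Key(String.format("%128d", i), colf, colq);
      r.nextBytes(value);
      val.set(value);
      writer.append(k, val);
    }
    writer.close();
  }
}
{code}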
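The 128-character padding on the row is doing real work. RFiles keep keys sorted, and as far as I recall RFile.Writer rejects keys appended out of order. Since a Key compares rows first, strictly increasing rows keep every append in order no matter what the random column family and qualifier are. Because a space sorts before any digit, right-aligned, space-padded numbers of equal width compare lexicographically in numeric order. A small standalone check of that (the class name RowKeyOrder is just illustrative; Locale.ROOT is an addition to pin ASCII digits):

```java
import java.util.Locale;

public class RowKeyOrder {
  // Row format from the test: i right-aligned, space-padded to 128 characters.
  // Locale.ROOT pins ASCII digits regardless of the default locale.
  static String rowFor(int i) {
    return String.format(Locale.ROOT, "%128d", i);
  }

  // True if rows for 0..n-1 are strictly ascending under String.compareTo.
  // These rows are pure ASCII, so this matches a byte-wise comparison.
  static boolean rowsAscending(int n) {
    String prev = null;
    for (int i = 0; i < n; i++) {
      String cur = rowFor(i);
      if (prev != null && prev.compareTo(cur) >= 0) {
        return false;
      }
      prev = cur;
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(rowFor(7).length());      // 128
    System.out.println(rowsAscending(100_000));  // true
    // Without padding, "9" sorts after "10", so appends would be out of order.
    System.out.println("9".compareTo("10") > 0); // true
  }
}
```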
So these are uncompressed RFiles.
I then tried a few different compression codecs on it to compare them easily:
||Codec||Compressed size||Ratio||Compression time||Decompression time||
|Gzip|265M|2.166|50.79s|4.57s|
|LZ4 (fast)|435M|1.319|1.98s|0.41s|
|LZ4 (high compression)|352M|1.630|29.66s|0.32s|
|LZO (default)|398M|1.442|2.24s|1.36s|
|LZO (fast)|400M|1.435|2.12s|0.21s|
|Snappy|418M|1.373|4.06s|2.18s|
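The ratio column is the uncompressed size (574M) divided by the compressed size. A small sketch that recomputes it from the table, along with the LZ4 (fast) vs. LZO (fast) timing comparison; the class and method names are illustrative. Since the sizes are rounded to whole megabytes, recomputed ratios can differ from the table in the third decimal (574/435 is about 1.3195, for example).

```java
public class CodecTable {
  static final double UNCOMPRESSED_MB = 574.0;

  // Compression ratio = uncompressed size / compressed size.
  static double ratio(double compressedMb) {
    return UNCOMPRESSED_MB / compressedMb;
  }

  // How many times as long the slower run took compared with the faster one.
  static double slowdown(double slowerSec, double fasterSec) {
    return slowerSec / fasterSec;
  }

  public static void main(String[] args) {
    String[] codecs = {"Gzip", "LZ4 (fast)", "LZ4 (high compression)",
        "LZO (default)", "LZO (fast)", "Snappy"};
    double[] sizesMb = {265, 435, 352, 398, 400, 418};
    for (int i = 0; i < codecs.length; i++) {
      System.out.printf("%-24s %.3f%n", codecs[i], ratio(sizesMb[i]));
    }
    // LZ4 (fast) vs. LZO (fast), using the measured times from the table.
    System.out.printf("compression: LZO takes %.2fx as long as LZ4%n", slowdown(2.12, 1.98));
    System.out.printf("decompression: LZ4 takes %.2fx as long as LZO%n", slowdown(0.41, 0.21));
  }
}
```

The two slowdown figures come out to roughly 1.07x and 1.95x, which is the "negligible" and "almost double" in the comment.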
Compared to the others, LZ4 in fast mode has the lowest compression ratio, for
starters. At its fastest, it compresses a negligible amount faster than LZO
(1.98s vs. 2.12s) but takes almost twice as long to decompress (0.41s vs.
0.21s), though those sub-second times are in a low-resolution range, so that
difference may not be accurate. All in all, I'd say the difference is
negligible enough that I'm not going to bother, but adding LZ4 support would be
a good exercise for a first-time contributor.
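On the resolution concern: one way to firm up sub-second timings is to repeat each measurement several times after a warm-up and report the median. A minimal sketch, assuming an in-memory buffer rather than the 574M file. It uses GZIP from java.util.zip only because that is the one codec here that ships with the JDK; LZ4, LZO, or Snappy would need third-party bindings (such as the JNI library mentioned in the issue) plugged into the same loop. The class name, method names, and sample data are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CodecTiming {
  // Compress a buffer with GZIP.
  static byte[] gzip(byte[] data) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
      out.write(data);
    }
    return bos.toByteArray();
  }

  // Decompress a GZIP buffer back to the original bytes (readAllBytes needs Java 9+).
  static byte[] gunzip(byte[] compressed) throws IOException {
    try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      return in.readAllBytes();
    }
  }

  // Median time in milliseconds over `runs` decompressions, after a few untimed
  // warm-up runs so JIT compilation does not skew the first samples.
  static double medianDecompressMillis(byte[] compressed, int runs) throws IOException {
    if (runs < 1) {
      throw new IllegalArgumentException("runs must be >= 1");
    }
    for (int i = 0; i < 3; i++) {
      gunzip(compressed);
    }
    long[] nanos = new long[runs];
    for (int i = 0; i < runs; i++) {
      long start = System.nanoTime();
      gunzip(compressed);
      nanos[i] = System.nanoTime() - start;
    }
    Arrays.sort(nanos);
    return nanos[runs / 2] / 1e6; // upper median when runs is even
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical 8 MB sample drawn from a 16-letter alphabet, so it compresses.
    Random r = new Random(42);
    byte[] data = new byte[8 << 20];
    for (int i = 0; i < data.length; i++) {
      data[i] = (byte) ('a' + r.nextInt(16));
    }
    byte[] packed = gzip(data);
    System.out.printf("ratio %.3f%n", (double) data.length / packed.length);
    System.out.printf("median decompression %.2f ms over 11 runs%n",
        medianDecompressMillis(packed, 11));
  }
}
```

Taking the median rather than the mean keeps a single GC pause or scheduling hiccup from dominating a measurement that only lasts a fraction of a second.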
> Add support for LZ4 compression
> -------------------------------
>
> Key: ACCUMULO-351
> URL: https://issues.apache.org/jira/browse/ACCUMULO-351
> Project: Accumulo
> Issue Type: Improvement
> Reporter: John Vines
> Assignee: John Vines
> Fix For: 1.6.0
>
>
> LZ4 is like LZO, but with better decompression rates, and its BSD license
> means we can incorporate it in svn. Information about it can be found at
> http://code.google.com/p/lz4/ . Additionally, there is a JNI library
> for it (and for Snappy, see ACCUMULO-139) at
> https://github.com/decster/jnicompressions . I did not find the license for
> that, but it's a potential option.