Dave Marion created ACCUMULO-2353:
-------------------------------------
Summary: Test improvements to java.io.InputStream.skip() for
possible Hadoop patch
Key: ACCUMULO-2353
URL: https://issues.apache.org/jira/browse/ACCUMULO-2353
Project: Accumulo
Issue Type: Task
Environment: Java 6 update 45 or later
Hadoop 2.2.0
Reporter: Dave Marion
Priority: Minor
At some point (early Java 7, I think, later backported around Java 6 Update
45), the java.io.InputStream.skip() method was changed to read through a
byte[2048] buffer instead of a byte[512]. The difference can be seen in
DeflaterInputStream, which has not been updated:
{noformat}
public long skip(long n) throws IOException {
    if (n < 0) {
        throw new IllegalArgumentException("negative skip length");
    }
    ensureOpen();
    // Skip bytes by repeatedly decompressing small blocks
    if (rbuf.length < 512)
        rbuf = new byte[512];
    int total = (int) Math.min(n, Integer.MAX_VALUE);
    long cnt = 0;
    while (total > 0) {
        // Read a small block of uncompressed bytes
        int len = read(rbuf, 0, (total <= rbuf.length ? total : rbuf.length));
        if (len < 0) {
            break;
        }
        cnt += len;
        total -= len;
    }
    return cnt;
}
{noformat}
and java.io.InputStream in Java 6 Update 45:
{noformat}
// MAX_SKIP_BUFFER_SIZE is used to determine the maximum buffer size to
// use when skipping.
private static final int MAX_SKIP_BUFFER_SIZE = 2048;

public long skip(long n) throws IOException {
    long remaining = n;
    int nr;
    if (n <= 0) {
        return 0;
    }
    int size = (int) Math.min(MAX_SKIP_BUFFER_SIZE, remaining);
    byte[] skipBuffer = new byte[size];
    while (remaining > 0) {
        nr = read(skipBuffer, 0, (int) Math.min(size, remaining));
        if (nr < 0) {
            break;
        }
        remaining -= nr;
    }
    return n - remaining;
}
{noformat}
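For reference, the two loops can be compared side by side with a small standalone helper that mirrors the newer InputStream.skip() logic but takes the buffer cap as a parameter (the class and method names here are illustrative, not from either JDK):
{noformat}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SkipBufferDemo {

    // Mirrors the Java 6u45+ InputStream.skip() loop, with the buffer
    // cap passed in so the 512 and 2048 variants can be compared directly.
    static long skipWithBuffer(InputStream in, long n, int maxBuf) throws IOException {
        long remaining = n;
        if (n <= 0) {
            return 0;
        }
        int size = (int) Math.min(maxBuf, remaining);
        byte[] skipBuffer = new byte[size];
        while (remaining > 0) {
            int nr = in.read(skipBuffer, 0, (int) Math.min(size, remaining));
            if (nr < 0) {
                break;
            }
            remaining -= nr;
        }
        return n - remaining;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100000];
        // Both caps skip the same number of bytes; only the number of
        // read() calls (and, on a compressed stream, decompression
        // passes) differs.
        long a = skipWithBuffer(new ByteArrayInputStream(data), 90000, 512);
        long b = skipWithBuffer(new ByteArrayInputStream(data), 90000, 2048);
        System.out.println(a + " " + b); // prints "90000 90000"
    }
}
{noformat}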
In sample tests I saw about a 20% improvement in skip() when seeking towards
the end of a locally cached compressed file. Looking at DecompressorStream
in Hadoop, the skip method is a near copy of the old InputStream method:
{noformat}
private byte[] skipBytes = new byte[512];

@Override
public long skip(long n) throws IOException {
    // Sanity checks
    if (n < 0) {
        throw new IllegalArgumentException("negative skip length");
    }
    checkStream();
    // Read 'n' bytes
    int skipped = 0;
    while (skipped < n) {
        int len = Math.min(((int) n - skipped), skipBytes.length);
        len = read(skipBytes, 0, len);
        if (len == -1) {
            eof = true;
            break;
        }
        skipped += len;
    }
    return skipped;
}
{noformat}
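One way to evaluate the proposed change is to apply the same lazily-sized, 2048-capped buffer pattern over a compressed stream. The sketch below does that against a plain java.util.zip InflaterInputStream rather than Hadoop's DecompressorStream; the skipCompressed helper name and the test data are assumptions for illustration only:
{noformat}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class CompressedSkipSketch {

    private static final int MAX_SKIP_BUFFER_SIZE = 2048; // cap from InputStream.skip()

    // Skip decompressed bytes using a buffer sized lazily from the request
    // and capped at 2048, instead of a fixed byte[512].
    static long skipCompressed(InputStream in, long n) throws IOException {
        long remaining = n;
        if (n <= 0) {
            return 0;
        }
        int size = (int) Math.min(MAX_SKIP_BUFFER_SIZE, remaining);
        byte[] skipBuffer = new byte[size];
        while (remaining > 0) {
            int nr = in.read(skipBuffer, 0, (int) Math.min(size, remaining));
            if (nr < 0) {
                break;
            }
            remaining -= nr;
        }
        return n - remaining;
    }

    public static void main(String[] args) throws IOException {
        // Deflate 10,000 bytes, then skip 9,000 of them through the inflater.
        byte[] raw = new byte[10000];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) (i % 251);
        }
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(bos)) {
            dos.write(raw);
        }
        InputStream in = new InflaterInputStream(new ByteArrayInputStream(bos.toByteArray()));
        long skipped = skipCompressed(in, 9000);
        int next = in.read(); // first byte after the skipped region
        System.out.println(skipped + " " + next);
    }
}
{noformat}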
This task is to evaluate the changes to DecompressorStream, with a possible
patch to Hadoop and a possible bug report to Oracle to port the
InputStream.skip() changes to DeflaterInputStream.skip().
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)