Adam Fuchs created ACCUMULO-4669:
------------------------------------
Summary: RFile can create very large blocks when key statistics
are not uniform
Key: ACCUMULO-4669
URL: https://issues.apache.org/jira/browse/ACCUMULO-4669
Project: Accumulo
Issue Type: Bug
Components: core
Reporter: Adam Fuchs
Priority: Critical
RFile.Writer.append checks for giant keys and avoids writing them as index
entries. This check is flawed and can result in multi-GB blocks. In our case, a
20GB compressed RFile had one block with over 2GB raw size. This happened
because the key size statistics changed partway through the file. The code
in question follows:
{code}
private boolean isGiantKey(Key k) {
  // consider a key that's more than 3 standard deviations from previously seen key sizes as giant
  return k.getSize() > keyLenStats.getMean() + keyLenStats.getStandardDeviation() * 3;
}
...
if (blockWriter == null) {
  blockWriter = fileWriter.prepareDataBlock();
} else if (blockWriter.getRawSize() > blockSize) {
  ...
  if ((prevKey.getSize() <= averageKeySize || blockWriter.getRawSize() > maxBlockSize) && !isGiantKey(prevKey)) {
    closeBlock(prevKey, false);
    ...
{code}
Before closing a block that has grown beyond the target block size, we check
that the key is at or below average in size or that the block has exceeded 1.1
times the target block size (maxBlockSize). We also check that the key isn't a
"giant" key, meaning more than 3 standard deviations above the mean size of the
keys seen so far. A giant key blocks the close no matter how large the block
has become.
Our RFiles often have one row of data with different column families
representing various forward and inverted indexes. This is a table design
similar to the WikiSearch example. The first column family in this case had
very uniform, relatively small key sizes. This first column family comprised
gigabytes of data, split up into roughly 100KB blocks. When we switched to the
next column family the keys grew in size, but were still under about 100 bytes.
The statistics of the first column family had firmly established a smaller mean
and tiny standard deviation (approximately 0), and it took over 2GB of larger
keys to bring the standard deviation up enough so that keys were no longer
considered "giant" and the block could be closed.
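The effect described above can be reproduced without Accumulo. The following is
a minimal sketch, not the RFile code: GiantKeyDemo, RunningStats and
largeKeysUntilNotGiant are hypothetical names, and the running statistics are
computed with Welford's method using the sample standard deviation, which I
assume is close to what keyLenStats reports. It feeds a run of uniform small
keys, then counts how many larger keys must be appended before one of them
stops being classified as giant.

```java
public class GiantKeyDemo {

  /** Running mean and sample standard deviation (Welford's method). */
  static final class RunningStats {
    private long n;
    private double mean;
    private double m2; // sum of squared deviations from the current mean

    void add(double x) {
      n++;
      double delta = x - mean;
      mean += delta / n;
      m2 += delta * (x - mean);
    }

    double mean() {
      return mean;
    }

    double stdDev() {
      return n > 1 ? Math.sqrt(m2 / (n - 1)) : 0.0;
    }
  }

  /** Same comparison as isGiantKey in the report, applied to a plain key size. */
  public static boolean isGiant(int keySize, RunningStats stats) {
    return keySize > stats.mean() + stats.stdDev() * 3;
  }

  /**
   * Feeds smallKeys keys of smallSize bytes, then keys of largeSize bytes, and
   * returns how many large keys are appended before one is no longer "giant".
   */
  public static long largeKeysUntilNotGiant(long smallKeys, int smallSize, int largeSize) {
    RunningStats stats = new RunningStats();
    for (long i = 0; i < smallKeys; i++) {
      stats.add(smallSize);
    }
    long largeKeys = 0;
    // Assumption: each key is tested against the statistics of the keys before it.
    while (isGiant(largeSize, stats)) {
      stats.add(largeSize);
      largeKeys++;
    }
    return largeKeys;
  }

  public static void main(String[] args) {
    long small = 900_000;
    long large = largeKeysUntilNotGiant(small, 50, 100);
    System.out.println(small + " small keys, then " + large
        + " large keys before a block could be closed");
  }
}
```

Under these assumptions the count depends on how many small keys came first
rather than on the key sizes themselves: roughly one large key is needed for
every nine small keys already written. That would explain why gigabytes of
uniform keys were followed by a block of more than 2GB.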
Now that we're aware of the problem, we see large blocks (more than 10x the
target block size) in almost every RFile we write. This only became a glaring
problem when we got OOM exceptions trying to decompress the block, but it also
shows up as a number of subtle performance problems, such as high variance in
latencies for looking up particular keys.
The fix for this should produce bounded RFile block sizes, limited to the
greater of 2x the maximum key/value size in the block and some configurable
threshold, such as 1.1 times the compressed block size. We need a firm cap to
be able to reason about memory usage in various applications.
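One way the proposed bound could be expressed is sketched below. This is an
illustration of the idea only, not a patch against RFile.Writer:
BoundedBlockPolicy, capFactor, hardCap and shouldClose are hypothetical names,
and the caller is assumed to track the largest key/value entry written to the
current block. The existing heuristic is kept, but a firm cap overrides the
giant-key check.

```java
public class BoundedBlockPolicy {
  private final long targetBlockSize;
  private final long maxBlockSize; // soft limit: 1.1x the target, as in the report
  private final long capThreshold; // hypothetical configurable firm threshold

  public BoundedBlockPolicy(long targetBlockSize, double capFactor) {
    this.targetBlockSize = targetBlockSize;
    this.maxBlockSize = (long) (targetBlockSize * 1.1);
    this.capThreshold = (long) (targetBlockSize * capFactor);
  }

  /** Firm cap: the greater of 2x the largest entry in the block and the threshold. */
  public long hardCap(long maxEntrySize) {
    return Math.max(2 * maxEntrySize, capThreshold);
  }

  /** Decides whether the current block should be closed before the next append. */
  public boolean shouldClose(long rawSize, long prevKeySize, double averageKeySize,
      boolean prevKeyIsGiant, long maxEntrySize) {
    if (rawSize <= targetBlockSize) {
      return false; // block has not reached the target size yet
    }
    if (rawSize > hardCap(maxEntrySize)) {
      return true; // firm bound: close even if the key looks giant
    }
    // Otherwise fall back to the heuristic quoted in the report.
    return (prevKeySize <= averageKeySize || rawSize > maxBlockSize) && !prevKeyIsGiant;
  }
}
```

With a target of 100000 bytes and capFactor 1.1, a block whose entries are all
small is closed once it passes about 110000 bytes, even if every key is
classified as giant. A block containing a single 150000-byte entry is allowed
to grow to 300000 bytes before the cap applies.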
The following code produces arbitrarily large RFile blocks:
{code}
// filename, fs, conf, acuconf, minRowSize, valLength, targetSize and emptyBytes
// are assumed to be defined by the surrounding test.
FileSKVWriter writer = RFileOperations.getInstance().openWriter(filename, fs, conf, acuconf);
writer.startDefaultLocalityGroup();
SummaryStatistics keyLenStats = new SummaryStatistics();
Random r = new Random();
byte[] buffer = new byte[minRowSize];
for (int i = 0; i < 100000; i++) {
  byte[] valBytes = new byte[valLength];
  r.nextBytes(valBytes);
  r.nextBytes(buffer);
  // prefix the row with the loop counter so rows are appended in increasing order
  ByteBuffer.wrap(buffer).putInt(i);
  Key k = new Key(buffer, 0, buffer.length, emptyBytes, 0, 0, emptyBytes, 0, 0, emptyBytes, 0, 0, 0);
  Value v = new Value(valBytes);
  writer.append(k, v);
  keyLenStats.addValue(k.getSize());
  // size the next row at least 4 standard deviations above the mean key size,
  // so that it should exceed the 3-standard-deviation giant-key threshold
  int newBufferSize = Math.max(buffer.length,
      (int) Math.ceil(keyLenStats.getMean() + keyLenStats.getStandardDeviation() * 4 + 0.0001));
  buffer = new byte[newBufferSize];
  if (keyLenStats.getSum() > targetSize)
    break;
}
writer.close();
{code}
One telltale symptom of this bug is an OutOfMemoryError thrown from a
readahead thread with the message "Requested array size exceeds VM limit". This
will only happen if the block cache size is big enough to hold the expected raw
block size, 2GB in our case. This message is rare, and really only happens when
allocating an array of size Integer.MAX_VALUE or Integer.MAX_VALUE-1 on the
HotSpot JVM. Integer.MAX_VALUE happens in this case due to some strange
handling of raw block sizes in the BCFile code. Most OutOfMemoryErrors have
different messages.