Adam Fuchs created ACCUMULO-4669:
------------------------------------

             Summary: RFile can create very large blocks when key statistics are not uniform
                 Key: ACCUMULO-4669
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4669
             Project: Accumulo
          Issue Type: Bug
          Components: core
            Reporter: Adam Fuchs
            Priority: Critical


RFile.Writer.append checks for giant keys and avoids writing them as index 
blocks. This check is flawed and can result in multi-GB blocks. In our case, a 
20GB compressed RFile had one block with over 2GB raw size. This happened 
because the key size statistics changed after some point in the file. The code 
in question follows:

{code}
    private boolean isGiantKey(Key k) {
      // consider a key thats more than 3 standard deviations from previously seen key sizes as giant
      return k.getSize() > keyLenStats.getMean() + keyLenStats.getStandardDeviation() * 3;
    }
...
      if (blockWriter == null) {
        blockWriter = fileWriter.prepareDataBlock();
      } else if (blockWriter.getRawSize() > blockSize) {
        ...
        if ((prevKey.getSize() <= avergageKeySize || blockWriter.getRawSize() > maxBlockSize) && !isGiantKey(prevKey)) {
          closeBlock(prevKey, false);
...
{code}

Before closing a block that has grown beyond the target block size, we check 
that the previous key is at or below average in size or that the block exceeds 
1.1 times the target block size (maxBlockSize). In either case we also require 
that the key is not a "giant" key, meaning more than 3 standard deviations 
above the mean size of the keys seen so far.

Our RFiles often have one row of data with different column families 
representing various forward and inverted indexes. This is a table design 
similar to the WikiSearch example. The first column family in this case had 
very uniform, relatively small key sizes. This first column family comprised 
gigabytes of data, split up into roughly 100KB blocks. When we switched to the 
next column family the keys grew in size, but were still under about 100 bytes. 
The statistics of the first column family had firmly established a smaller mean 
and tiny standard deviation (approximately 0), and it took over 2GB of larger 
keys to bring the standard deviation up enough so that keys were no longer 
considered "giant" and the block could be closed.
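
As a rough illustration of the effect described above, the following 
self-contained Java sketch replays the scenario with a simple running mean and 
standard deviation standing in for keyLenStats. The class and method names 
(GiantKeySimulation, RunningStats, keysUntilNotGiant) and the key sizes 50 and 
90 are made up for the example, and it models only the isGiantKey test, not the 
rest of RFile.Writer.append.

```java
class GiantKeySimulation {

  // Running mean and variance via Welford's algorithm.
  // Assumption: a stand-in for the statistics object RFile keeps in keyLenStats.
  static final class RunningStats {
    private long n;
    private double mean;
    private double m2; // sum of squared deviations from the mean

    void add(double x) {
      n++;
      double delta = x - mean;
      mean += delta / n;
      m2 += delta * (x - mean);
    }

    double mean() {
      return mean;
    }

    // Sample (n - 1) standard deviation; 0 when fewer than two values were added.
    double stdDev() {
      return n > 1 ? Math.sqrt(m2 / (n - 1)) : 0.0;
    }
  }

  // Same test as isGiantKey in the issue: more than 3 standard deviations above the mean.
  static boolean isGiant(int keySize, RunningStats stats) {
    return keySize > stats.mean() + 3 * stats.stdDev();
  }

  // Adds smallCount keys of smallSize, then keeps adding keys of largeSize and
  // returns how many of them were added before one is no longer flagged as giant.
  static long keysUntilNotGiant(long smallCount, int smallSize, int largeSize) {
    RunningStats stats = new RunningStats();
    for (long i = 0; i < smallCount; i++) {
      stats.add(smallSize);
    }
    long largeCount = 0;
    do {
      stats.add(largeSize);
      largeCount++;
    } while (isGiant(largeSize, stats));
    return largeCount;
  }

  public static void main(String[] args) {
    long smallCount = 900_000;
    long largeCount = keysUntilNotGiant(smallCount, 50, 90);
    System.out.println(smallCount + " small keys needed " + largeCount
        + " larger keys before the giant-key check stopped firing");
  }
}
```

Under these simplifying assumptions (all keys in each group have the same 
size), n small keys need roughly n/9 larger keys before the larger keys stop 
being classified as giant, however small the size difference is. During that 
whole stretch the block cannot be closed.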

Now that we're aware of the problem, we see large blocks (more than 10x the target block size) 
in almost every RFile we write. This only became a glaring problem when we got 
OOM exceptions trying to decompress the block, but it also shows up in a number 
of subtle performance problems, like high variance in latencies for looking up 
particular keys.

The fix for this should produce bounded RFile block sizes, limited to the 
greater of 2x the maximum key/value size in the block and some configurable 
threshold, such as 1.1 times the compressed block size. We need a firm cap to 
be able to reason about memory usage in various applications.
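
One way the proposed bound could look is sketched below. This is not the 
actual fix, only a minimal illustration: BoundedBlockPolicy and all of its 
members are hypothetical names, the "configurable threshold" is passed in as a 
plain byte count (hardCap), and the index-key heuristics are reduced to a 
single boolean supplied by the caller.

```java
class BoundedBlockPolicy {
  private final long targetBlockSize; // size after which we start looking for a place to close
  private final long hardCap;         // hypothetical configurable threshold, in bytes
  private long maxEntrySizeInBlock;   // largest key/value entry appended to the current block

  BoundedBlockPolicy(long targetBlockSize, long hardCap) {
    this.targetBlockSize = targetBlockSize;
    this.hardCap = hardCap;
  }

  // Record the size of each key/value entry as it is appended to the current block.
  void onAppend(long entrySize) {
    maxEntrySizeInBlock = Math.max(maxEntrySizeInBlock, entrySize);
  }

  // Reset per-block state once the block has been closed.
  void onBlockClosed() {
    maxEntrySizeInBlock = 0;
  }

  // Firm limit: the greater of 2x the largest entry in the block and the threshold.
  long limit() {
    return Math.max(2 * maxEntrySizeInBlock, hardCap);
  }

  // Below the target size, never close. Above it, close when the previous key
  // is a good index candidate, and always close once the firm limit is reached,
  // regardless of what the key size statistics say.
  boolean shouldClose(long rawBlockSize, boolean prevKeyIsGoodIndexKey) {
    if (rawBlockSize <= targetBlockSize) {
      return false;
    }
    return prevKeyIsGoodIndexKey || rawBlockSize >= limit();
  }
}
```

The point of this shape is that the statistical checks can only delay closing 
a block up to limit(), so the worst-case raw block size no longer depends on 
the distribution of key sizes.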

The following code produces arbitrarily large RFile blocks:
{code}
  // filename, fs, conf, acuconf, minRowSize, valLength, emptyBytes and targetSize are defined elsewhere
  FileSKVWriter writer = RFileOperations.getInstance().openWriter(filename, fs, conf, acuconf);
  writer.startDefaultLocalityGroup();
  SummaryStatistics keyLenStats = new SummaryStatistics();
  Random r = new Random();
  byte[] buffer = new byte[minRowSize];
  for (int i = 0; i < 100000; i++) {
    byte[] valBytes = new byte[valLength];
    r.nextBytes(valBytes);
    r.nextBytes(buffer);
    ByteBuffer.wrap(buffer).putInt(i);
    Key k = new Key(buffer, 0, buffer.length, emptyBytes, 0, 0, emptyBytes, 0, 0, emptyBytes, 0, 0, 0);
    Value v = new Value(valBytes);
    writer.append(k, v);
    keyLenStats.addValue(k.getSize());
    // size the next row at about mean + 4 standard deviations, intended to keep it above the 3-sigma giant-key threshold
    int newBufferSize = Math.max(buffer.length, (int) Math.ceil(keyLenStats.getMean() + keyLenStats.getStandardDeviation() * 4 + 0.0001));
    buffer = new byte[newBufferSize];
    if (keyLenStats.getSum() > targetSize)
      break;
  }
  writer.close();
{code}

One telltale symptom of this bug is an OutOfMemoryError thrown from a 
readahead thread with the message "Requested array size exceeds VM limit". This 
will only happen if the block cache is big enough to hold the expected raw 
block size, 2GB in our case. This message is rare, and really only appears when 
allocating an array of size Integer.MAX_VALUE or Integer.MAX_VALUE-1 on the 
HotSpot JVM. Integer.MAX_VALUE comes up in this case due to some strange 
handling of raw block sizes in the BCFile code. Most OutOfMemoryErrors have 
different messages.



