[
https://issues.apache.org/jira/browse/ACCUMULO-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Keith Turner updated ACCUMULO-4669:
-----------------------------------
Priority: Blocker (was: Critical)
> RFile can create very large blocks when key statistics are not uniform
> ----------------------------------------------------------------------
>
> Key: ACCUMULO-4669
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4669
> Project: Accumulo
> Issue Type: Bug
> Components: core
> Affects Versions: 1.7.2, 1.7.3, 1.8.0, 1.8.1
> Reporter: Adam Fuchs
> Assignee: Keith Turner
> Priority: Blocker
> Fix For: 1.7.4, 1.8.2, 2.0.0
>
>
> RFile.Writer.append checks for giant keys and avoid writing them as index
> blocks. This check is flawed and can result in multi-GB blocks. In our case,
> a 20GB compressed RFile had one block with over 2GB raw size. This happened
> because the key size statistics changed after some point in the file. The
> code in question follows:
> {code}
> private boolean isGiantKey(Key k) {
> // consider a key thats more than 3 standard deviations from previously
> seen key sizes as giant
> return k.getSize() > keyLenStats.getMean() +
> keyLenStats.getStandardDeviation() * 3;
> }
> ...
> if (blockWriter == null) {
> blockWriter = fileWriter.prepareDataBlock();
> } else if (blockWriter.getRawSize() > blockSize) {
> ...
> if ((prevKey.getSize() <= avergageKeySize || blockWriter.getRawSize()
> > maxBlockSize) && !isGiantKey(prevKey)) {
> closeBlock(prevKey, false);
> ...
> {code}
> Before closing a block that has grown beyond the target block size we check
> to see that the key is below average in size or that the block is 1.1 times
> the target block size (maxBlockSize), and we check that the key isn't a
> "giant" key, or more than 3 standard deviations from the mean of keys seen so
> far.
> Our RFiles often have one row of data with different column families
> representing various forward and inverted indexes. This is a table design
> similar to the WikiSearch example. The first column family in this case had
> very uniform, relatively small key sizes. This first column family comprised
> gigabytes of data, split up into roughly 100KB blocks. When we switched to
> the next column family the keys grew in size, but were still under about 100
> bytes. The statistics of the first column family had firmly established a
> smaller mean and tiny standard deviation (approximately 0), and it took over
> 2GB of larger keys to bring the standard deviation up enough so that keys
> were no longer considered "giant" and the block could be closed.
> Now that we're aware, we see large blocks (more than 10x the target block
> size) in almost every RFile we write. This only became a glaring problem when
> we got OOM exceptions trying to decompress the block, but it also shows up in
> a number of subtle performance problems, like high variance in latencies for
> looking up particular keys.
> The fix for this should produce bounded RFile block sizes, limited to the
> greater of 2x the maximum key/value size in the block and some configurable
> threshold, such as 1.1 times the compressed block size. We need a firm cap to
> be able to reason about memory usage in various applications.
> The following code produces arbitrarily large RFile blocks:
> {code}
> FileSKVWriter writer = RFileOperations.getInstance().openWriter(filename,
> fs, conf, acuconf);
> writer.startDefaultLocalityGroup();
> SummaryStatistics keyLenStats = new SummaryStatistics();
> Random r = new Random();
> byte [] buffer = new byte[minRowSize];
> for(int i = 0; i < 100000; i++) {
> byte [] valBytes = new byte[valLength];
> r.nextBytes(valBytes);
> r.nextBytes(buffer);
> ByteBuffer.wrap(buffer).putInt(i);
> Key k = new Key(buffer, 0, buffer.length, emptyBytes, 0, 0, emptyBytes,
> 0, 0, emptyBytes, 0, 0, 0);
> Value v = new Value(valBytes);
> writer.append(k, v);
> keyLenStats.addValue(k.getSize());
> int newBufferSize = Math.max(buffer.length, (int)
> Math.ceil(keyLenStats.getMean() + keyLenStats.getStandardDeviation() * 4 +
> 0.0001));
> buffer = new byte[newBufferSize];
> if(keyLenStats.getSum() > targetSize)
> break;
> }
> writer.close();
> {code}
> One telltale symptom of this bug is an OutOfMemoryException thrown from a
> readahead thread with message "Requested array size exceeds VM limit". This
> will only happen if the block cache size is big enough to hold the expected
> raw block size, 2GB in our case. This message is rare, and really only
> happens when allocating an array of size Integer.MAX_VALUE or
> Integer.MAX_VALUE-1 on the hotspot JVM. Integer.MAX_VALUE happens in this
> case due to some strange handling of raw block sizes in the BCFile code. Most
> OutOfMemoryExceptions have different messages.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)