[ https://issues.apache.org/jira/browse/ACCUMULO-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16067364#comment-16067364 ]

Adam Fuchs commented on ACCUMULO-4669:
--------------------------------------

On further analysis, this is happening within inverted indexes as well, and 
doesn't rely on the table structure we use with large rows. It seems to me that 
the statistics involved here are tailored towards Gaussian-distributed key 
lengths, but this use of standard deviation is not appropriate when looking for 
outliers in our key length distribution. I'm not sure exactly how to model it 
better, but we're thinking of restricting the keyLenStats to one RFile block 
(clearing it when closing a block) and making the maxBlockSize a harder limit. 
It might also be appropriate to compare key and prevKey to get a prediction of 
what the decision would be for the next key.
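
For concreteness, here's a rough sketch of that per-block reset against a 
stand-in class rather than the real RFile.Writer internals (all names and 
wiring below are illustrative assumptions, not a patch):

{code}
import org.apache.commons.math3.stat.descriptive.SummaryStatistics;

// Illustrative stand-in for the statistics RFile.Writer keeps. The idea:
// keyLenStats only ever describe the current block, and maxBlockSize acts
// as a hard cap that overrides the giant-key heuristic.
class BlockScopedKeyStats {
  private final SummaryStatistics keyLenStats = new SummaryStatistics();
  private final long blockSize;    // target block size
  private final long maxBlockSize; // hard cap, e.g. 1.1 * blockSize

  BlockScopedKeyStats(long blockSize, long maxBlockSize) {
    this.blockSize = blockSize;
    this.maxBlockSize = maxBlockSize;
  }

  void observeKey(long keySize) {
    keyLenStats.addValue(keySize);
  }

  boolean isGiantKey(long keySize) {
    return keySize > keyLenStats.getMean() + keyLenStats.getStandardDeviation() * 3;
  }

  // Called where RFile.Writer currently decides whether to close a block.
  boolean shouldCloseBlock(long rawSize, long prevKeySize) {
    if (rawSize > maxBlockSize)
      return true; // harder limit: statistics can no longer keep a block open
    return rawSize > blockSize && !isGiantKey(prevKeySize);
  }

  // Called from closeBlock(): drop the old distribution entirely so the next
  // block's statistics reflect only its own keys.
  void onBlockClosed() {
    keyLenStats.clear();
  }
}
{code}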

Overall, an approach that uses some objective function to balance index size 
against block size, with an extremely low probability of unbounded block size, 
might be best here. Maybe look at something like the ratio of prevKey's size to 
the block's rawSize, and flush when it falls under a threshold?
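
One possible shape for that ratio heuristic, again only as a sketch (the 
threshold value and names are assumptions, not something we've tested):

{code}
// Flush when prevKey is small relative to the accumulated raw block size.
// Since the ratio shrinks as the block grows, this bounds the block at
// roughly prevKeySize / KEY_TO_BLOCK_RATIO, so block size stays finite even
// when the key-length distribution shifts. The constant is illustrative.
static final double KEY_TO_BLOCK_RATIO = 1.0 / 1024;

static boolean shouldFlush(long prevKeySize, long rawSize, long targetBlockSize) {
  if (rawSize <= targetBlockSize)
    return false; // block has not reached the target size yet
  return (double) prevKeySize / rawSize < KEY_TO_BLOCK_RATIO;
}
{code}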

> RFile can create very large blocks when key statistics are not uniform
> ----------------------------------------------------------------------
>
>                 Key: ACCUMULO-4669
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4669
>             Project: Accumulo
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.7.2, 1.7.3, 1.8.0, 1.8.1
>            Reporter: Adam Fuchs
>            Assignee: Keith Turner
>            Priority: Blocker
>             Fix For: 1.7.4, 1.8.2, 2.0.0
>
>
> RFile.Writer.append checks for giant keys and avoids writing them to index 
> blocks. This check is flawed and can result in multi-GB blocks. In our case, 
> a 20GB compressed RFile had one block with over 2GB raw size. This happened 
> because the key size statistics changed partway through the file. The code 
> in question follows:
> {code}
>     private boolean isGiantKey(Key k) {
>       // consider a key thats more than 3 standard deviations from previously seen key sizes as giant
>       return k.getSize() > keyLenStats.getMean() + keyLenStats.getStandardDeviation() * 3;
>     }
> ...
>       if (blockWriter == null) {
>         blockWriter = fileWriter.prepareDataBlock();
>       } else if (blockWriter.getRawSize() > blockSize) {
>         ...
>         if ((prevKey.getSize() <= avergageKeySize || blockWriter.getRawSize() > maxBlockSize) && !isGiantKey(prevKey)) {
>           closeBlock(prevKey, false);
> ...
> {code}
> Before closing a block that has grown beyond the target block size, we check 
> that the key is below average size or that the block has exceeded 
> maxBlockSize (1.1 times the target block size), and we also check that the 
> key isn't a "giant" key, i.e. more than 3 standard deviations above the mean 
> of the key sizes seen so far.
> Our RFiles often have one row of data with different column families 
> representing various forward and inverted indexes. This is a table design 
> similar to the WikiSearch example. The first column family in this case had 
> very uniform, relatively small key sizes. This first column family comprised 
> gigabytes of data, split up into roughly 100KB blocks. When we switched to 
> the next column family the keys grew in size, but were still under about 100 
> bytes. The statistics of the first column family had firmly established a 
> smaller mean and tiny standard deviation (approximately 0), and it took over 
> 2GB of larger keys to bring the standard deviation up enough so that keys 
> were no longer considered "giant" and the block could be closed.
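> To see how sticky those statistics are, here is a small standalone 
> demonstration (not from our production code; it drives commons-math3's 
> SummaryStatistics directly, and the 20-byte/100-byte sizes are illustrative):
> {code}
> import org.apache.commons.math3.stat.descriptive.SummaryStatistics;
>
> // Feed the statistics a million uniform 20-byte key lengths, then count how
> // many 100-byte keys must arrive before 100 stops looking "giant" under the
> // mean + 3*stddev rule. Analytically the large keys must reach about 10% of
> // all keys seen, i.e. roughly 111,111 keys here.
> public class StickyStatsDemo {
>   public static void main(String[] args) {
>     SummaryStatistics keyLenStats = new SummaryStatistics();
>     for (int i = 0; i < 1000000; i++)
>       keyLenStats.addValue(20);
>     long largeKeys = 0;
>     while (100 > keyLenStats.getMean() + keyLenStats.getStandardDeviation() * 3) {
>       keyLenStats.addValue(100);
>       largeKeys++;
>     }
>     System.out.println("100-byte keys needed: " + largeKeys); // ~111111
>   }
> }
> {code}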
> Now that we're aware, we see large blocks (more than 10x the target block 
> size) in almost every RFile we write. This only became a glaring problem when 
> we got OOM exceptions trying to decompress the block, but it also shows up in 
> a number of subtle performance problems, like high variance in latencies for 
> looking up particular keys.
> The fix for this should produce bounded RFile block sizes, limited to the 
> greater of 2x the maximum key/value size in the block and some configurable 
> threshold, such as 1.1 times the compressed block size. We need a firm cap to 
> be able to reason about memory usage in various applications.
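> A hedged sketch of such a cap (the 2x and 1.1x factors come from the 
> paragraph above; the method and parameter names are made up for 
> illustration):
> {code}
> // Firm cap: close the block once its raw size exceeds the larger of 2x the
> // biggest key/value pair seen in this block and 1.1x the target block size,
> // regardless of what the key-length statistics say.
> static boolean mustCloseBlock(long rawSize, long maxEntrySize, long targetBlockSize) {
>   long hardLimit = Math.max(2 * maxEntrySize, (long) (1.1 * targetBlockSize));
>   return rawSize > hardLimit;
> }
> {code}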
> The following code produces arbitrarily large RFile blocks:
> {code}
>   // Repro sketch; assumes filename, fs, conf, acuconf, minRowSize, valLength,
>   // emptyBytes, and targetSize are defined in the surrounding test.
>   FileSKVWriter writer = RFileOperations.getInstance().openWriter(filename, fs, conf, acuconf);
>   writer.startDefaultLocalityGroup();
>   SummaryStatistics keyLenStats = new SummaryStatistics();
>   Random r = new Random();
>   byte[] buffer = new byte[minRowSize];
>   for (int i = 0; i < 100000; i++) {
>     byte[] valBytes = new byte[valLength];
>     r.nextBytes(valBytes);
>     r.nextBytes(buffer);
>     ByteBuffer.wrap(buffer).putInt(i);
>     Key k = new Key(buffer, 0, buffer.length, emptyBytes, 0, 0, emptyBytes, 0, 0, emptyBytes, 0, 0, 0);
>     Value v = new Value(valBytes);
>     writer.append(k, v);
>     keyLenStats.addValue(k.getSize());
>     // Grow each subsequent key past the 3-sigma threshold so
>     // isGiantKey(prevKey) stays true and the block is never closed.
>     int newBufferSize = Math.max(buffer.length,
>         (int) Math.ceil(keyLenStats.getMean() + keyLenStats.getStandardDeviation() * 4 + 0.0001));
>     buffer = new byte[newBufferSize];
>     if (keyLenStats.getSum() > targetSize)
>       break;
>   }
>   writer.close();
> {code}
> One telltale symptom of this bug is an OutOfMemoryError thrown from a 
> readahead thread with the message "Requested array size exceeds VM limit". 
> This will only happen if the block cache size is big enough to hold the 
> expected raw block size, 2GB in our case. This message is rare, and really 
> only happens when allocating an array of size Integer.MAX_VALUE or 
> Integer.MAX_VALUE-1 on the HotSpot JVM. Integer.MAX_VALUE shows up in this 
> case due to some strange handling of raw block sizes in the BCFile code. 
> Most out-of-memory failures carry different messages.


