[ https://issues.apache.org/jira/browse/ACCUMULO-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16067358#comment-16067358 ]

Keith Turner commented on ACCUMULO-4669:
----------------------------------------

{quote}
2x the maximum key/value size in the block
{quote}

This is an interesting idea.  If I understand what you're thinking, this 
constraint ensures each data block has a minimum number of key/values.  
Currently there is code to ensure index blocks have at least two entries; 
however, there is no minimum-entry check for data blocks.  I think it may be a 
good idea to add this check, but I'm not sure we should do that as part of 
this bugfix.
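
For illustration only, here is a minimal sketch of what a minimum-entry check 
for data blocks could look like (the entriesInBlock counter and the constant 
are hypothetical, not existing RFile fields):

{code}
    // hypothetical counter, incremented in append() and reset when a data block is closed
    private int entriesInBlock = 0;
    private static final int MIN_ENTRIES_PER_BLOCK = 2;

    // a data block is only eligible to be closed once it holds a minimum number of
    // key/value entries, so a single entry can never become its own runaway block
    private boolean blockHasMinimumEntries() {
      return entriesInBlock >= MIN_ENTRIES_PER_BLOCK;
    }
{code}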

> RFile can create very large blocks when key statistics are not uniform
> ----------------------------------------------------------------------
>
>                 Key: ACCUMULO-4669
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4669
>             Project: Accumulo
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.7.2, 1.7.3, 1.8.0, 1.8.1
>            Reporter: Adam Fuchs
>            Assignee: Keith Turner
>            Priority: Blocker
>             Fix For: 1.7.4, 1.8.2, 2.0.0
>
>
> RFile.Writer.append checks for giant keys and avoids writing them as index 
> blocks. This check is flawed and can result in multi-GB blocks. In our case, 
> a 20GB compressed RFile had one block with over 2GB raw size. This happened 
> because the key size statistics changed after some point in the file. The 
> code in question follows:
> {code}
>     private boolean isGiantKey(Key k) {
>       // consider a key thats more than 3 standard deviations from previously seen key sizes as giant
>       return k.getSize() > keyLenStats.getMean() + keyLenStats.getStandardDeviation() * 3;
>     }
> ...
>       if (blockWriter == null) {
>         blockWriter = fileWriter.prepareDataBlock();
>       } else if (blockWriter.getRawSize() > blockSize) {
>         ...
>         if ((prevKey.getSize() <= avergageKeySize || blockWriter.getRawSize() > maxBlockSize) && !isGiantKey(prevKey)) {
>           closeBlock(prevKey, false);
> ...
> {code}
> Before closing a block that has grown beyond the target block size, we check 
> that the previous key is at most average size or that the block has exceeded 
> 1.1 times the target block size (maxBlockSize), and we also check that the 
> key isn't a "giant" key, i.e. more than 3 standard deviations above the mean 
> key size seen so far.
> Our RFiles often have one row of data with different column families 
> representing various forward and inverted indexes. This is a table design 
> similar to the WikiSearch example. The first column family in this case had 
> very uniform, relatively small key sizes. This first column family comprised 
> gigabytes of data, split up into roughly 100KB blocks. When we switched to 
> the next column family the keys grew in size, but were still under about 100 
> bytes. The statistics of the first column family had firmly established a 
> smaller mean and tiny standard deviation (approximately 0), and it took over 
> 2GB of larger keys to bring the standard deviation up enough so that keys 
> were no longer considered "giant" and the block could be closed.
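> To make the failure mode concrete, here is a small, self-contained sketch; 
> the key sizes are invented for illustration, and it uses Commons Math's 
> SummaryStatistics like the reproduction snippet further below (math3 import 
> shown; adjust for the commons-math version on the classpath):
> {code}
> import org.apache.commons.math3.stat.descriptive.SummaryStatistics;
>
> public class GiantKeyThreshold {
>   public static void main(String[] args) {
>     SummaryStatistics keyLenStats = new SummaryStatistics();
>     // gigabytes of uniform 30-byte keys from the first column family
>     for (int i = 0; i < 1_000_000; i++) {
>       keyLenStats.addValue(30);
>     }
>     // mean = 30, standard deviation = 0, so the "giant" cutoff is mean + 3*stddev = 30
>     double cutoff = keyLenStats.getMean() + 3 * keyLenStats.getStandardDeviation();
>     // a 100-byte key from the next column family exceeds the cutoff, so
>     // isGiantKey() keeps returning true and the block is never closed
>     System.out.println(100 > cutoff); // prints true
>   }
> }
> {code}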
> Now that we're aware, we see large blocks (more than 10x the target block 
> size) in almost every RFile we write. This only became a glaring problem when 
> we got OOM exceptions trying to decompress the block, but it also shows up in 
> a number of subtle performance problems, like high variance in latencies for 
> looking up particular keys.
> The fix for this should produce bounded RFile block sizes, limited to the 
> greater of 2x the maximum key/value size in the block and some configurable 
> threshold, such as 1.1 times the compressed block size. We need a firm cap to 
> be able to reason about memory usage in various applications.
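> As a minimal sketch of such a cap (the field and method names here are 
> illustrative, not taken from an actual patch):
> {code}
>       // hypothetical: size of the largest key/value pair appended to the current block
>       private long largestKeyValueSize = 0;
>
>       // hard cap: close the block once its raw size exceeds the greater of
>       // 2x the largest key/value in the block and a configurable threshold
>       private boolean mustCloseBlock(long rawBlockSize, long configuredMaxBlockSize) {
>         long cap = Math.max(2 * largestKeyValueSize, configuredMaxBlockSize);
>         return rawBlockSize > cap;
>       }
> {code}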
> The following code produces arbitrarily large RFile blocks:
> {code}
>   FileSKVWriter writer = RFileOperations.getInstance().openWriter(filename, fs, conf, acuconf);
>   writer.startDefaultLocalityGroup();
>   SummaryStatistics keyLenStats = new SummaryStatistics();
>   Random r = new Random();
>   byte[] buffer = new byte[minRowSize];
>   for (int i = 0; i < 100000; i++) {
>     byte[] valBytes = new byte[valLength];
>     r.nextBytes(valBytes);
>     r.nextBytes(buffer);
>     ByteBuffer.wrap(buffer).putInt(i);
>     Key k = new Key(buffer, 0, buffer.length, emptyBytes, 0, 0, emptyBytes, 0, 0, emptyBytes, 0, 0, 0);
>     Value v = new Value(valBytes);
>     writer.append(k, v);
>     keyLenStats.addValue(k.getSize());
>     // grow the next key so it stays above mean + 4 standard deviations of all
>     // key sizes seen so far, which keeps isGiantKey() returning true
>     int newBufferSize = Math.max(buffer.length,
>         (int) Math.ceil(keyLenStats.getMean() + keyLenStats.getStandardDeviation() * 4 + 0.0001));
>     buffer = new byte[newBufferSize];
>     if (keyLenStats.getSum() > targetSize)
>       break;
>   }
>   writer.close();
> {code}
> One telltale symptom of this bug is an OutOfMemoryError thrown from a 
> readahead thread with the message "Requested array size exceeds VM limit". 
> This will only happen if the block cache size is big enough to hold the 
> expected raw block size, 2GB in our case. This message is rare, and really 
> only happens when allocating an array of size Integer.MAX_VALUE or 
> Integer.MAX_VALUE-1 on the HotSpot JVM. Integer.MAX_VALUE happens in this 
> case due to some strange handling of raw block sizes in the BCFile code. Most 
> OutOfMemoryErrors have different messages.


