[ 
https://issues.apache.org/jira/browse/ACCUMULO-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070417#comment-16070417
 ] 

Keith Turner edited comment on ACCUMULO-4669 at 6/30/17 5:21 PM:
-----------------------------------------------------------------

I ran into problems when running 
[webindex|https://github.com/astralway/webindex].  This application would 
insert data like the following.

{noformat}
row = <hash of url 1><url 1>   col = <Col1>
row = <hash of url 1><url 1>   col = <Col2>
row = <hash of url 1><url 1>   col = <Col3>
row = <hash of url 2><url 2>   col = <Col1>
row = <hash of url 2><url 2>   col = <Col2>
row = <hash of url 2><url 2>   col = <Col3>
row = <hash of url 3><url 3>   col = <Col1>
row = <hash of url 3><url 3>   col = <Col2>
row = <hash of url 3><url 3>   col = <Col3>
{noformat}

The application uses [Common Crawl data|http://commoncrawl.org/] so the URLs 
were derived from real web pages.  I suspect they followed a zipfian 
distribution.   There would be a very few really large urls.   Since the same 
URL was used for multiple keys (because its in the row and there are multiple 
columns) those large keys had an extremely high chance of ending up in the 
index.    When the first large key is added to a data block, it makes the block 
large.  The second large key causes the block to close and the large key to end 
up in the index.  These few large keys can quadruple the index size (which is 
bad for caching the index).  

If a megabyte row has many columns then it may end up in the index multiple 
times. The exact same row in the index multiple times prevents the shortening 
added in ACCUMULO-1124 from doing anything with the row.  It may be able to 
shorten the column, but if the column is a few bytes and the row is a megabyte 
then shortening the column does not help.

I thought about adding relative compression to the index, but the index needs 
to be decompressed in memory for binary search.   Could create a binary search 
that operates on a relative compressed index I suppose.  But would still need 
to keep the first instance of the megabyte row if using relative compression.


was (Author: kturner):
I ran into problems when running 
[webindex|https://github.com/astralway/webindex].  This application would 
insert data like the following.

{noformat}
row = <hash of url 1><url 1>   col = <Col1>
row = <hash of url 1><url 1>   col = <Col2>
{noformat}


> RFile can create very large blocks when key statistics are not uniform
> ----------------------------------------------------------------------
>
>                 Key: ACCUMULO-4669
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4669
>             Project: Accumulo
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.7.2, 1.7.3, 1.8.0, 1.8.1
>            Reporter: Adam Fuchs
>            Assignee: Keith Turner
>            Priority: Blocker
>             Fix For: 1.7.4, 1.8.2, 2.0.0
>
>
> RFile.Writer.append checks for giant keys and avoid writing them as index 
> blocks. This check is flawed and can result in multi-GB blocks. In our case, 
> a 20GB compressed RFile had one block with over 2GB raw size. This happened 
> because the key size statistics changed after some point in the file. The 
> code in question follows:
> {code}
>     private boolean isGiantKey(Key k) {
>       // consider a key thats more than 3 standard deviations from previously 
> seen key sizes as giant
>       return k.getSize() > keyLenStats.getMean() + 
> keyLenStats.getStandardDeviation() * 3;
>     }
> ...
>       if (blockWriter == null) {
>         blockWriter = fileWriter.prepareDataBlock();
>       } else if (blockWriter.getRawSize() > blockSize) {
>         ...
>         if ((prevKey.getSize() <= avergageKeySize || blockWriter.getRawSize() 
> > maxBlockSize) && !isGiantKey(prevKey)) {
>           closeBlock(prevKey, false);
> ...
> {code}
> Before closing a block that has grown beyond the target block size we check 
> to see that the key is below average in size or that the block is 1.1 times 
> the target block size (maxBlockSize), and we check that the key isn't a 
> "giant" key, or more than 3 standard deviations from the mean of keys seen so 
> far.
> Our RFiles often have one row of data with different column families 
> representing various forward and inverted indexes. This is a table design 
> similar to the WikiSearch example. The first column family in this case had 
> very uniform, relatively small key sizes. This first column family comprised 
> gigabytes of data, split up into roughly 100KB blocks. When we switched to 
> the next column family the keys grew in size, but were still under about 100 
> bytes. The statistics of the first column family had firmly established a 
> smaller mean and tiny standard deviation (approximately 0), and it took over 
> 2GB of larger keys to bring the standard deviation up enough so that keys 
> were no longer considered "giant" and the block could be closed.
> Now that we're aware, we see large blocks (more than 10x the target block 
> size) in almost every RFile we write. This only became a glaring problem when 
> we got OOM exceptions trying to decompress the block, but it also shows up in 
> a number of subtle performance problems, like high variance in latencies for 
> looking up particular keys.
> The fix for this should produce bounded RFile block sizes, limited to the 
> greater of 2x the maximum key/value size in the block and some configurable 
> threshold, such as 1.1 times the compressed block size. We need a firm cap to 
> be able to reason about memory usage in various applications.
> The following code produces arbitrarily large RFile blocks:
> {code}
>   FileSKVWriter writer = RFileOperations.getInstance().openWriter(filename, 
> fs, conf, acuconf);
>   writer.startDefaultLocalityGroup();
>   SummaryStatistics keyLenStats = new SummaryStatistics();
>   Random r = new Random();
>   byte [] buffer = new byte[minRowSize]; 
>   for(int i = 0; i < 100000; i++) {
>     byte [] valBytes = new byte[valLength];
>     r.nextBytes(valBytes);
>     r.nextBytes(buffer);
>     ByteBuffer.wrap(buffer).putInt(i);
>     Key k = new Key(buffer, 0, buffer.length, emptyBytes, 0, 0, emptyBytes, 
> 0, 0, emptyBytes, 0, 0, 0);
>     Value v = new Value(valBytes);
>     writer.append(k, v);
>     keyLenStats.addValue(k.getSize());
>     int newBufferSize = Math.max(buffer.length, (int) 
> Math.ceil(keyLenStats.getMean() + keyLenStats.getStandardDeviation() * 4 + 
> 0.0001));
>     buffer = new byte[newBufferSize];
>     if(keyLenStats.getSum() > targetSize)
>       break;
>   }
>       writer.close();
> {code}
> One telltale symptom of this bug is an OutOfMemoryException thrown from a 
> readahead thread with message "Requested array size exceeds VM limit". This 
> will only happen if the block cache size is big enough to hold the expected 
> raw block size, 2GB in our case. This message is rare, and really only 
> happens when allocating an array of size Integer.MAX_VALUE or 
> Integer.MAX_VALUE-1 on the hotspot JVM. Integer.MAX_VALUE happens in this 
> case due to some strange handling of raw block sizes in the BCFile code. Most 
> OutOfMemoryExceptions have different messages.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to