[
https://issues.apache.org/jira/browse/DERBY-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843219#comment-13843219
]
Knut Anders Hatlen commented on DERBY-5416:
-------------------------------------------
The code that decides whether or not to grow the sort buffer essentially works
like this in the failing case (a simplified sketch follows the list):
- When the sort buffer is initialized, it records the amount of memory
currently in use, and allocates a small buffer.
- When the buffer is full, it checks the amount of memory currently in use. It
intends to use the difference between the current usage and the initial usage
as an estimate of how much memory a doubling of the sort buffer requires.
However, since a gc has happened, the difference is negative. Since there is
more memory available now than when the buffer was initialized, it assumes that
it is safe to allocate as much extra space now as the amount that it
successfully allocated with less available memory. So it doubles the buffer
size. This sounds like a fair assumption.
- The next time the buffer is full, it still sees that the memory usage is
smaller than the initial memory usage. Again it assumes that it is safe to
double the buffer size, and does exactly that. However, at this point, the
assumption is not as fair. Notice the difference between the assumption in this
step and in the previous step: In the previous step, it was assumed safe to
grow the buffer by as much space as we added when the buffer was initialized.
In this step, we don't grow the buffer by the same amount as we initially gave
the buffer; we actually grow it by twice that amount. This step is repeated
each time the buffer gets full, and each time the amount we add gets doubled
(way beyond the initial amount that we regarded as a safe increment).
Eventually, the buffer gets too large for the heap, and we get an OOME.
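For reference, here is a small, self-contained sketch of the heuristic as
described above. It is not the real MergeInserter code: the class name, the
initial capacity and the "enough free heap" test in the normal branch are made
up for the example; only the "estimatedMemoryUsed < 0" shortcut mirrors the
check quoted in the bug description.

{code:java}
/**
 * Simplified simulation of the growth heuristic described above.
 * Names and structure are illustrative only; they don't match the real
 * org.apache.derby.impl.store.access.sort.MergeInserter code.
 */
public class SortBufferGrowthSketch {

    private final long initialMemoryUsage; // recorded at initialization
    private int sortBufferMax = 1024;      // small initial capacity (rows)

    public SortBufferGrowthSketch() {
        this.initialMemoryUsage = usedHeap();
    }

    private static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Called each time the sort buffer fills up. */
    void bufferFull() {
        long estimatedMemoryUsed = usedHeap() - initialMemoryUsage;

        if (estimatedMemoryUsed < 0) {
            // A GC has happened since initialization, so we have no usable
            // estimate. The heuristic assumes doubling is safe, and keeps
            // assuming so on every later call: 2x, 4x, 8x, ... the initial
            // size, until the heap is exhausted and an OOME is thrown.
            sortBufferMax *= 2;
        } else {
            // Normal case: only double if the estimate suggests there is
            // room for another sortBufferMax rows (illustrative condition).
            long freeHeap = Runtime.getRuntime().maxMemory() - usedHeap();
            if (estimatedMemoryUsed < freeHeap / 2) {
                sortBufferMax *= 2;
            }
        }
    }
}
{code}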
I see at least three ways we could improve the heuristic to avoid this problem
(sketches of these ideas follow the list):
1. Instead of using the difference between the current memory usage and the
initial memory usage for estimating the memory requirements, we could use the
difference between the current memory usage and the memory usage the previous
time the buffer was doubled. Then a big gc right after the allocation of the
buffer won't affect all upcoming estimates, only the estimate calculated the
first time the buffer is full.
2. When we don't have an estimate of the memory requirement for doubling the
buffer (because of a gc), and the current memory usage is smaller than the
initial memory usage, don't assume blindly that it is OK to double the buffer.
Instead, grow it by the amount of memory that we found it was safe to add
initially, when the memory usage was at least as high as it is now. This would
mean a doubling of the buffer the first time the buffer gets full, but less
than that from the second time the buffer gets full. (In the common case, where
we do have an estimate of the memory usage, a doubling will happen each time
the buffer gets full, as long as the estimate suggests there's enough free heap
space.) In other words, use a more conservative approach and grow the buffer
more slowly when we don't have a good estimate for the actual memory
requirements.
3. Since the buffer contains arrays of DataValueDescriptors, we may be able to
estimate the memory requirements the same way as we do for
BackingStoreHashtable. That is, by calling estimateMemoryUsage() on the
DataValueDescriptors to see approximately how much space a single row takes.
(Currently, this approach underestimates the actual memory requirements. See
DERBY-4620.)
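To make improvements 1 and 2 concrete, here is a sketch reusing the
illustrative names from the example above. Again, this is not a real patch;
the "freeHeap / 2" threshold is just a placeholder for whatever condition the
real code uses.

{code:java}
/** Sketch of improvements 1 and 2; illustrative only, not a Derby patch. */
public class ConservativeGrowthSketch {

    private int sortBufferMax;            // current capacity (rows)
    private final int initialBufferMax;   // the increment we know was safe
    private long usageAtLastGrowth;       // improvement 1: reset the baseline
                                          // every time the buffer grows

    public ConservativeGrowthSketch(int initialCapacity) {
        this.sortBufferMax = initialCapacity;
        this.initialBufferMax = initialCapacity;
        this.usageAtLastGrowth = usedHeap();
    }

    private static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Called each time the sort buffer fills up. */
    void bufferFull() {
        long estimate = usedHeap() - usageAtLastGrowth;
        long freeHeap = Runtime.getRuntime().maxMemory() - usedHeap();

        if (estimate < 0) {
            // Improvement 2: a GC has happened, so there is no usable
            // estimate. Don't blindly double; only add the increment that
            // was known to be safe when the buffer was first allocated.
            sortBufferMax += initialBufferMax;
        } else if (estimate < freeHeap / 2) {
            // Improvement 1: the estimate is measured against the usage at
            // the previous growth, not at initialization, so one early GC
            // doesn't poison all later estimates.
            sortBufferMax *= 2;
        }
        usageAtLastGrowth = usedHeap();
    }
}
{code}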
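And a sketch of improvement 3, along the lines of what BackingStoreHashtable
does. DataValueDescriptor.estimateMemoryUsage() is the method mentioned above;
the helper class, the sample-row approach and the "half the free heap" slack
are made up for the example, and per DERBY-4620 the estimate will tend to be
on the low side.

{code:java}
import org.apache.derby.iapi.types.DataValueDescriptor;

/** Sketch of improvement 3; illustrative only, not a Derby patch. */
public final class RowSizeEstimateSketch {

    /** Approximate size of one sort buffer row by summing its columns. */
    static long estimateRowSize(DataValueDescriptor[] row) {
        long size = 0;
        for (DataValueDescriptor column : row) {
            size += column.estimateMemoryUsage();
        }
        return size;
    }

    /** Would doubling the buffer (sortBufferMax more rows) be likely to fit? */
    static boolean doublingLooksSafe(DataValueDescriptor[] sampleRow,
                                     int sortBufferMax) {
        Runtime rt = Runtime.getRuntime();
        long freeHeap = rt.maxMemory() - (rt.totalMemory() - rt.freeMemory());
        // Leave some slack instead of filling the heap completely.
        return estimateRowSize(sampleRow) * sortBufferMax < freeHeap / 2;
    }
}
{code}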
> SYSCS_COMPRESS_TABLE causes an OutOfMemoryError when the heap is full at call
> time and then gets mostly garbage collected later on
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Key: DERBY-5416
> URL: https://issues.apache.org/jira/browse/DERBY-5416
> Project: Derby
> Issue Type: Bug
> Components: Store
> Affects Versions: 10.6.2.1, 10.7.1.1, 10.8.1.2
> Reporter: Ramin Baradari
> Priority: Critical
> Labels: derby_triage10_9
> Attachments: compress_test_5416.patch
>
>
> When compressing a table with an index that is larger than the maximum heap
> size and therefore cannot be held in memory as a whole, an OutOfMemoryError
> can occur.
> For this to happen, the heap usage must be close to the maximum heap size at
> the start of the index recreation, and then, while the entries are sorted, a
> garbage collection run must clean out most of the heap. This can happen
> because a concurrent process releases a huge chunk of memory, or just because
> the buffer of a previous table compression has not yet been garbage
> collected.
> The internal heuristic that guesses when more memory can be used for the
> merge inserter estimates that more memory is available, and the sort buffer
> gets doubled. The buffer size keeps getting doubled until the heap usage is
> back to the level it had when the merge inserter was first initialized, or
> until the OOM occurs.
> The problem lies in MergeInserter.insert(...). The check that decides whether
> the buffer can be doubled contains the expression "estimatedMemoryUsed < 0",
> where estimatedMemoryUsed is the difference between the current heap usage
> and the heap usage at initialization. Unfortunately, in the aforementioned
> scenario this expression stays true until the heap usage gets close to the
> maximum heap size, and only then does the doubling of the buffer size stop.
> I've tested it with 10.6.2.1, 10.7.1.1 and 10.8.1.2 but the actual bug most
> likely exists in prior versions too.