[
https://issues.apache.org/jira/browse/DERBY-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843219#comment-13843219
]
Knut Anders Hatlen commented on DERBY-5416:
-------------------------------------------
The code that decides whether or not to grow the sort buffer essentially works
like this in the failing case (a simplified sketch follows the list):
- When the sort buffer is initialized, it records the amount of memory
currently in use, and allocates a small buffer.
- When the buffer is full, it checks the amount of memory currently in use. It
intends to use the difference between the current usage and the initial usage
as an estimate of how much memory a doubling of the sort buffer requires.
However, since a gc has happened, the difference is negative. Since there is
more memory available now than when the buffer was initialized, it assumes that
it is safe to allocate as much extra space now as the amount that it
successfully allocated with less available memory. So it doubles the buffer
size. This sounds like a fair assumption.
- The next time the buffer is full, it still sees that the memory usage is
smaller than the initial memory usage. Again it assumes that it is safe to
double the buffer size, and does exactly that. However, at this point, the
assumption is not as fair. Notice the difference between the assumption in this
step and in the previous step: In the previous step, it was assumed safe to
grow the buffer by as much space as we added when the buffer was initialized.
In this step, we don't grow the buffer by the same amount as we initially gave
the buffer; we actually grow it by twice that amount. This step is repeated
each time the buffer gets full, and each time the amount we add gets doubled
(way beyond the initial amount that we regarded as a safe increment).
Eventually, the buffer gets too large for the heap, and we get an OOME.
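For reference, here is a small, self-contained sketch of the heuristic as
described above. It is not the real MergeInserter code: the class name, the
initial capacity and the "enough free heap" test in the normal branch are made
up for the example; only the "estimatedMemoryUsed < 0" shortcut mirrors the
check quoted in the bug description.

{code:java}
/**
 * Simplified simulation of the growth heuristic described above.
 * Names and structure are illustrative only; they don't match the real
 * org.apache.derby.impl.store.access.sort.MergeInserter code.
 */
public class SortBufferGrowthSketch {

    private final long initialMemoryUsage; // recorded at initialization
    private int sortBufferMax = 1024;      // small initial capacity (rows)

    public SortBufferGrowthSketch() {
        this.initialMemoryUsage = usedHeap();
    }

    private static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Called each time the sort buffer fills up. */
    void bufferFull() {
        long estimatedMemoryUsed = usedHeap() - initialMemoryUsage;

        if (estimatedMemoryUsed < 0) {
            // A GC has happened since initialization, so we have no usable
            // estimate. The heuristic assumes doubling is safe, and keeps
            // assuming so on every later call: 2x, 4x, 8x, ... the initial
            // size, until the heap is exhausted and an OOME is thrown.
            sortBufferMax *= 2;
        } else {
            // Normal case: only double if the estimate suggests there is
            // room for another sortBufferMax rows (illustrative condition).
            long freeHeap = Runtime.getRuntime().maxMemory() - usedHeap();
            if (estimatedMemoryUsed < freeHeap / 2) {
                sortBufferMax *= 2;
            }
        }
    }
}
{code}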
I see at least three ways we could improve the heuristic to avoid this problem
(sketches of these ideas follow the list):
1. Instead of using the difference between the current memory usage and the
initial memory usage for estimating the memory requirements, we could use the
difference between the current memory usage and the memory usage the previous
time the buffer was doubled. Then a big gc right after the allocation of the
buffer won't affect all upcoming estimates, only the estimate calculated the
first time the buffer is full.
2. When we don't have an estimate of the memory requirement for doubling the
buffer (because of a gc), and the current memory usage is smaller than the
initial memory usage, don't assume blindly that it is OK to double the buffer.
Instead, grow it by the amount of memory that we found it was safe to add
initially, when the memory usage was at least as high as it is now. This would
mean a doubling of the buffer the first time the buffer gets full, but less
than that from the second time the buffer gets full. (In the common case, where
we do have an estimate of the memory usage, a doubling will happen each time
the buffer gets full, as long as the estimate suggests there's enough free heap
space.) In other words, use a more conservative approach and grow the buffer
more slowly when we don't have a good estimate for the actual memory
requirements.
3. Since the buffer contains arrays of DataValueDescriptors, we may be able to
estimate the memory requirements the same way as we do for
BackingStoreHashtable. That is, by calling estimateMemoryUsage() on the
DataValueDescriptors to see approximately how much space a single row takes.
(Currently, this approach underestimates the actual memory requirements. See
DERBY-4620.)
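To make improvements 1 and 2 concrete, here is a sketch reusing the
illustrative names from the example above. Again, this is not a real patch;
the "freeHeap / 2" threshold is just a placeholder for whatever condition the
real code uses.

{code:java}
/** Sketch of improvements 1 and 2; illustrative only, not a Derby patch. */
public class ConservativeGrowthSketch {

    private int sortBufferMax;            // current capacity (rows)
    private final int initialBufferMax;   // the increment we know was safe
    private long usageAtLastGrowth;       // improvement 1: reset the baseline
                                          // every time the buffer grows

    public ConservativeGrowthSketch(int initialCapacity) {
        this.sortBufferMax = initialCapacity;
        this.initialBufferMax = initialCapacity;
        this.usageAtLastGrowth = usedHeap();
    }

    private static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Called each time the sort buffer fills up. */
    void bufferFull() {
        long estimate = usedHeap() - usageAtLastGrowth;
        long freeHeap = Runtime.getRuntime().maxMemory() - usedHeap();

        if (estimate < 0) {
            // Improvement 2: a GC has happened, so there is no usable
            // estimate. Don't blindly double; only add the increment that
            // was known to be safe when the buffer was first allocated.
            sortBufferMax += initialBufferMax;
        } else if (estimate < freeHeap / 2) {
            // Improvement 1: the estimate is measured against the usage at
            // the previous growth, not at initialization, so one early GC
            // doesn't poison all later estimates.
            sortBufferMax *= 2;
        }
        usageAtLastGrowth = usedHeap();
    }
}
{code}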
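And a sketch of improvement 3, along the lines of what BackingStoreHashtable
does. DataValueDescriptor.estimateMemoryUsage() is the method mentioned above;
the helper class, the sample-row approach and the "half the free heap" slack
are made up for the example, and per DERBY-4620 the estimate will tend to be
on the low side.

{code:java}
import org.apache.derby.iapi.types.DataValueDescriptor;

/** Sketch of improvement 3; illustrative only, not a Derby patch. */
public final class RowSizeEstimateSketch {

    /** Approximate size of one sort buffer row by summing its columns. */
    static long estimateRowSize(DataValueDescriptor[] row) {
        long size = 0;
        for (DataValueDescriptor column : row) {
            size += column.estimateMemoryUsage();
        }
        return size;
    }

    /** Would doubling the buffer (sortBufferMax more rows) be likely to fit? */
    static boolean doublingLooksSafe(DataValueDescriptor[] sampleRow,
                                     int sortBufferMax) {
        Runtime rt = Runtime.getRuntime();
        long freeHeap = rt.maxMemory() - (rt.totalMemory() - rt.freeMemory());
        // Leave some slack instead of filling the heap completely.
        return estimateRowSize(sampleRow) * sortBufferMax < freeHeap / 2;
    }
}
{code}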
> SYSCS_COMPRESS_TABLE causes an OutOfMemoryError when the heap is full at call
> time and then gets mostly garbage collected later on
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Key: DERBY-5416
> URL: https://issues.apache.org/jira/browse/DERBY-5416
> Project: Derby
> Issue Type: Bug
> Components: Store
> Affects Versions: 10.6.2.1, 10.7.1.1, 10.8.1.2
> Reporter: Ramin Baradari
> Priority: Critical
> Labels: derby_triage10_9
> Attachments: compress_test_5416.patch
>
>
> When compressing a table with an index that is larger than the maximum heap
> size and therefore cannot be held in memory as a whole, an OutOfMemoryError
> can occur.
> For this to happen, the heap usage must be close to the maximum heap size at
> the start of the index recreation, and then, while the entries are sorted, a
> garbage collection run must clean out most of the heap. This can happen
> because a concurrent process releases a huge chunk of memory, or just because
> the buffer of a previous table compression has not yet been garbage
> collected.
> The internal heuristic that guesses when more memory can be used for the
> merge inserter estimates that more memory is available, and the sort buffer
> gets doubled. The buffer size keeps getting doubled until the heap usage is
> back to the level it had when the merge inserter was first initialized, or
> until the OOM occurs.
> The problem lies in MergeInserter.insert(...). The check that decides whether
> the buffer can be doubled contains the expression "estimatedMemoryUsed < 0",
> where estimatedMemoryUsed is the difference between the current heap usage
> and the heap usage at initialization. Unfortunately, in the aforementioned
> scenario this expression stays true until the heap usage gets close to the
> maximum heap size, and only then does the doubling of the buffer size stop.
> I've tested it with 10.6.2.1, 10.7.1.1 and 10.8.1.2 but the actual bug most
> likely exists in prior versions too.