[
https://issues.apache.org/jira/browse/DERBY-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844275#comment-13844275
]
Knut Anders Hatlen commented on DERBY-5416:
-------------------------------------------
Thanks for the feedback, Dag.
You're right that 3) would probably give better accuracy. I don't think it's
particularly expensive, and in any case I think we'd only call
estimateMemoryUsage() on a single row each time the buffer is full, so it won't
be called frequently.
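For reference, a minimal sketch of what 3) could look like, assuming a row
is a DataValueDescriptor[] (the helper name estimateRowSize is made up for
illustration, it's not a method in the actual code):

    import org.apache.derby.iapi.types.DataValueDescriptor;

    // Sum the per-column estimates for a single sample row. This would
    // be called at most once each time the buffer fills up, so the cost
    // should be negligible.
    private static long estimateRowSize(DataValueDescriptor[] row) {
        long size = 0;
        for (DataValueDescriptor column : row) {
            size += column.estimateMemoryUsage();
        }
        return size;
    }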
The downsides with 3) are:
- It will underestimate the memory requirements until DERBY-4620 is fixed,
which might lead to OOME. (I've been reluctant to fix the estimates in
DERBY-4620, because that might degrade the performance of existing applications
unless we increase the max hash table size at the same time.)
- With variable-width data types, the rows may vary greatly in size, so the
estimate may be way off with that approach too if the sample row is not close
to the average row size.
I experimented with 2), but it has some weaknesses too. The buffer will indeed
grow more slowly with that approach, but if the heap was completely full when
the sort buffer was initialized (like it is in this case), it simply means that
the buffer will grow slowly until it has filled up the heap again. So it is
very likely to eventually hit the ceiling and fail with OOME (and Ramin's test
case does still fail with OOME when this approach is used).
Another experiment I ran, one that looks more promising, is a simplified
variant of 1). It changes the meaning of beginTotalMemory and beginFreeMemory
so that they represent the low-water mark of the memory usage. In the common
case, where the memory usage grows as the buffer fills up, they have the same
meaning as today. But once it detects that the memory usage is lower than when
the buffer was initialized, those fields are changed to the current state.
Although this doesn't make the estimates completely accurate (they still
underestimate the actual memory requirement), it makes them more accurate than
they currently are. Ramin's test case succeeds when this approach is used.
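In code, the low-water mark adjustment amounts to something like this (a
sketch of the idea, not the actual patch; apart from beginTotalMemory and
beginFreeMemory the names are mine):

    private long beginTotalMemory; // total heap when the buffer was initialized
    private long beginFreeMemory;  // free heap at the same point

    // Called before estimating how much memory the sort buffer uses.
    // If a GC run has pushed the heap usage below the level recorded
    // when the buffer was initialized, move the baseline down to the
    // current state, so that memory freed by GC is not mistaken for
    // memory that the sort buffer is free to consume.
    private void adjustLowWaterMark() {
        Runtime rt = Runtime.getRuntime();
        long currentTotal = rt.totalMemory();
        long currentFree = rt.freeMemory();
        if (currentTotal - currentFree < beginTotalMemory - beginFreeMemory) {
            beginTotalMemory = currentTotal;
            beginFreeMemory = currentFree;
        }
    }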
> SYSCS_COMPRESS_TABLE causes an OutOfMemoryError when the heap is full at call
> time and then gets mostly garbage collected later on
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Key: DERBY-5416
> URL: https://issues.apache.org/jira/browse/DERBY-5416
> Project: Derby
> Issue Type: Bug
> Components: Store
> Affects Versions: 10.6.2.1, 10.7.1.1, 10.8.1.2
> Reporter: Ramin Baradari
> Assignee: Knut Anders Hatlen
> Priority: Critical
> Labels: derby_triage10_9
> Attachments: compress_test_5416.patch, lowmem-test.diff
>
>
> When compressing a table with an index that is larger than the maximum heap
> size, and which therefore cannot be held in memory as a whole, an
> OutOfMemoryError can occur.
> For this to happen, the heap usage must be close to the maximum heap size at
> the start of the index recreation, and then, while the entries are sorted, a
> garbage collection run must clean out most of the heap. This can happen
> because a concurrent process releases a huge chunk of memory, or simply
> because the buffer of a previous table compression has not yet been garbage
> collected.
> The internal heuristic that guesses when more memory can be used by the
> merge inserter then estimates that more memory is available, and the sort
> buffer gets doubled. The buffer size keeps getting doubled until the heap
> usage is back to the level at which the merge inserter was first
> initialized, or until the OOME occurs.
> The problem lies in MergeInserter.insert(...). The check that decides
> whether the buffer can be doubled contains the expression
> "estimatedMemoryUsed < 0", where estimatedMemoryUsed is the difference
> between the current heap usage and the heap usage at initialization.
> Unfortunately, in the aforementioned scenario this expression stays true
> until the heap usage again gets close to the maximum heap size, and only
> then does the doubling of the buffer size stop.
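> A condensed sketch of that check (not the literal Derby code; only
> estimatedMemoryUsed, beginTotalMemory and beginFreeMemory are names from
> the actual source, the rest is for illustration):
>
>     Runtime rt = Runtime.getRuntime();
>     long usedAtInit = beginTotalMemory - beginFreeMemory;
>     long usedNow = rt.totalMemory() - rt.freeMemory();
>     long estimatedMemoryUsed = usedNow - usedAtInit;
>     // After a big GC run, usedNow drops below usedAtInit, so
>     // estimatedMemoryUsed stays negative and this check keeps
>     // passing until the heap fills up again or an
>     // OutOfMemoryError is thrown.
>     if (estimatedMemoryUsed < 0) {
>         sortBufferMax *= 2; // hypothetical growth step
>     }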
> I've tested it with 10.6.2.1, 10.7.1.1 and 10.8.1.2 but the actual bug most
> likely exists in prior versions too.
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)