[
https://issues.apache.org/jira/browse/HIVE-16278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15935682#comment-15935682
]
Sergey Shelukhin edited comment on HIVE-16278 at 3/22/17 2:31 AM:
------------------------------------------------------------------
[~gopalv] can you take a look?
The metadata cache mistakenly calls reserve with the no-wait flag (which is only
there for unit tests...), and then doesn't check the result. So, if the first
eviction fails, reserve will fail, but the cache will ignore the failure and put
the object in the map anyway; then, if there's a collision in the map, it will
release memory that was never reserved.
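
A minimal Java sketch of the pattern described above. All class and method
names here (SketchMemoryManager, SketchMetadataCache, putIgnoringResult,
putChecked) are hypothetical and are not the actual LLAP code; the sketch manager
never evicts or waits, so reserve simply fails when the budget is exhausted. The
checked variant is only one possible way to handle a failed reserve and may
differ from what the attached patch does.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the memory manager; not the real LLAP class.
class SketchMemoryManager {
    private final long maxSize;
    private final AtomicLong used = new AtomicLong();

    SketchMemoryManager(long maxSize) { this.maxSize = maxSize; }

    // Returns false when the bytes cannot be reserved. A real manager would
    // try eviction first; this sketch fails immediately (the "no-wait" case).
    boolean reserve(long bytes) {
        while (true) {
            long cur = used.get();
            if (cur + bytes > maxSize) return false;
            if (used.compareAndSet(cur, cur + bytes)) return true;
        }
    }

    void release(long bytes) { used.addAndGet(-bytes); }

    long used() { return used.get(); }
}

// Hypothetical stand-in for the metadata cache.
class SketchMetadataCache {
    private final ConcurrentHashMap<String, byte[]> map = new ConcurrentHashMap<>();
    private final SketchMemoryManager mm;

    SketchMetadataCache(SketchMemoryManager mm) { this.mm = mm; }

    // Problematic pattern: the result of reserve is dropped.
    byte[] putIgnoringResult(String key, byte[] value) {
        mm.reserve(value.length);            // may fail; nobody checks
        byte[] old = map.putIfAbsent(key, value);
        if (old != null) {
            mm.release(value.length);        // releases memory never reserved
            return old;
        }
        return value;
    }

    // Checked pattern: release only what was actually reserved.
    byte[] putChecked(String key, byte[] value) {
        if (!mm.reserve(value.length)) {
            byte[] existing = map.get(key);
            return existing != null ? existing : value; // value is not cached
        }
        byte[] old = map.putIfAbsent(key, value);
        if (old != null) {
            mm.release(value.length);        // undo our own successful reserve
            return old;
        }
        return value;
    }
}
```

With a 10-byte budget, caching a 10-byte object and then hitting a collision on
the same key through putIgnoringResult leaves the manager reporting 0 bytes
used while the map still holds 10 bytes; putChecked keeps the count at 10.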
Unfortunately, I'm not sure that's non-exotic enough to cause the issue in your
cluster. A full LLAP log would be helpful, to examine the error callstacks.
Elevator threads are never interrupted, and only check for stop between the
calls to read one split; in addition, the location of the reserve/release calls
relative to allocate/deallocate should make such a situation (the memory manager
thinks we have memory left and doesn't evict, but we are actually fully
allocated, with plenty to evict) close to impossible; even if processing is
interrupted somehow and we lose a buffer, the accounting should be consistently
wrong, with no mismatch between the manager and the allocator. It would be
interesting to look at the errors/interrupts to see what could have gone wrong,
in case there's another issue besides what this patch fixes.
was (Author: sershe):
[~gopalv] can you take a look?
metadata cache mistakenly calls reserve with no-wait flag (which is only there
for unit tests...), and then doesn't check the result. So, if the first
eviction fails, it will ignore it and put object in the map anyway; then if
there's a collision in the map, it will release the memory that has never been
reserved.
Unfortunately I'm not sure that's non-exotic enough to cause the issue in your
cluster. Full LLAP log would be helpful, to examine error callstacks. Elevator
threads are never interrupted, and only check stop between the calls to read
one split; in addition location of reserve/release calls in relation to
allocate/deallocate should make such a situation (memory manager thinks we have
memory left and doesn't evict, but actually we are fully allocated, also with
plenty to evict) close to impossible; even if processing is interrupted somehow
and we lose a buffer, it should be consistently wrong with no mismatch between
manager and allocator. It would be interesting to look at errors/interrupts to
see what could have been wrong, in case there's another issue beside what this
patch fixes.
> LLAP: metadata cache may incorrectly decrease memory usage in mem manager
> -------------------------------------------------------------------------
>
> Key: HIVE-16278
> URL: https://issues.apache.org/jira/browse/HIVE-16278
> Project: Hive
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Attachments: HIVE-16278.patch
>
>