[jira] [Comment Edited] (HIVE-16278) LLAP: metadata cache may incorrectly decrease memory usage in mem manager

Sergey Shelukhin (JIRA) Tue, 21 Mar 2017 19:31:56 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-16278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15935682#comment-15935682
 ]


Sergey Shelukhin edited comment on HIVE-16278 at 3/22/17 2:31 AM:
------------------------------------------------------------------

[~gopalv] can you take a look?
The metadata cache mistakenly calls reserve with the no-wait flag (which is only 
there for unit tests...), and then doesn't check the result. So, if the first 
eviction fails, reserve will fail, but the cache will ignore the failure and put 
the object in the map anyway; then, if there's a collision in the map, it will 
release memory that was never reserved. 
Unfortunately, I'm not sure that's non-exotic enough to cause the issue in your 
cluster. A full LLAP log would be helpful, to examine the error callstacks. 
Elevator threads are never interrupted, and only check for stop between the 
calls to read one split; in addition, the placement of reserve/release calls 
relative to allocate/deallocate should make such a situation (the memory manager 
thinks we have memory left and doesn't evict, but actually we are fully 
allocated, with plenty to evict) close to impossible. Even if processing is 
interrupted somehow and we lose a buffer, the accounting should be consistently 
wrong, with no mismatch between manager and allocator. It would be interesting 
to look at errors/interrupts to see what could have gone wrong, in case there's 
another issue besides the one this patch fixes.
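
To make the accounting error described above concrete, here is a minimal, 
self-contained sketch. It is not the LLAP code: MemoryManager, MetadataCache, 
reserveNoWait and usedAfterCollision are hypothetical stand-ins invented for 
illustration, and the checkResult guard is only one possible way to avoid the 
problem, not necessarily what the attached patch does.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the bug; none of these names come from Hive/LLAP.
class ReserveSketch {

  /** Stand-in memory manager that only tracks a usage counter. */
  static class MemoryManager {
    private final long max;
    private final AtomicLong used = new AtomicLong();

    MemoryManager(long max) { this.max = max; }

    /** No-wait reserve: a single attempt, returns false instead of evicting or blocking. */
    boolean reserveNoWait(long size) {
      while (true) {
        long cur = used.get();
        if (cur + size > max) return false;               // nothing reserved
        if (used.compareAndSet(cur, cur + size)) return true;
      }
    }

    void release(long size) { used.addAndGet(-size); }

    long used() { return used.get(); }
  }

  /** Stand-in metadata cache keyed by a string, storing only the object size. */
  static class MetadataCache {
    private final MemoryManager mm;
    private final boolean checkResult;
    private final ConcurrentHashMap<String, Long> map = new ConcurrentHashMap<>();

    MetadataCache(MemoryManager mm, boolean checkResult) {
      this.mm = mm;
      this.checkResult = checkResult;
    }

    void put(String key, long size) {
      boolean reserved = mm.reserveNoWait(size);
      // Guarded variant: do not cache an object whose memory was not reserved.
      if (checkResult && !reserved) return;
      Long old = map.putIfAbsent(key, size);
      if (old != null) {
        // Collision: another entry already exists, so "our" reservation is released.
        // In the unguarded variant this runs even when reserve returned false.
        mm.release(size);
      }
    }
  }

  /** Fills the manager, then inserts the same key twice to force a collision. */
  public static long usedAfterCollision(boolean checkResult) {
    MemoryManager mm = new MemoryManager(100);
    mm.reserveNoWait(100);                 // data buffers take everything
    MetadataCache cache = new MetadataCache(mm, checkResult);
    cache.put("file1", 10);                // reserve fails
    cache.put("file1", 10);                // reserve fails again; collides with first entry
    return mm.used();
  }

  public static void main(String[] args) {
    System.out.println("unchecked: " + usedAfterCollision(false)); // prints "unchecked: 90"
    System.out.println("checked: " + usedAfterCollision(true));    // prints "checked: 100"
  }
}
```

In the unchecked run the manager ends up believing that only 90 of 100 units 
are in use, although nothing was ever freed; that is the "incorrectly decrease 
memory usage" effect from the issue title.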


was (Author: sershe):
[~gopalv] can you take a look?
metadata cache mistakenly calls reserve with no-wait flag (which is only there 
for unit tests...), and then doesn't check the result. So, if the first 
eviction fails, it will ignore it and put object in the map anyway; then if 
there's a collision in the map, it will release the memory that has never been 
reserved. 
Unfortunately I'm not sure that's non-exotic enough to cause the issue in your 
cluster. Full LLAP log would be helpful, to examine error callstacks. Elevator 
threads are never interrupted, and only check stop between the calls to read 
one split; in addition location of reserve/release calls in relation to 
allocate/deallocate should make such a situation (memory manager thinks we have 
memory left and doesn't evict, but actually we are fully allocated, also with 
plenty to evict) close to impossible; even if processing is interrupted somehow 
and we lose a buffer, it should be consistently wrong with no mismatch between 
manager and allocator. It would be interesting to look at errors/interrupts to 
see what could have been wrong, in case there's another issue beside what this 
patch fixes.

> LLAP: metadata cache may incorrectly decrease memory usage in mem manager
> -------------------------------------------------------------------------
>
>                 Key: HIVE-16278
>                 URL: https://issues.apache.org/jira/browse/HIVE-16278
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-16278.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
