[ https://issues.apache.org/jira/browse/HIVE-16278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin updated HIVE-16278:
------------------------------------
Attachment: HIVE-16278.patch
[~gopalv] can you take a look?
The metadata cache mistakenly calls reserve with the no-wait flag (which is only
there for unit tests...) and then doesn't check the result. So if the first
eviction attempt fails, it ignores the failure and puts the object in the map
anyway; then, if there's a collision in the map, it releases memory that was
never reserved.
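A minimal Java sketch of the accounting bug described above and of the corrected
pattern. Everything here is hypothetical: SimpleMemoryManager, putMetadata and
putMetadataBuggy are illustrative stand-ins, not the actual LLAP classes or the
code in HIVE-16278.patch, and this manager never evicts, so a failed reserve
simply returns false. The buggy variant ignores the reserve result and, on a map
collision, releases a reservation it never made, which lowers the manager's
usage while the cached bytes are still held.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; names do not correspond to real LLAP classes.
class MetadataCacheSketch {
    /** Hypothetical memory manager: tracks reserved bytes against a fixed limit, no eviction. */
    static final class SimpleMemoryManager {
        private final long maxSize;
        private final AtomicLong used = new AtomicLong();

        SimpleMemoryManager(long maxSize) { this.maxSize = maxSize; }

        /** Returns false (instead of evicting) when the reservation would exceed the limit. */
        boolean reserveMemory(long size) {
            while (true) {
                long cur = used.get();
                if (cur + size > maxSize) return false;
                if (used.compareAndSet(cur, cur + size)) return true;
            }
        }

        /** Only catches a negative total; a total that is merely too low goes unnoticed. */
        void releaseMemory(long size) {
            if (used.addAndGet(-size) < 0) {
                throw new IllegalStateException("released more memory than was reserved");
            }
        }

        long used() { return used.get(); }
    }

    private final SimpleMemoryManager memoryManager;
    private final ConcurrentMap<Long, byte[]> cache = new ConcurrentHashMap<>();

    MetadataCacheSketch(SimpleMemoryManager memoryManager) { this.memoryManager = memoryManager; }

    /** Buggy pattern: the reserve result is ignored, so a collision can release unreserved memory. */
    byte[] putMetadataBuggy(long key, byte[] value) {
        memoryManager.reserveMemory(value.length); // result ignored: object is cached either way
        byte[] existing = cache.putIfAbsent(key, value);
        if (existing != null) {
            memoryManager.releaseMemory(value.length); // may release what was never reserved
            return existing;
        }
        return value;
    }

    /** Corrected pattern: skip caching if the reserve fails; on collision release only our own reservation. */
    byte[] putMetadata(long key, byte[] value) {
        if (!memoryManager.reserveMemory(value.length)) {
            return null; // not cached; nothing was reserved, so nothing to release
        }
        byte[] existing = cache.putIfAbsent(key, value);
        if (existing != null) {
            memoryManager.releaseMemory(value.length); // undo the reservation we just made
            return existing;
        }
        return value;
    }
}
```

With a 100-byte limit and 60 bytes already cached, a buggy put of another 60
bytes for the same key fails its reserve, collides, and drops the manager's
usage to 0 while 60 bytes are still cached, so the manager would see free
memory that is actually in use.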
Unfortunately, I'm not sure this path is common enough to explain the issue in
your cluster. A full LLAP log would help, to examine the error call stacks.
Elevator threads are never interrupted, and they only check for a stop request
between calls that each read one split. In addition, the placement of the
reserve/release calls relative to allocate/deallocate should make such a
situation (the memory manager thinks there is memory left and doesn't evict,
while we are actually fully allocated and also have plenty to evict) close to
impossible. Even if processing is somehow interrupted and we lose a buffer, the
accounting should be consistently wrong, with no mismatch between the manager
and the allocator. It would be worth looking at the errors/interrupts to see
what could have gone wrong, in case there's another issue besides the one this
patch fixes.
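A minimal Java sketch of the ordering argument, assuming reserve always precedes
allocate and release always follows deallocate. ReserveAllocateSketch, acquire
and free are hypothetical names, not LLAP code, and both counters live in one
object only to keep the example short. A buffer that is acquired and then lost
(never freed) stays counted by both the manager and the allocator, so the
accounting is consistently wrong rather than mismatched; a release without a
matching reserve, as in the metadata cache bug, is the kind of path that makes
the two views diverge.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of reserve/allocate ordering; not the actual LLAP code.
class ReserveAllocateSketch {
    final long limit;
    final AtomicLong reserved = new AtomicLong();  // memory manager's view
    final AtomicLong allocated = new AtomicLong(); // allocator's view

    ReserveAllocateSketch(long limit) { this.limit = limit; }

    private boolean reserve(long n) {
        while (true) {
            long cur = reserved.get();
            if (cur + n > limit) return false;
            if (reserved.compareAndSet(cur, cur + n)) return true;
        }
    }

    private void release(long n) { reserved.addAndGet(-n); }

    /** Reserve before allocating, and undo the reservation if allocation fails. */
    byte[] acquire(int n) {
        if (!reserve(n)) return null;
        try {
            byte[] buf = new byte[n];
            allocated.addAndGet(n);
            return buf;
        } catch (RuntimeException | Error e) {
            release(n); // keep the manager in step with the allocator
            throw e;
        }
    }

    /** Deallocate first, then release the reservation. */
    void free(byte[] buf) {
        allocated.addAndGet(-buf.length);
        release(buf.length);
    }

    /** True when both views agree, e.g. between acquire/free calls on one thread. */
    boolean consistent() { return reserved.get() == allocated.get(); }
}
```

If a caller acquires 40 and 30 bytes, frees the first buffer and loses the
second, both views still report 30 bytes in use: the leak is visible, but the
manager and the allocator agree about it.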
> LLAP: metadata cache may incorrectly decrease memory usage in mem manager
> -------------------------------------------------------------------------
>
> Key: HIVE-16278
> URL: https://issues.apache.org/jira/browse/HIVE-16278
> Project: Hive
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Attachments: HIVE-16278.patch
>
>
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)