[ 
https://issues.apache.org/jira/browse/HIVE-17411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17411:
------------------------------------
    Description: 
In a large stream whose buffers are not reused, separated into many buffers 
(e.g. due to a small ORC compression buffer size), it may happen that some, but 
not all, buffers that are read together as a unit are evicted from cache.
If CacheBuffer follows BufferChunk in the buffer list, the latter will be 
converted to ProcCacheChunk; it is possible for the early refcount release logic 
from the former to release the refcount (for a dictionary it would always be 
released, because by definition there's no reuse), then backtrack to the 
latter and try to decref an uninitialized MemoryBuffer in ProcCacheChunk, 
because ProcCacheChunk looks like a CacheChunk. PCC initial refcounts are 
released separately after the data is uncompressed.

I'm assuming it would almost never happen with non-stripe-level streams, because 
one would need a very large RG spanning 2+ CBs, no overlap with next/previous 
RGs in 2+ buffers for the early release to kick in, and an unfortunate eviction 
order. However, it's possible with large-ish dictionaries.
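
To illustrate the failure mode described above, here is a minimal sketch. It is 
not the actual LLAP code: the classes below are simplified, hypothetical 
stand-ins that only reuse the names from this description, the uninitialized 
MemoryBuffer is modeled as a null field, and releaseBacktracking with its 
skipPending flag is an invented helper, not the fix that was committed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the scenario; not Hive's real classes.
class RefcountReleaseSketch {
  static class MemoryBuffer {
    int refCount = 1; // assumed: locked once by the reader

    void decRef() {
      if (refCount <= 0) {
        throw new IllegalStateException("refcount underflow");
      }
      --refCount;
    }
  }

  // A chunk whose data is already in cache.
  static class CacheChunk {
    MemoryBuffer buffer;

    CacheChunk(MemoryBuffer buffer) {
      this.buffer = buffer;
    }
  }

  // A chunk read from disk that still has to be uncompressed. It "looks like"
  // a CacheChunk, but its buffer is not initialized yet (modeled as null).
  static class ProcCacheChunk extends CacheChunk {
    ProcCacheChunk() {
      super(null);
    }
  }

  // Early release: walk backwards from index 'from' and decref every chunk.
  // With skipPending == false, a ProcCacheChunk is treated like any other
  // CacheChunk, which is the problem described above.
  static int releaseBacktracking(List<CacheChunk> chunks, int from,
                                 boolean skipPending) {
    int released = 0;
    for (int i = from; i >= 0; --i) {
      CacheChunk c = chunks.get(i);
      if (skipPending && c instanceof ProcCacheChunk) {
        continue; // its refcount is released separately, after decompression
      }
      c.buffer.decRef();
      ++released;
    }
    return released;
  }

  public static void main(String[] args) {
    List<CacheChunk> chunks = new ArrayList<>();
    chunks.add(new ProcCacheChunk());               // evicted, re-read from disk
    chunks.add(new CacheChunk(new MemoryBuffer())); // still in cache
    System.out.println("released " + releaseBacktracking(chunks, 1, true));
  }
}
```

In this sketch, calling releaseBacktracking(chunks, 1, false) on the same list 
fails on the ProcCacheChunk, while the guarded call releases only the cached 
buffer.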

  was:
In a large stream whose buffers are not reused (e.g. a dictionary, that is 
locked once for all RGs), separated into many buffers (e.g. due to a small ORC 
compression buffer size), it may happen that some, but not all, buffers are 
evicted from cache.
If CacheBuffer follows BufferChunk in the buffer list, the latter will be 
converted to ProcCacheChunk; it is possible for the early refcount release logic 
from the former to release the refcount (for a dictionary it would always be 
released, because by definition there's no reuse), then backtrack to the 
latter and try to decref an uninitialized MemoryBuffer in ProcCacheChunk, 
because ProcCacheChunk looks like a CacheChunk. PCC initial refcounts are 
released separately after the data is uncompressed.


> LLAP IO may incorrectly release a refcount in some rare cases
> -------------------------------------------------------------
>
>                 Key: HIVE-17411
>                 URL: https://issues.apache.org/jira/browse/HIVE-17411
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>
> In a large stream whose buffers are not reused, separated into many buffers 
> (e.g. due to a small ORC compression buffer size), it may happen that some, 
> but not all, buffers that are read together as a unit are evicted from cache.
> If CacheBuffer follows BufferChunk in the buffer list, the latter will be 
> converted to ProcCacheChunk; it is possible for the early refcount release 
> logic from the former to release the refcount (for a dictionary it would 
> always be released, because by definition there's no reuse), then backtrack 
> to the latter and try to decref an uninitialized MemoryBuffer in 
> ProcCacheChunk, because ProcCacheChunk looks like a CacheChunk. PCC initial refcounts are 
> released separately after the data is uncompressed.
> I'm assuming it would almost never happen with non-stripe-level streams, 
> because one would need a very large RG spanning 2+ CBs, no overlap with 
> next/previous RGs in 2+ buffers for the early release to kick in, and an 
> unfortunate eviction order. However, it's possible with large-ish dictionaries.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
