[ https://issues.apache.org/jira/browse/HBASE-20789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529732#comment-16529732 ]

Zheng Hu edited comment on HBASE-20789 at 7/2/18 11:26 AM:
-----------------------------------------------------------

As [~Apache9] commented on RB, there is a problem here in patch.v3:

{code}
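if (replaceExistingCacheBlock) {
  // Overwrites any RAMQueueEntry already queued for cacheKey.
  ramCache.put(cacheKey, re);
} else if (ramCache.putIfAbsent(cacheKey, re) != null) {
  // Another entry is already queued for cacheKey: leave it alone.
  return;
}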
{code}
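To make the two branches concrete: `ConcurrentMap.put` always overwrites the mapping and returns the old value, while `putIfAbsent` only inserts when the key is absent and otherwise returns the existing value without changing anything. A small Java sketch, with `String` keys and values standing in for `BlockCacheKey` and `RAMQueueEntry`, and a hypothetical `cache()` helper that is not part of `BucketCache`:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative only: String keys/values stand in for BlockCacheKey and RAMQueueEntry,
// and cache() is a hypothetical helper, not a BucketCache method.
class PutVsPutIfAbsent {

  // Same two branches as the patch.v3 snippet; returns the entry left in ramCache.
  static String cache(ConcurrentMap<String, String> ramCache, String key, String entry,
                      boolean replaceExistingCacheBlock) {
    if (replaceExistingCacheBlock) {
      ramCache.put(key, entry);         // overwrites whatever is already queued
    } else {
      ramCache.putIfAbsent(key, entry); // no-op if an entry is already queued
    }
    return ramCache.get(key);
  }

  public static void main(String[] args) {
    ConcurrentMap<String, String> ramCache = new ConcurrentHashMap<>();
    cache(ramCache, "key0", "block0", false);
    System.out.println(cache(ramCache, "key0", "block1", false)); // block0
    System.out.println(cache(ramCache, "key0", "block1", true));  // block1
  }
}
```

With `putIfAbsent` the first queued entry wins; with `put` the latest caller silently replaces it, which is what opens the window described next.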

We cannot simply replace the entry for cacheKey with a new RAMQueueEntry, because
the heapSize of the bucket cache must be updated whenever an entry is removed from
ramCache. The WriterThread first writes to the io-engine, then syncs, and then
removes the RAMQueueEntry from ramCache. If another thread has replaced the entry
in the meantime, the removed entry is not the right one.

{code}
t1. thread0 tries to cache block0 with key0 (BucketCache#cacheBlock);
t2. block0 is put into ramCache;
t3. writer thread writes block0 to io-engine;
        t4. thread1 tries to cache block1 with the same key0 (BucketCache#cacheBlock);
        t5. block1 replaces block0 in ramCache;
t6. writer thread removes the entry with key0 (now block1) from ramCache;
{code}
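The same interleaving can be replayed deterministically in a single thread. This is a simplified model, not the real BucketCache code: `Entry`, `cacheBlockReplacing()` and `writerFinished()` are hypothetical stand-ins, the sizes are made up, and the writer is assumed to release the heap size of the block it actually flushed:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical single-threaded replay of t1..t6; Entry, cacheBlockReplacing and
// writerFinished are illustrative stand-ins, not the real BucketCache API.
class ReplaceRaceReplay {

  // Stand-in for RAMQueueEntry: only a name and a heap size.
  static final class Entry {
    final String name;
    final long size;

    Entry(String name, long size) {
      this.name = name;
      this.size = size;
    }
  }

  final ConcurrentMap<String, Entry> ramCache = new ConcurrentHashMap<>();
  // Heap size of the entries currently queued in ramCache.
  final AtomicLong heapSize = new AtomicLong();

  // patch.v3 path with replaceExistingCacheBlock == true: overwrite blindly,
  // while still accounting for the entry that gets replaced.
  void cacheBlockReplacing(String key, Entry e) {
    Entry previous = ramCache.put(key, e);
    if (previous != null) {
      heapSize.addAndGet(-previous.size);
    }
    heapSize.addAndGet(e.size);
  }

  // Writer side, after writing `written` to the io-engine and syncing:
  // remove by key and release the size of the entry it actually wrote.
  Entry writerFinished(String key, Entry written) {
    Entry removed = ramCache.remove(key); // removes whatever is mapped right now
    heapSize.addAndGet(-written.size);
    return removed;
  }

  static String replay() {
    ReplaceRaceReplay cache = new ReplaceRaceReplay();
    Entry block0 = new Entry("block0", 100);
    Entry block1 = new Entry("block1", 150);
    cache.cacheBlockReplacing("key0", block0);             // t1-t2: thread0 queues block0
    Entry written = cache.ramCache.get("key0");            // t3: writer flushes block0
    cache.cacheBlockReplacing("key0", block1);             // t4-t5: thread1 replaces it
    Entry removed = cache.writerFinished("key0", written); // t6: writer removes key0
    return "removed=" + removed.name + " heapSize=" + cache.heapSize.get()
        + " queued=" + cache.ramCache.size();
  }

  public static void main(String[] args) {
    System.out.println(replay()); // removed=block1 heapSize=50 queued=0
  }
}
```

After t6 nothing is queued, block1 never reached the io-engine, and heapSize still reports 50 instead of 0.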

In the end, the entry removed on behalf of thread0's block0 is block1, which is
the wrong one, and the heap size is wrong as well.

So, for safety, we still keep putIfAbsent() to ensure that only one thread will
remove the entry from ramCache. The flaky UT has been fixed by waiting until the
cache has been flushed to the io-engine.
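Under the same kind of simplified model (hypothetical `Entry`, `cacheBlock()` and `writerFinished()`, made-up sizes, not the real BucketCache code), keeping `putIfAbsent()` means thread1's attempt at t4 is rejected, so the entry the writer removes is the one it flushed and the heap size returns to 0:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the putIfAbsent() path; not the real BucketCache API.
class PutIfAbsentReplay {

  // Stand-in for RAMQueueEntry: only a name and a heap size.
  static final class Entry {
    final String name;
    final long size;

    Entry(String name, long size) {
      this.name = name;
      this.size = size;
    }
  }

  final ConcurrentMap<String, Entry> ramCache = new ConcurrentHashMap<>();
  // Heap size of the entries currently queued in ramCache.
  final AtomicLong heapSize = new AtomicLong();

  // Only the caller whose putIfAbsent succeeds queues the entry and adds its size.
  boolean cacheBlock(String key, Entry e) {
    if (ramCache.putIfAbsent(key, e) != null) {
      return false; // an entry is already queued for this key: leave it alone
    }
    heapSize.addAndGet(e.size);
    return true;
  }

  // Writer side, after writing `written` to the io-engine and syncing.
  Entry writerFinished(String key, Entry written) {
    Entry removed = ramCache.remove(key);
    heapSize.addAndGet(-written.size);
    return removed;
  }

  static String replay() {
    PutIfAbsentReplay cache = new PutIfAbsentReplay();
    Entry block0 = new Entry("block0", 100);
    Entry block1 = new Entry("block1", 150);
    cache.cacheBlock("key0", block0);                      // t1-t2: thread0 queues block0
    Entry written = cache.ramCache.get("key0");            // t3: writer flushes block0
    boolean accepted = cache.cacheBlock("key0", block1);   // t4: thread1 is rejected
    Entry removed = cache.writerFinished("key0", written); // writer removes key0
    return "accepted=" + accepted + " removed=" + removed.name
        + " heapSize=" + cache.heapSize.get() + " queued=" + cache.ramCache.size();
  }

  public static void main(String[] args) {
    System.out.println(replay()); // accepted=false removed=block0 heapSize=0 queued=0
  }
}
```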



> TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky
> ---------------------------------------------------------------
>
>                 Key: HBASE-20789
>                 URL: https://issues.apache.org/jira/browse/HBASE-20789
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Major
>             Fix For: 3.0.0, 2.1.0, 1.5.0, 1.4.6, 2.0.2
>
>         Attachments: 
> 0001-HBASE-20789-TestBucketCache-testCacheBlockNextBlockM.patch, 
> HBASE-20789.v1.patch, HBASE-20789.v2.patch, HBASE-20789.v3.patch, 
> bucket-33718.out
>
>
> The UT fails frequently on our internal branch-2. Will dig into the UT.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
