[ https://issues.apache.org/jira/browse/HBASE-29857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18054891#comment-18054891 ]

rstest commented on HBASE-29857:
--------------------------------

I found that this issue is partially addressed by
[HBASE-28839|https://issues.apache.org/jira/browse/HBASE-28839]:

 

  *HBase 2.6.3 Persistence Format (OLD):*
  // File layout: numChunks first, then the chunk messages. Reader side:
  byte[] bytes = new byte[Long.BYTES];
  in.read(bytes);
  long numChunks = Bytes.toLong(bytes, 0);  // Read numChunks (0 for an empty cache)

  // BUG: when numChunks == 0 (empty cache), the code still tries to read the first chunk
  BucketCacheProtos.BucketCacheEntry firstChunk =
    BucketCacheProtos.BucketCacheEntry.parseDelimitedFrom(in);  // Returns null → NPE
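
For context, the root cause is protobuf's parseDelimitedFrom() contract: it returns null (rather than throwing) when the input stream is already at EOF, which is exactly the state an empty-cache persistence file leaves the reader in. A minimal standalone illustration (not HBase code, just the protobuf behaviour; the import assumes the shaded generated protos class used by BucketCache):

{code:java}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.hbase.shaded.protobuf.generated.BucketCacheProtos;

public class ParseDelimitedEofDemo {
  public static void main(String[] args) throws IOException {
    // Simulates the old-format file after numChunks was read: nothing else was ever written.
    InputStream in = new ByteArrayInputStream(new byte[0]);

    BucketCacheProtos.BucketCacheEntry firstChunk =
      BucketCacheProtos.BucketCacheEntry.parseDelimitedFrom(in);

    // parseDelimitedFrom() returns null at EOF instead of throwing;
    // the old parsePB() then dereferences it (firstChunk.getDeserializersMap()) -> NPE.
    System.out.println("firstChunk == null ? " + (firstChunk == null));  // prints true
  }
}
{code}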

  *Master Persistence Format (NEW):*
  // In BucketProtoUtils.serializeAsPB():
  fos.write(PB_MAGIC_V2);                            // 1. Write magic bytes
  toPB(cache, builder).writeDelimitedTo(fos);        // 2. ALWAYS write metadata (even if empty)
  for (entry : cache.backingMap.entrySet()) \{ ... }  // 3. Write chunks only if non-empty

  // In BucketCache.retrieveChunkedBackingMap():
  BucketCacheEntry cacheEntry = parseDelimitedFrom(in);  // Reads metadata (always present)
  while (in.available() > 0) \{ ... }                     // Gracefully handles no chunks
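
On the read side, a tiny standalone sketch (plain java.io, with the magic bytes and metadata message assumed to be already consumed) of why the available()-guarded loop tolerates a file with zero chunks:

{code:java}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChunkLoopDemo {
  public static void main(String[] args) throws IOException {
    // After the always-present metadata message, an empty cache leaves no bytes behind.
    InputStream in = new ByteArrayInputStream(new byte[0]);

    int chunksRead = 0;
    while (in.available() > 0) {  // false immediately, so the chunk-parsing body never runs
      chunksRead++;
    }
    System.out.println("chunks read: " + chunksRead);  // prints 0 - no null chunk, no NPE
  }
}
{code}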

 

However, it would still be good to add an explicit null check on the result of parseDelimitedFrom(in); currently the resulting NPE is only caught by the generic exception handler introduced in [HBASE-28839|https://issues.apache.org/jira/browse/HBASE-28839].

 

*The Fix:*
{code:java}
  private void retrieveChunkedBackingMap(FileInputStream in) throws IOException {
    BucketCacheProtos.BucketCacheEntry cacheEntry =
      BucketCacheProtos.BucketCacheEntry.parseDelimitedFrom(in);
    // HBASE-29857: Handle case where persistence file is empty or corrupted.
    // parseDelimitedFrom() returns null when there's no data to read.
    if (cacheEntry == null) {
      throw new IOException(
        "Failed to read cache entry from persistence file (possibly empty or corrupted)");
    }
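    // ... rest of the method (chunk reading) continues unchanged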
{code}
 

[~wchevreuil] What do you think?

> BucketCache fails to start when persistence file was written with empty cache
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-29857
>                 URL: https://issues.apache.org/jira/browse/HBASE-29857
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 2.6.3
>            Reporter: rstest
>            Priority: Critical
>
> When a RegionServer with BucketCache persistence enabled is restarted, if the 
> BucketCache was empty at shutdown time, the new RegionServer fails to start 
> with a `NullPointerException` in `BucketCache.parsePB()`.
>  
> The bug is in the interaction between `BucketProtoUtils.serializeAsPB()` and 
> `BucketCache.retrieveChunkedBackingMap()`:
> 1. During shutdown with empty cache: When `backingMap.size() == 0`, 
> `serializeAsPB()` writes `numChunks = 0` to the persistence file, but the 
> loop that writes `BucketCacheEntry` objects never executes (because there are 
> no entries to iterate). This means no BucketCacheEntry is written to the file.
> 2. During startup: `retrieveChunkedBackingMap()` reads `numChunks = 0` from 
> the file, but still attempts to read the first chunk using 
> `parseDelimitedFrom()`. Since no `BucketCacheEntry` was written, 
> `parseDelimitedFrom()` returns `null`.
> 3. NPE occurs: The null `firstChunk` is passed to `parsePB()`, which calls 
> `firstChunk.getDeserializersMap()` on the null object, causing NPE.
>  
> This bug makes the RegionServer unable to restart.
> I will provide a fix in a PR, along with a unit test that reproduces the bug
> (when the fix is not applied).
>  
>  


