[jira] [Created] (HBASE-26849) NPE caused by WAL Compression and Replication

tianhang tang (Jira) Wed, 16 Mar 2022 00:41:04 -0700

tianhang tang created HBASE-26849:
-------------------------------------

             Summary: NPE caused by WAL Compression and Replication
                 Key: HBASE-26849
                 URL: https://issues.apache.org/jira/browse/HBASE-26849
             Project: HBase
          Issue Type: Bug
          Components: Replication, wal
    Affects Versions: 2.4.11, 1.7.1
            Reporter: tianhang tang
            Assignee: tianhang tang
         Attachments: image-2022-03-16-14-25-49-276.png, 
image-2022-03-16-14-30-15-247.png


My cluster runs HBase 1.4.12 with WAL compression and replication enabled.

I noticed that the replication sizeOfLogQueue was backing up and, after some 
debugging, traced it to this NPE:

!image-2022-03-16-14-25-49-276.png!

 

The root cause of this problem is in WALEntryStream#checkAllBytesParsed:

!image-2022-03-16-14-30-15-247.png!

resetReader does not create a new reader, so the original CompressionContext 
and the dict in it are retained.
However, the position is reset to 0 at this point, which means the HLog is read 
again from the beginning while the stale, uncleared cache is still in use, so 
entries are decoded against dictionary contents that no longer match.
Creating a new reader here instead solves the problem.
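
To illustrate the failure mode, here is a minimal toy sketch. It is not HBase 
code: ToyDictionary, ToyReader, the "+x"/"#n" record format and the round-robin 
eviction are all hypothetical stand-ins for the CompressionContext, the reader 
and the real dictionary. It only shows that rewinding the position while 
keeping the dictionary makes the second pass decode differently from the first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StaleDictionaryDemo {
    // Toy log (hypothetical format): "+x" is a literal that is also added to
    // the dictionary, "#n" is a reference to dictionary slot n.
    static final List<String> LOG = Arrays.asList("+a", "+b", "#0", "+c", "#0", "#1");

    /** Toy fixed-size dictionary; new entries overwrite slots round-robin. */
    static final class ToyDictionary {
        private final String[] slots;
        private int next = 0;

        ToyDictionary(int capacity) {
            slots = new String[capacity];
        }

        void add(String value) {
            slots[next] = value;
            next = (next + 1) % slots.length;
        }

        String get(int index) {
            return slots[index];
        }
    }

    /** Toy reader: the position and the dictionary are separate pieces of state. */
    static final class ToyReader {
        private final List<String> log;
        private final int capacity;
        private ToyDictionary dict;
        private int position = 0;

        ToyReader(List<String> log, int capacity) {
            this.log = log;
            this.capacity = capacity;
            this.dict = new ToyDictionary(capacity);
        }

        /** Rewinds the position only; the dictionary keeps its old contents. */
        void resetPositionOnly() {
            position = 0;
        }

        /** Rewinds the position and starts over with an empty dictionary. */
        void reopen() {
            position = 0;
            dict = new ToyDictionary(capacity);
        }

        /** Decodes every record from the current position to the end of the log. */
        List<String> readAll() {
            List<String> out = new ArrayList<>();
            while (position < log.size()) {
                String record = log.get(position++);
                String body = record.substring(1);
                if (record.charAt(0) == '+') {
                    dict.add(body);   // literal: remember it for later references
                    out.add(body);
                } else {
                    out.add(dict.get(Integer.parseInt(body)));   // reference: look it up
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        ToyReader reader = new ToyReader(LOG, 2);
        System.out.println("first pass:       " + reader.readAll());
        reader.resetPositionOnly();
        System.out.println("stale dictionary: " + reader.readAll());
        reader.reopen();
        System.out.println("fresh dictionary: " + reader.readAll());
    }
}
```

In this toy model the stale pass merely returns a different value for a 
reference; in the real reader the mismatch shows up as the NPE reported above.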

I will open a PR later. However, there are other places in the current code 
that call resetReader or seekOnFs, and I suspect that code does not take the 
WAL compression case into account at all...

 

In theory, whenever the file is read again, the LRUCache should be rolled back 
as well; otherwise the READ and WRITE paths will behave inconsistently.
The difficulty is that the position can be rolled back to any intermediate 
position at will, but the LRUCache can't...



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
