[ https://issues.apache.org/jira/browse/HBASE-26849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Duo Zhang resolved HBASE-26849.
-------------------------------
Resolution: Won't Fix
All 1.x release lines are EOL.
Closing. Feel free to reopen if this also affects other active branches.
> NPE caused by WAL Compression and Replication
> ---------------------------------------------
>
> Key: HBASE-26849
> URL: https://issues.apache.org/jira/browse/HBASE-26849
> Project: HBase
> Issue Type: Bug
> Components: Replication, wal
> Affects Versions: 1.7.1, 3.0.0-alpha-2, 2.4.11
> Reporter: tianhang tang
> Assignee: tianhang tang
> Priority: Critical
> Attachments: image-2022-03-16-14-25-49-276.png,
> image-2022-03-16-14-30-15-247.png
>
>
> My cluster runs HBase 1.4.12 with WAL compression and replication enabled.
> I noticed a growing replication sizeOfLogQueue backlog, and after some
> debugging I found an NPE thrown by
> [https://github.com/apache/hbase/blob/branch-1/hbase-common/src/main/java/org/apache/hadoop/hbase/io/util/LRUDictionary.java#L109]:
> !image-2022-03-16-14-25-49-276.png!
>
> The root cause of this problem is in WALEntryStream#checkAllBytesParsed:
> !image-2022-03-16-14-30-15-247.png!
> resetReader does not create a new reader, so the original CompressionContext
> and the dictionaries in it are retained.
> However, the position is reset to 0. The HLog is therefore read again from
> the beginning while the cache, which has not been cleared, is still in use.
> This causes problems: data that is already in the LRUCache is added to the
> cache a second time.
> Recreating the reader here solves the problem.
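The following simplified, hypothetical Java model shows how re-reading into a retained dictionary can end in an NPE. `ToyDictionary`, `Node` and `replay` are illustrative names, not HBase classes. To keep the code short, eviction is FIFO rather than LRU. The model keys a `HashMap` by node contents. Inserting a value that is already present therefore overwrites the existing entry's index, and one slot silently drops out of the map. When that slot is evicted later, its index lookup returns `null`, and unboxing that `null` throws a `NullPointerException`. It is an assumption that this matches the exact code path at the linked line. A fresh dictionary, which is what a new reader brings, avoids the duplicate insert.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical toy model; NOT HBase's LRUDictionary (FIFO eviction for brevity). */
public class DictReplayDemo {

  /** Dictionary node whose equality is based on its contents. */
  static final class Node {
    String contents;
    Node(String contents) { this.contents = contents; }
    @Override public boolean equals(Object o) {
      return o instanceof Node && Objects.equals(((Node) o).contents, contents);
    }
    @Override public int hashCode() { return Objects.hashCode(contents); }
  }

  static final class ToyDictionary {
    private final Node[] indexToNode;
    private final Map<Node, Short> nodeToIndex = new HashMap<>();
    private int size = 0;
    private int nextVictim = 0;

    ToyDictionary(int capacity) { indexToNode = new Node[capacity]; }

    /** Adds a literal the reader just decoded and returns its index. */
    short add(String value) {
      if (size < indexToNode.length) {
        Node n = new Node(value);
        indexToNode[size] = n;
        // If an equal node already exists, HashMap keeps the old key and
        // overwrites its index, so one slot silently drops out of the map.
        nodeToIndex.put(n, (short) size);
        return (short) size++;
      }
      Node victim = indexToNode[nextVictim];
      // remove() returns null for a slot that dropped out of the map;
      // unboxing that null into a short throws NullPointerException.
      short idx = nodeToIndex.remove(victim);
      victim.contents = value;
      nodeToIndex.put(victim, idx);
      nextVictim = (nextVictim + 1) % indexToNode.length;
      return idx;
    }
  }

  /** Re-reads literals from position 0 into the given dictionary. */
  static void replay(ToyDictionary dict, List<String> literals) {
    for (String s : literals) {
      dict.add(s);
    }
  }

  public static void main(String[] args) {
    // First pass decoded "a"; then the position is reset to 0 and the file is re-read.
    ToyDictionary kept = new ToyDictionary(2);
    kept.add("a");
    try {
      replay(kept, List.of("a", "x", "y")); // same dictionary reused: buggy
      System.out.println("no NPE");
    } catch (NullPointerException e) {
      System.out.println("NPE with retained dictionary");
    }

    // The fix described above: a new reader, hence a fresh dictionary.
    ToyDictionary fresh = new ToyDictionary(2);
    replay(fresh, List.of("a", "x", "y"));
    System.out.println("fresh dictionary: OK");
  }
}
```

Running `main` prints `NPE with retained dictionary` and then `fresh dictionary: OK`.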
> I will open a PR later. However, other places in the current code also call
> resetReader or seekOnFs, and I suspect that code does not take the WAL
> compression case into account at all.
>
> In theory, whenever the file is read again, the LRUCache should be rolled
> back as well. Otherwise the READ and WRITE paths will behave inconsistently.
> But while the position can be rolled back to any intermediate offset at will,
> the LRUCache cannot.
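A toy dictionary-compressed stream illustrates this constraint. The model is hypothetical: `Token`, `encode` and `Reader` are illustrative names, and the dictionary never evicts. Each decoded literal extends the reader's dictionary, so the dictionary at position p depends on every token before p. One possible way to support seeking is to clear the dictionary and replay every token from position 0 up to the target. This approach is an assumption, not something taken from HBase.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model; not HBase's WAL reader or CompressionContext. */
public class SeekReplayDemo {

  /** An encoded token: a literal (index == -1) or a reference to a dictionary index. */
  static final class Token {
    final String literal;
    final int index;
    Token(String literal, int index) { this.literal = literal; this.index = index; }
    boolean isLiteral() { return literal != null; }
  }

  /** Writer side: emit an index for values seen before, otherwise a literal. */
  static List<Token> encode(List<String> values) {
    Map<String, Integer> dict = new HashMap<>();
    List<Token> out = new ArrayList<>();
    for (String v : values) {
      Integer idx = dict.get(v);
      if (idx != null) {
        out.add(new Token(null, idx));
      } else {
        dict.put(v, dict.size()); // next index = number of entries so far
        out.add(new Token(v, -1));
      }
    }
    return out;
  }

  /** Reader side: rebuilds the dictionary in the same order as the writer. */
  static final class Reader {
    private final List<Token> stream;
    private final List<String> dict = new ArrayList<>();
    private int position = 0;

    Reader(List<Token> stream) { this.stream = stream; }

    String next() {
      Token t = stream.get(position++);
      if (t.isLiteral()) {
        dict.add(t.literal); // literal gets the next index, matching the writer
        return t.literal;
      }
      return dict.get(t.index); // fails if the dictionary state is missing
    }

    /** Rolls the dictionary back by clearing it and replaying tokens [0, target). */
    void seek(int target) {
      dict.clear();
      position = 0;
      while (position < target) {
        next();
      }
    }
  }

  public static void main(String[] args) {
    List<Token> stream = encode(List.of("row1", "cf", "row2", "cf", "row1"));

    Reader r = new Reader(stream);
    r.seek(3);
    System.out.println(r.next()); // cf

    // Moving only the position, with no dictionary state, cannot decode index 1.
    Reader naive = new Reader(stream);
    naive.position = 3;
    try {
      naive.next();
    } catch (IndexOutOfBoundsException e) {
      System.out.println("cannot decode without dictionary state");
    }
  }
}
```

Replay keeps reader and writer dictionaries in sync, at a cost proportional to the target offset. Snapshotting dictionary state at chosen offsets would be an alternative design, also hypothetical here.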
--
This message was sent by Atlassian Jira
(v8.20.10#820010)