[jira] [Commented] (HBASE-26849) NPE caused by WAL Compression and Replication

tianhang tang (Jira) Wed, 04 Jan 2023 19:52:11 -0800


    [ 
https://issues.apache.org/jira/browse/HBASE-26849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654721#comment-17654721
 ]


tianhang tang commented on HBASE-26849:
---------------------------------------

[~bbeaudreault] ??at the surface it seems like this affects active branches??
Yes sir, it affects all branches. As long as you use replication, you need to 
turn off WAL compression, no matter which version you are running.

> NPE caused by WAL Compression and Replication
> ---------------------------------------------
>
>                 Key: HBASE-26849
>                 URL: https://issues.apache.org/jira/browse/HBASE-26849
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication, wal
>    Affects Versions: 1.7.1, 3.0.0-alpha-2, 2.4.11
>            Reporter: tianhang tang
>            Assignee: tianhang tang
>            Priority: Critical
>         Attachments: image-2022-03-16-14-25-49-276.png, 
> image-2022-03-16-14-30-15-247.png
>
>
> My cluster runs HBase 1.4.12 with WAL compression and replication enabled.
> I noticed a growing replication sizeOfLogQueue backlog and, after some 
> debugging, found that the NPE is thrown at 
> [https://github.com/apache/hbase/blob/branch-1/hbase-common/src/main/java/org/apache/hadoop/hbase/io/util/LRUDictionary.java#L109:]
> !image-2022-03-16-14-25-49-276.png!
>  
> The root cause for this problem is:
> WALEntryStream#checkAllBytesParsed:
> !image-2022-03-16-14-30-15-247.png!
> resetReader does not create a new reader, so the original CompressionContext 
> and the dictionary in it are retained.
> However, at this point the position is reset to 0, which means that the HLog 
> is read again from the beginning while the dictionary that was never cleared 
> is still in use. This causes the problem: data that is already in the 
> LRUCache is added to the cache again.
> Recreating the reader here solves the problem.
> I will open a PR later. However, there are other places in the current code 
> that call resetReader or seekOnFs. I guess this code does not take the 
> WAL compression case into account at all...
>  
> In theory, whenever the file is read again, the LRUCache should also be 
> rolled back; otherwise the READ and WRITE paths will behave inconsistently.
> However, the position can be rolled back to any intermediate position at 
> will, while the LRUCache cannot...
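
The failure mode described in the quoted report can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not HBase code: RingDictionary, Token, encode, decode, decodeFresh and decodeAfterRewind are hypothetical names, and the dictionary reuses slots round-robin instead of implementing real LRU eviction. In this toy model the stale dictionary produces wrong values rather than the NPE seen in the report, but the cause is the same: the reader rewinds its position to 0 while keeping dictionary state built during the first pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only; none of these types exist in HBase.
final class StaleDictionaryDemo {

    /** Bounded dictionary; once full, slots are overwritten round-robin (assumption, not LRU). */
    static final class RingDictionary {
        private final String[] slots;
        private int next = 0;

        RingDictionary(int capacity) { slots = new String[capacity]; }

        int find(String value) {
            for (int i = 0; i < slots.length; i++) {
                if (value.equals(slots[i])) return i;
            }
            return -1;
        }

        void add(String value) {
            slots[next] = value;
            next = (next + 1) % slots.length;
        }

        String get(int index) { return slots[index]; }
    }

    /** One encoded entry: a literal value, or a reference to a dictionary slot. */
    static final class Token {
        final String literal; // null when this token is a reference
        final int ref;        // slot index, used only when literal == null
        Token(String literal, int ref) { this.literal = literal; this.ref = ref; }
    }

    /** Writer side: emit a reference if the value is cached, else a literal that is then cached. */
    static List<Token> encode(List<String> values, int capacity) {
        RingDictionary dict = new RingDictionary(capacity);
        List<Token> out = new ArrayList<>();
        for (String v : values) {
            int idx = dict.find(v);
            if (idx >= 0) {
                out.add(new Token(null, idx));
            } else {
                out.add(new Token(v, -1));
                dict.add(v);
            }
        }
        return out;
    }

    /** Reader side: decode tokens [0, limit) from position 0 with whatever state dict already has. */
    static List<String> decode(List<Token> tokens, int limit, RingDictionary dict) {
        List<String> out = new ArrayList<>();
        for (Token t : tokens.subList(0, limit)) {
            if (t.literal != null) {
                dict.add(t.literal);
                out.add(t.literal);
            } else {
                out.add(dict.get(t.ref));
            }
        }
        return out;
    }

    /** Analogue of recreating the reader: a new dictionary for the re-read. */
    static List<String> decodeFresh(List<Token> tokens, int capacity) {
        return decode(tokens, tokens.size(), new RingDictionary(capacity));
    }

    /** Analogue of resetting the position to 0 while keeping the old dictionary. */
    static List<String> decodeAfterRewind(List<Token> tokens, int readBeforeRewind, int capacity) {
        RingDictionary dict = new RingDictionary(capacity);
        decode(tokens, readBeforeRewind, dict);   // first, partial pass
        return decode(tokens, tokens.size(), dict); // rewind to 0, dictionary not cleared
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", "b", "a", "c", "b");
        List<Token> tokens = encode(values, 2);
        System.out.println(decodeFresh(tokens, 2));          // [a, b, a, c, b]
        System.out.println(decodeAfterRewind(tokens, 4, 2)); // [a, b, b, c, c]
    }
}
```

With a fresh dictionary the re-read matches what the writer encoded. With the retained dictionary, the literals from the start of the file are added a second time, slot assignments shift, and later references resolve to the wrong entries. The model also shows why rolling back to an arbitrary intermediate position is hard: the dictionary state at that position is only known by replaying from the start.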



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
