[ https://issues.apache.org/jira/browse/HBASE-28971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dong0829 updated HBASE-28971:
-----------------------------
    Description: 
For the FSHLog, when we try to roll the writer, we will
 # initiate the zigzagLatch and wait for the safe point
 # after the safe point is obtained, continue to close the writer

For the above process, it looks like we have an 
[assumption|https://github.com/apache/hbase/blob/branch-2.6/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java#L388]
 that highestSyncedTxid must be bigger than highestUnsyncedTxid. I do not 
think that must hold, because 
[attainSafePoint|https://github.com/apache/hbase/blob/branch-2.6/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java#L1119]
 does not require the safe point to be reached on a sync request; if the 
consumer stops at an append request, highestUnsyncedTxid will still be bigger 
than highestSyncedTxid, right?

In our environment we can reproduce this issue, which causes WAL files to 
pile up very quickly under heavy write load. If we want to make sure the 
existing logic works, we need to either add a check so that attainSafePoint 
obtains a correct safe point, or make sure doReplaceWriter handles the WAL 
close correctly when highestUnsyncedTxid > highestSyncedTxid
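The race described above can be sketched as a minimal, self-contained Java example. This is not HBase's actual code: the two field names mirror FSHLog's volatile counters, but the append/sync methods and the scenario in main are hypothetical simplifications. It shows that a safe point reached right after an append, before the matching sync, leaves highestUnsyncedTxid > highestSyncedTxid, violating the assumption the roll relies on.

```java
// Minimal sketch (not HBase code). Field names mirror FSHLog's volatile
// counters; the rest is an illustrative simplification of the ring-buffer
// consumer's append/sync processing.
public class SafePointSketch {
    static volatile long highestUnsyncedTxid = 0;
    static volatile long highestSyncedTxid = 0;

    // An append advances the unsynced high-water mark only.
    static void append(long txid) {
        highestUnsyncedTxid = txid;
    }

    // A sync advances the synced high-water mark.
    static void sync(long txid) {
        highestSyncedTxid = txid;
    }

    public static void main(String[] args) {
        append(1);
        sync(1);
        append(2); // consumer stops here, at an append, not a sync

        // If attainSafePoint() is allowed to return at this point, the
        // roll observes unsynced=2 > synced=1, and the assumption that
        // everything appended has been synced does not hold.
        boolean assumptionHolds = highestSyncedTxid >= highestUnsyncedTxid;
        System.out.println("unsynced=" + highestUnsyncedTxid
            + " synced=" + highestSyncedTxid
            + " assumptionHolds=" + assumptionHolds);
    }
}
```

This is why restricting the safe point to sync requests (or making doReplaceWriter tolerate the gap) would close the hole.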

 

  was:
For the FSHLog, when we try to roll the writer, we will
 # initiate the zigzagLatch and wait for the safe point
 # after the safe point is obtained, continue to close the writer

For the above process, it looks like we have an 
[assumption|https://github.com/apache/hbase/blob/branch-2.6/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java#L388]
 that highestSyncedTxid must be bigger than highestUnsyncedTxid. I do not 
think that must hold, because 
[attainSafePoint|https://github.com/apache/hbase/blob/branch-2.6/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java#L1119]
 does not require the safe point to be reached on a sync request; if the 
consumer stops at an append request, highestUnsyncedTxid will still be bigger 
than highestSyncedTxid, right?

In our environment we can reproduce this issue, which causes WAL files to 
pile up very quickly under heavy write load. If we want to make sure the 
existing logic works, we need to add a check so that attainSafePoint is 
reached only on a sync request

 


> FSHLog can not roll the WAL log properly
> ----------------------------------------
>
>                 Key: HBASE-28971
>                 URL: https://issues.apache.org/jira/browse/HBASE-28971
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 2.4.18, 2.6.1, 2.5.10
>            Reporter: Dong0829
>            Assignee: Dong0829
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
