[ https://issues.apache.org/jira/browse/HBASE-26435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rushabh Shah reassigned HBASE-26435:
------------------------------------
Assignee: Rushabh Shah

> [branch-1] The log rolling request may be canceled immediately in LogRoller
> due to a race
> -----------------------------------------------------------------------------------------
>
>                 Key: HBASE-26435
>                 URL: https://issues.apache.org/jira/browse/HBASE-26435
>             Project: HBase
>          Issue Type: Sub-task
>          Components: wal
>    Affects Versions: 1.6.0
>            Reporter: Rushabh Shah
>            Assignee: Rushabh Shah
>            Priority: Major
>             Fix For: 1.7.2
>
>
> Saw this issue in our internal 1.6 branch.
> The WAL was rolled, but the new WAL file was not writable, and it also
> logged the following error:
> {noformat}
> 2021-11-03 19:20:19,503 WARN  [.168:60020.logRoller] hdfs.DFSClient - Error
> while syncing
> java.io.IOException: Could not get block locations. Source file
> "/hbase/WALs/<rs-name>,60020,1635567166484/<rs-name>%2C60020%2C1635567166484.1635967219389"
> - Aborting...
>     at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1466)
>     at org.apache.hadoop.hdfs.DataStreamer.processDatanodeError(DataStreamer.java:1251)
>     at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:670)
> 2021-11-03 19:20:19,507 WARN  [.168:60020.logRoller] wal.FSHLog - pre-sync
> failed but an optimization so keep going
> java.io.IOException: Could not get block locations. Source file
> "/hbase/WALs/<rs-name>,60020,1635567166484/<rs-name>%2C60020%2C1635567166484.1635967219389"
> - Aborting...
>     at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1466)
>     at org.apache.hadoop.hdfs.DataStreamer.processDatanodeError(DataStreamer.java:1251)
>     at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:670)
> {noformat}
> Since the new WAL file was not writable, appends to that file started
> failing immediately after it was rolled.
> {noformat}
> 2021-11-03 19:20:19,677 INFO  [.168:60020.logRoller] wal.FSHLog - Rolled WAL
> /hbase/WALs/<rs-name>,60020,1635567166484/<rs-name>%2C60020%2C1635567166484.1635965392022
> with entries=253234, filesize=425.67 MB; new WAL
> /hbase/WALs/<rs-name>,60020,1635567166484/<rs-name>%2C60020%2C1635567166484.1635967219389
> 2021-11-03 19:20:19,690 WARN  [020.append-pool17-t1] wal.FSHLog - Append
> sequenceId=1962661783, requesting roll of WAL
> java.io.IOException: Could not get block locations. Source file
> "/hbase/WALs/<rs-name>,60020,1635567166484/<rs-name>%2C60020%2C1635567166484.1635967219389"
> - Aborting...
>     at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1466)
>     at org.apache.hadoop.hdfs.DataStreamer.processDatanodeError(DataStreamer.java:1251)
>     at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:670)
> 2021-11-03 19:20:19,690 INFO  [.168:60020.logRoller] wal.FSHLog - Archiving
> hdfs://prod-EMPTY-hbase2a/hbase/WALs/<rs-name>,60020,1635567166484/<rs-name>%2C60020%2C1635567166484.1635960792837
> to
> hdfs://prod-EMPTY-hbase2a/hbase/oldWALs/hbase2a-dnds1-232-ukb.ops.sfdc.net%2C60020%2C1635567166484.1635960792837
> {noformat}
> We always reset the rollLog flag within the LogRoller thread after the
> rollWal call is complete.
> FSHLog#rollWriter does many things, such as replacing the writer and
> archiving old logs. If an append thread fails to write to the new WAL file
> while the LogRoller thread is still cleaning up old logs, that roll request
> is lost, because LogRoller resets the flag to false once the in-progress
> rollWriter call completes.
> Relevant code:
> https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/LogRoller.java#L183-L203
> We need to reset the rollLog flag before we start rolling the WAL.
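The ordering problem described above can be sketched with a minimal, self-contained example. This is not HBase's actual LogRoller code; the class name, method names, and `Runnable` stand-in for `rollWriter` are illustrative assumptions. It only demonstrates why clearing the roll-request flag after the (slow) roll loses a request that arrives mid-roll, while clearing it before the roll preserves it for the next iteration:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the flag-ordering race, not HBase's real code.
public class LogRollerSketch {
    private final AtomicBoolean rollLog = new AtomicBoolean(false);

    // Called by append threads when a sync/append on the new WAL fails.
    public void requestRollAll() {
        rollLog.set(true);
    }

    // Buggy ordering: the flag is cleared AFTER the roll finishes, so a
    // request made while the roll was in progress is silently discarded.
    public void runOnceBuggy(Runnable rollWriter) {
        if (!rollLog.get()) {
            return;
        }
        rollWriter.run();   // an append thread may call requestRollAll() here
        rollLog.set(false); // ...and its request is wiped out
    }

    // Fixed ordering: the flag is cleared BEFORE the roll starts, so a
    // concurrent requestRollAll() survives and triggers the next roll.
    public void runOnceFixed(Runnable rollWriter) {
        if (!rollLog.get()) {
            return;
        }
        rollLog.set(false);
        rollWriter.run();   // a request made here leaves rollLog == true
    }

    public boolean rollPending() {
        return rollLog.get();
    }
}
```

Passing a `Runnable` that re-requests a roll (standing in for an append thread failing during the roll) shows the difference: with the buggy ordering the pending request ends up false, with the fixed ordering it stays true.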
> This is fixed in branch-2 and master via HBASE-22684, but it was not fixed
> in branch-1.
> Also, branch-2 has a multi-WAL implementation, so the patch cannot be
> applied cleanly to branch-1.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)