Andrew Purtell created HBASE-12830:
--------------------------------------
Summary: Unreadable HLogs stuck in replication queues
Key: HBASE-12830
URL: https://issues.apache.org/jira/browse/HBASE-12830
Project: HBase
Issue Type: Bug
Affects Versions: 0.98.9
Reporter: Andrew Purtell
We had an incident where underlying infrastructure issues caused HDFS namenodes
to go down, not at the same time, leading to periods of HDFS service outage and
recovery as namenodes failed over. These clusters had replication enabled. We
had some Regionservers roll logs during partial HDFS availability. Namespace
entries for these HLogs were created but the first block-being-written was lost
or unable to complete, leading to dead 0-length HLogs in the queues of active
sources. When this happens, the queue becomes stuck on the dead 0-length HLog,
reporting EOFExceptions whenever the source wakes up and tries to (re)open the
current file, like so:
{noformat}
2015-01-08 18:50:40,956 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: 1-<hostname removed>,60020,1418764167084 Got:
java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:197)
    at java.io.DataInputStream.readFully(DataInputStream.java:169)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1845)
    at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1810)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1759)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:70)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.reset(SequenceFileLogReader.java:175)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.initReader(SequenceFileLogReader.java:184)
    at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:70)
    at org.apache.hadoop.hbase.regionserver.wal.HLogFactory.createReader(HLogFactory.java:128)
    at org.apache.hadoop.hbase.regionserver.wal.HLogFactory.createReader(HLogFactory.java:91)
    at org.apache.hadoop.hbase.regionserver.wal.HLogFactory.createReader(HLogFactory.java:79)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.openReader(ReplicationHLogReaderManager.java:68)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:506)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:309)
{noformat}
The exception originates where SequenceFile$Reader.init tries to read the 4-byte
version header at position 0 and immediately hits end-of-file on the 0-length file.
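For illustration only (not HBase or Hadoop code), here is a minimal, self-contained Java
snippet showing the same failure mode: DataInputStream#readFully throws EOFException as
soon as it is asked for 4 bytes that a 0-length file cannot supply, which is exactly the
read SequenceFile$Reader.init performs on the version header.
{code}
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;

// Standalone illustration: readFully() on an empty stream fails the same way
// SequenceFile$Reader.init fails on a dead 0-length HLog.
public class ZeroLengthHeaderRead {
  public static void main(String[] args) throws Exception {
    byte[] emptyFile = new byte[0];                 // stands in for the 0-length HLog
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(emptyFile))) {
      byte[] versionHeader = new byte[4];           // SequenceFile's 4-byte version block
      in.readFully(versionHeader);                  // throws java.io.EOFException immediately
    } catch (java.io.EOFException e) {
      System.out.println("Got: " + e);              // same EOFException as in the log above
    }
  }
}
{code}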
In ReplicationSource#run we have an active loop:
{code}
// Loop until we close down
while (isActive()) {
  ...
}
{code}
Within this loop we iterate over paths in the replication queue. For each path,
we attempt to open it:
{code}
// Open a reader on it
if (!openReader(sleepMultiplier)) {
  // Reset the sleep multiplier, else it'd be reused for the next file
  sleepMultiplier = 1;
  continue;
}
{code}
When we have a zero-length file, openReader returns true but this.reader is set
to null (see the catch of the outer try block), so we fall through the
conditional to:
{code}
// If we got a null reader but didn't continue, then sleep and continue
if (this.reader == null) {
  if (sleepForRetries("Unable to open a reader", sleepMultiplier)) {
    sleepMultiplier++;
  }
  continue;
}
{code}
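For reference, here is a hedged sketch of the sleepForRetries contract as it matters
here, with a simplified signature and paraphrased from the behavior described in this
issue rather than copied from the 0.98 source: sleep base * multiplier milliseconds,
and report whether the caller may still bump the multiplier.
{code}
// Simplified, paraphrased sketch -- not the actual ReplicationSource method.
static boolean sleepForRetries(long baseSleepMs, int sleepMultiplier, int maxRetriesMultiplier) {
  try {
    Thread.sleep(baseSleepMs * sleepMultiplier);   // e.g. 1000 ms * current multiplier
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
  }
  return sleepMultiplier < maxRetriesMultiplier;   // true -> caller increments sleepMultiplier
}
{code}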
We will keep trying to open the current file for a long time. The queue stays
stuck until sleepMultiplier reaches maxRetriesMultiplier (conf
"replication.source.maxretriesmultiplier", default 10), sleeping the base time of
"replication.source.sleepforretries" (default 1000) ms multiplied by the current
multiplier on each attempt; only then do we call ReplicationSource#processEndOfFile().
By default we will spin on opening the dead 0-length HLog for (1000*1) +
(1000*2) + ... + (1000*10) = 55,000 ms (55 seconds) before processing the file out
of the queue. In HBASE-11964 we recommend increasing
"replication.source.maxretriesmultiplier" to 300. With that configuration we will
wait (1000*1) + (1000*2) + ... + (1000*300) = 45,150,000 ms (about 12.5 hours)
before processing the file out of the queue.
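A quick back-of-the-envelope check of those totals (plain Java, not HBase code), using
the arithmetic-series sum base * max * (max + 1) / 2:
{code}
// Total time spent spinning on the unreadable file before processEndOfFile().
public class SpinTime {
  public static void main(String[] args) {
    long baseSleepMs = 1000L;                 // "replication.source.sleepforretries" default
    for (int max : new int[] {10, 300}) {     // default vs. the HBASE-11964 recommendation
      long totalMs = baseSleepMs * max * (max + 1) / 2;
      System.out.println(max + " -> " + totalMs + " ms (" + totalMs / 1000 + " s)");
    }
  }
}
// Prints 55000 ms (55 s) for the default of 10, and 45150000 ms (45150 s,
// roughly 12.5 hours) for 300.
{code}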
There should be some way to break out of this very long wait for a 0-length or
corrupt file that is blocking the queue. Perhaps we can check whether there are
later files in the queue and, if so, not wait the full maxRetriesMultiplier
cycle; a separate, smaller threshold could be used in that case (a rough sketch
follows below). Or: right now we can't short-circuit this manually by removing
the 0-length log files, but we could enable that.
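Here is a hedged sketch of the first idea, with hypothetical names throughout
(queuedFiles, stuckFileRetries, and the helper itself do not exist in HBase): if newer
files are already queued behind the one that will not open, that file can never grow,
so we could give up on it after a much smaller number of retries.
{code}
// Hypothetical helper, not a patch: choose how many retry rounds to spend on a
// file that will not open, based on whether newer files are queued behind it.
static int effectiveMaxRetries(int queuedFiles, int maxRetriesMultiplier, int stuckFileRetries) {
  // Only file in the queue: it is the current WAL and may still grow, so keep
  // the existing long retry cycle. Newer files queued behind it: the stuck file
  // cannot grow any more, so use the smaller threshold and move on sooner.
  return queuedFiles > 1 ? stuckFileRetries : maxRetriesMultiplier;
}
{code}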
Currently a FileNotFoundException (FNFE) triggers an attempt to find where the
log has gone. If it can't be found in either location, we fall through to
{code}
// TODO What happens the log is missing in both places?
{code}
and leave openReader() returning true as above; the sleep multiplier is
increased, and we continue to spin on the file with increasing wait times.
(Having that TODO unimplemented is also not good.)
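One possible shape for that TODO, again only a hedged sketch with hypothetical names
(nothing here exists in ReplicationSource today): once the log can be found neither at
its original path nor in the archive, log it loudly and tell the caller to stop
retrying, so the queue can move on via processEndOfFile().
{code}
// Hypothetical helper, not a patch: called when a queued WAL is missing from
// both the log directory and the archive directory.
static boolean giveUpOnMissingLog(org.apache.hadoop.fs.Path walPath) {
  System.err.println("WAL " + walPath + " is missing from both the log and archive"
      + " directories; its edits cannot be replicated, dropping it from the queue");
  return false;   // caller treats false as "do not keep spinning on this file"
}
{code}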