Make HLog more resilient to write pipeline failures
---------------------------------------------------
Key: HBASE-4222
URL: https://issues.apache.org/jira/browse/HBASE-4222
Project: HBase
Issue Type: Improvement
Components: wal
Reporter: Gary Helmling
Fix For: 0.92.0
The current implementation of HLog rolling to recover from transient errors in
the write pipeline seems to have two problems:
# When {{HLog.LogSyncer}} hits an {{IOException}} during a time-based sync
operation, the corresponding catch block requests a log roll, but only after
control has already escaped the internal while loop. As a result, the
{{LogSyncer}} thread exits and, as far as I can tell, is never restarted,
even if the log roll succeeds.
# Log rolling requests triggered by an {{IOException}} in {{sync()}} or
{{append()}} never happen if no entries have yet been written to the log. This
means that write errors are not recovered from immediately, which prolongs the
exposure to further errors in the pipeline.
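
To illustrate the first problem, here is a minimal, self-contained sketch. It is
not the actual {{HLog.LogSyncer}} code: {{SyncLoopSketch}}, {{Syncable}},
{{failFirst}} and both loop methods are hypothetical names, and a bounded loop
stands in for the timed sync loop. It only contrasts a catch block placed
outside the loop with one placed inside it.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; none of these names come from HBase.
class SyncLoopSketch {
    /** Stand-in for the writer's sync call. */
    interface Syncable {
        void sync() throws IOException;
    }

    /** Returns a Syncable whose first `failures` calls throw an IOException. */
    static Syncable failFirst(final int failures) {
        final AtomicInteger calls = new AtomicInteger();
        return () -> {
            if (calls.getAndIncrement() < failures) {
                throw new IOException("simulated pipeline error");
            }
        };
    }

    /**
     * Shape described in the issue: the catch sits outside the loop, so the
     * first IOException requests a roll and then the loop is over for good.
     * Returns the number of sync attempts made.
     */
    static int loopWithOuterCatch(Syncable s, int iterations, AtomicInteger rollRequests) {
        int attempts = 0;
        try {
            for (int i = 0; i < iterations; i++) {
                attempts++;
                s.sync();
            }
        } catch (IOException e) {
            rollRequests.incrementAndGet(); // roll requested, but the loop has exited
        }
        return attempts;
    }

    /**
     * Possible alternative: the catch sits inside the loop, so a roll is
     * requested and the loop keeps running against the (assumed) new writer.
     */
    static int loopWithInnerCatch(Syncable s, int iterations, AtomicInteger rollRequests) {
        int attempts = 0;
        for (int i = 0; i < iterations; i++) {
            attempts++;
            try {
                s.sync();
            } catch (IOException e) {
                rollRequests.incrementAndGet(); // roll requested, loop continues
            }
        }
        return attempts;
    }
}
```

With {{failFirst(1)}} and three iterations, the outer-catch variant makes one
sync attempt and stops, while the inner-catch variant makes all three; both
request exactly one roll.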
In addition, it seems like we should be able to better handle transient
problems, like a rolling restart of DataNodes while the HBase RegionServers are
running. Currently this will reliably cause RegionServer aborts during log
rolling: either an append or time-based sync triggers an initial
{{IOException}}, initiating a log rolling request. However, the log rolling
then fails in closing the current writer ("All datanodes are bad"), causing a
RegionServer abort. In this case, it seems like we should at least provide
an option to continue with the new writer and only abort on subsequent errors.
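
The proposed option could look roughly like the sketch below. This is only an
illustration under assumptions, not a patch: {{RollSketch}}, {{Roller}},
{{Writer}} and the {{maxToleratedCloseErrors}} knob are hypothetical names, and
the sketch ignores whether the old writer still held unsynced entries, which a
real implementation would have to consider.

```java
import java.io.IOException;

// Hypothetical sketch; none of these names come from HBase.
class RollSketch {
    /** Stand-in for a log writer; only close() matters here. */
    interface Writer {
        void close() throws IOException;
    }

    static Writer healthyWriter() {
        return () -> { };
    }

    static Writer failingWriter() {
        return () -> {
            throw new IOException("All datanodes are bad (simulated)");
        };
    }

    static final class Roller {
        private final int maxToleratedCloseErrors; // assumed configuration knob
        private int consecutiveCloseErrors = 0;
        private Writer current;

        Roller(Writer initial, int maxToleratedCloseErrors) {
            this.current = initial;
            this.maxToleratedCloseErrors = maxToleratedCloseErrors;
        }

        /**
         * Switches to the new writer first, then closes the old one. A close
         * failure is tolerated until the consecutive-failure count exceeds
         * the limit; after that the IOException propagates, and the caller
         * would abort as it does today.
         */
        void roll(Writer next) throws IOException {
            Writer old = current;
            current = next;
            try {
                old.close();
                consecutiveCloseErrors = 0; // a clean close resets the count
            } catch (IOException e) {
                consecutiveCloseErrors++;
                if (consecutiveCloseErrors > maxToleratedCloseErrors) {
                    throw e;
                }
            }
        }

        Writer current() {
            return current;
        }
    }
}
```

With a limit of 1, a single failed close during a roll is absorbed and writing
continues on the new writer; a second consecutive failed close propagates the
exception. A limit of 0 keeps the current abort-on-first-failure behavior.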