ramkrish86 commented on a change in pull request #2501:
URL: https://github.com/apache/hbase/pull/2501#discussion_r501636048
##########
File path:
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AbstractFSWAL.java
##########
@@ -907,6 +959,9 @@ public void shutdown() throws IOException {
rollWriterLock.lock();
try {
doShutdown();
+ if (logArchiveExecutor != null) {
Review comment:
doShutdown is an abstract method, so it is better to do this in the parent class.
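A minimal sketch of the suggestion (class and method names here are illustrative, not the actual AbstractFSWAL code): the parent class owns the archive executor, so it shuts the executor down itself once the subclass-specific doShutdown() has run, rather than leaving that to each subclass.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: the parent class owns the executor and cleans it up
// after the abstract doShutdown() completes, whatever the subclass does.
abstract class AbstractWalSketch {
  private final ExecutorService logArchiveExecutor =
      Executors.newSingleThreadExecutor();

  // Subclasses implement the WAL-specific shutdown steps.
  protected abstract void doShutdown() throws Exception;

  public void shutdown() throws Exception {
    try {
      doShutdown();
    } finally {
      // Parent-class cleanup runs even if doShutdown() throws.
      logArchiveExecutor.shutdown();
      logArchiveExecutor.awaitTermination(5, TimeUnit.SECONDS);
    }
  }

  boolean archiverStopped() {
    return logArchiveExecutor.isShutdown();
  }
}
```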
##########
File path:
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AbstractFSWAL.java
##########
@@ -715,11 +738,39 @@ private void cleanOldLogs() throws IOException {
regionsBlockingThisWal.clear();
}
}
+
if (logsToArchive != null) {
- for (Pair<Path, Long> logAndSize : logsToArchive) {
- this.totalLogSize.addAndGet(-logAndSize.getSecond());
- archiveLogFile(logAndSize.getFirst());
- this.walFile2Props.remove(logAndSize.getFirst());
+ final List<Pair<Path, Long>> localLogsToArchive = logsToArchive;
+ // make it async
+ for (Pair<Path, Long> log : localLogsToArchive) {
+ logArchiveExecutor.execute(() -> {
+ archiveRetriable(log);
+ });
+ this.walFile2Props.remove(log.getFirst());
+ }
+ }
+ }
+
+ protected void archiveRetriable(final Pair<Path, Long> log) {
+ int retry = 1;
+ while (true) {
+ try {
+ archiveLogFile(log.getFirst());
+ totalLogSize.addAndGet(-log.getSecond());
+ // successful
+ break;
+ } catch (Throwable e) {
+ if (retry > archiveRetries) {
+ LOG.error("Failed log archiving for the log {},", log.getFirst(), e);
+ if (this.server != null) {
+ this.server.abort("Failed log archiving", e);
Review comment:
Fixed this everywhere.
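The hand-off in the hunk above can be sketched as follows (names are illustrative, not the actual HBase code): the roll path only enqueues the archive work, and a single-thread executor performs the slow filesystem move in the background, in submission order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: cleanOldLogs() returns immediately after enqueuing;
// the single worker thread archives logs one at a time, in FIFO order.
class AsyncArchiverSketch {
  private final ExecutorService logArchiveExecutor =
      Executors.newSingleThreadExecutor();
  final List<String> archived = new ArrayList<>(); // stands in for the FS move

  void cleanOldLogs(List<String> logsToArchive) {
    for (String log : logsToArchive) {
      // make it async: the caller does not block on archival I/O
      logArchiveExecutor.execute(() -> archived.add(log));
    }
  }

  void shutdown() throws InterruptedException {
    logArchiveExecutor.shutdown();
    logArchiveExecutor.awaitTermination(5, TimeUnit.SECONDS);
  }
}
```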
##########
File path:
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestFailedAppendAndSync.java
##########
@@ -122,6 +127,18 @@ public DodgyFSLog(FileSystem fs, Path root, String logDir, Configuration conf)
return regions;
}
+ @Override
+ protected void archiveLogFile(Path p) throws IOException {
+ if (throwArchiveException) {
+ throw new IOException("throw archival exception");
+ }
+ }
+
+ @Override
+ protected void archiveRetriable(Pair<Path, Long> localLogsToArchive) {
Review comment:
Done
##########
File path:
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AbstractFSWAL.java
##########
@@ -329,6 +336,11 @@ public WalProps(Map<byte[], Long>
encodedName2HighestSequenceId, long logSize) {
protected final AtomicBoolean rollRequested = new AtomicBoolean(false);
+ private final ExecutorService logArchiveExecutor = Executors.newSingleThreadExecutor(
+   new ThreadFactoryBuilder().setDaemon(true).setNameFormat("Log-Archiver-%d").build());
Review comment:
Done
##########
File path:
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AbstractFSWAL.java
##########
@@ -482,6 +505,8 @@ protected SyncFuture initialValue() {
this.walTooOldNs = TimeUnit.SECONDS.toNanos(conf.getInt(SURVIVED_TOO_LONG_SEC_KEY, SURVIVED_TOO_LONG_SEC_DEFAULT));
this.useHsync = conf.getBoolean(HRegion.WAL_HSYNC_CONF_KEY, HRegion.DEFAULT_WAL_HSYNC);
+ archiveRetries = this.conf.getInt("hbase.regionserver.walroll.archive.retries", 0);
Review comment:
If the WAL filesystem is having an issue, this archival may also fail. By
default we make one mandatory attempt (so retries is 0); if this new config
is set to a non-zero value, we retry that many additional times. Generally
it will not be configured to a non-zero value.
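The retry semantics described above can be sketched as below (a simplified, hypothetical stand-in for archiveRetriable, not the actual HBase code): the config counts retries beyond the one mandatory attempt, so a value of 0 still archives once, and a value of N yields at most N + 1 attempts.

```java
// Hypothetical sketch of "hbase.regionserver.walroll.archive.retries":
// retries = 0 means one mandatory attempt; retries = N means N extra tries.
class ArchiveRetrySketch {
  static int attemptArchive(int archiveRetries, Runnable archiveLogFile) {
    int retry = 1;
    int attempts = 0;
    while (true) {
      try {
        attempts++;
        archiveLogFile.run(); // the mandatory attempt
        break;                // successful
      } catch (RuntimeException e) {
        if (retry > archiveRetries) {
          // the real code logs the failure and may abort the server here
          break;
        }
        retry++;
      }
    }
    return attempts;
  }
}
```

With retries = 0 a permanently failing archival is attempted exactly once; with retries = 2 it is attempted three times before giving up.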
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]