[
https://issues.apache.org/jira/browse/HBASE-18309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294049#comment-16294049
]
Chia-Ping Tsai commented on HBASE-18309:
----------------------------------------
I observed the NPE in the log.
{code}
2017-12-17 08:53:01,584 INFO [6ff31ba4b7ce,35583,1513500588019_Chore_1] hbase.ScheduledChore(181): Chore: ReplicationMetaCleaner was stopped
Exception in thread "OldWALsCleaner-1" Exception in thread "OldWALsCleaner-0"
java.lang.NullPointerException
  at org.apache.hadoop.hbase.master.cleaner.LogCleaner.deleteFile(LogCleaner.java:166)
  at org.apache.hadoop.hbase.master.cleaner.LogCleaner.lambda$createOldWalsCleaner$0(LogCleaner.java:127)
  at java.lang.Thread.run(Thread.java:748)
java.lang.NullPointerException
  at org.apache.hadoop.hbase.master.cleaner.LogCleaner.deleteFile(LogCleaner.java:166)
  at org.apache.hadoop.hbase.master.cleaner.LogCleaner.lambda$createOldWalsCleaner$0(LogCleaner.java:127)
  at java.lang.Thread.run(Thread.java:748)
{code}
If the worker thread is interrupted while blocked in pendingDelete.take(), context is still null when the finally block runs, so context.setResult(succeed) throws the NPE.
{code}
while (true) {
  CleanerContext context = null;
  boolean succeed = false;
  boolean interrupted = false;
  try {
    context = pendingDelete.take();
    if (context != null) {
      FileStatus toClean = context.getTargetToClean();
      succeed = this.fs.delete(toClean.getPath(), false);
    }
  } catch (InterruptedException ite) {
    // It's most likely from configuration changing request
    if (context != null) {
      LOG.warn("Interrupted while cleaning oldWALs " +
        context.getTargetToClean() + ", try to clean it next round.");
    }
    interrupted = true;
  } catch (IOException e) {
    // fs.delete() fails.
    LOG.warn("Failed to clean oldwals with exception: " + e);
    succeed = false;
  } finally {
    context.setResult(succeed); // here: context may still be null
    if (interrupted) {
      // Restore interrupt status
      Thread.currentThread().interrupt();
      break;
    }
  }
}
{code}
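A minimal, self-contained sketch of one possible fix (not the committed patch; CleanerContext and the queue are simplified stand-ins for the real classes): guard the finally block so setResult is only called when a context was actually taken from the queue.

{code}
import java.util.concurrent.LinkedBlockingQueue;

public class OldWalsCleanerSketch {
  // Simplified stand-in for the CleanerContext in the snippet above.
  static class CleanerContext {
    volatile Boolean result = null;
    void setResult(boolean b) { result = b; }
  }

  public static void main(String[] args) throws Exception {
    LinkedBlockingQueue<CleanerContext> pendingDelete = new LinkedBlockingQueue<>();
    Thread worker = new Thread(() -> {
      while (true) {
        CleanerContext context = null;
        boolean succeed = false;
        boolean interrupted = false;
        try {
          context = pendingDelete.take();
          succeed = true; // stand-in for fs.delete(toClean.getPath(), false)
        } catch (InterruptedException ite) {
          // take() was interrupted before any context was assigned,
          // so context is still null on this path.
          interrupted = true;
        } finally {
          if (context != null) { // the missing null check
            context.setResult(succeed);
          }
          if (interrupted) {
            Thread.currentThread().interrupt(); // restore interrupt status
            break;
          }
        }
      }
    }, "OldWALsCleaner-0");
    worker.start();

    // One normal round-trip, then interrupt the blocked worker: no NPE.
    CleanerContext c = new CleanerContext();
    pendingDelete.put(c);
    while (c.result == null) Thread.sleep(10);
    worker.interrupt();
    worker.join();
    System.out.println("result=" + c.result + " alive=" + worker.isAlive());
  }
}
{code}

With the guard in place the interrupted thread simply exits its loop instead of dying with a NullPointerException.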
> Support multi threads in CleanerChore
> -------------------------------------
>
> Key: HBASE-18309
> URL: https://issues.apache.org/jira/browse/HBASE-18309
> Project: HBase
> Issue Type: Improvement
> Reporter: binlijin
> Assignee: Reid Chan
> Fix For: 3.0.0, 2.0.0-beta-1
>
> Attachments: HBASE-18309.master.001.patch,
> HBASE-18309.master.002.patch, HBASE-18309.master.004.patch,
> HBASE-18309.master.005.patch, HBASE-18309.master.006.patch,
> HBASE-18309.master.007.patch, HBASE-18309.master.008.patch,
> HBASE-18309.master.009.patch, HBASE-18309.master.010.patch,
> HBASE-18309.master.011.patch, HBASE-18309.master.012.patch,
> space_consumption_in_archive.png
>
>
> There is only one thread in LogCleaner to clean oldWALs, and in our big
> cluster we found this was not enough: the number of files under oldWALs
> reached the max-directory-items limit of HDFS and caused region server
> crashes. After switching LogCleaner to multiple threads, the crashes no
> longer occurred.
> What's more, currently only one thread iterates the archive directory;
> we could use multiple threads to clean its sub-directories in parallel
> to speed it up.
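The multi-threaded deletion described above can be sketched with a plain fixed-size thread pool (names and the file list are illustrative, not HBase APIs; in the real cleaner the entries would be FileStatus objects from the archive directory and the pool size a configuration value):

{code}
import java.util.*;
import java.util.concurrent.*;

public class ParallelCleanerSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical stand-ins for files under oldWALs.
    List<String> files = List.of("wal-1", "wal-2", "wal-3", "wal-4");
    int nThreads = 2; // would be configurable in a real cleaner
    ExecutorService pool = Executors.newFixedThreadPool(nThreads);

    // Submit one deletion task per file; each returns whether it succeeded.
    List<Future<Boolean>> results = new ArrayList<>();
    for (String f : files) {
      results.add(pool.submit(() -> {
        // stand-in for fs.delete(path, false)
        return true;
      }));
    }

    int deleted = 0;
    for (Future<Boolean> r : results) {
      if (r.get()) deleted++;
    }
    pool.shutdown();
    System.out.println("deleted=" + deleted);
  }
}
{code}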
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)