[ https://issues.apache.org/jira/browse/HBASE-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034139#comment-13034139 ]
Prakash Khemani commented on HBASE-3889:
----------------------------------------
Thanks Lars for finding and providing a fix for this issue. I have a few minor
comments on the patch ...
The following isn't really needed; the earlier check you put in should be good
enough.
{code}
+ if (wap == null) {
+ continue;
+ }
{code}
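To illustrate why one early exit can be enough: the patch itself is not reproduced in this message, so the following is only a minimal sketch of the idea, not the actual HLogSplitter code. The class, the `Writer` type, `createWriter()` and the "gone" region naming are all hypothetical stand-ins; only the `BAD_WRITER` sentinel name is taken from the issue description. Once an entry is skipped at the point where the sentinel is recorded (or recognized), the writer can no longer be null when `append` is reached, so a second null check just before it would never fire.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names other than BAD_WRITER are assumptions.
class SentinelWriterSketch {
    // Stand-in for a per-region log writer.
    static final class Writer {
        final List<String> appended = new ArrayList<>();
        void append(String edit) { appended.add(edit); }
    }

    // Sentinel marking regions whose edits should be discarded.
    static final Object BAD_WRITER = new Object();

    // Assumption: returns null when the region directory no longer exists.
    static Writer createWriter(String region) {
        return region.startsWith("gone") ? null : new Writer();
    }

    // Each entry is {region, edit}. Returns the number of skipped edits.
    static int split(List<String[]> entries, Map<String, Object> writers) {
        int skipped = 0;
        for (String[] entry : entries) {
            String region = entry[0];
            Object o = writers.get(region);
            if (o == BAD_WRITER) {
                skipped++;
                continue; // region already known to be gone
            }
            Writer w = (Writer) o;
            if (w == null) {
                w = createWriter(region);
                if (w == null) {
                    writers.put(region, BAD_WRITER);
                    skipped++;
                    continue; // single early exit, no later null check needed
                }
                writers.put(region, w);
            }
            w.append(entry[1]); // w is never null here
        }
        return skipped;
    }

    public static void main(String[] args) {
        Map<String, Object> writers = new HashMap<>();
        List<String[]> entries = Arrays.asList(
            new String[] {"r1", "a"},
            new String[] {"gone-1", "b"},
            new String[] {"gone-1", "c"},
            new String[] {"r1", "d"});
        System.out.println("skipped=" + split(entries, writers));
    }
}
```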
It might be better to catch Throwable in SplitLogWorker.run() and print the
Unexpected Error message there. It might not be a good thing to ignore an
unexpected exception in SplitLogWorker.grabTask() and continue.
{noformat}
+++ src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java (working copy)
@@ -297,6 +297,8 @@
}
break;
}
+ } catch (Exception e) {
+ LOG.error("An error occurred.", e);
} finally {
if (t > 0) {
LOG.info("worker " + serverName + " done with task " + path +
{noformat}
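As a rough sketch of the alternative suggested above (catching in run() rather than in grabTask()): the class below is a hypothetical stand-in, not the real SplitLogWorker, and the `errors` list merely takes the place of a `LOG.error(...)` call so the example stays self-contained. The point is that an unexpected exception ends the task loop and is reported once at the top level, instead of being logged inside grabTask() while the worker carries on.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the actual SplitLogWorker implementation.
class WorkerRunSketch implements Runnable {
    // Stand-in for the logger: collects reported errors.
    final List<String> errors = new ArrayList<>();
    // Stand-in for the worker's task loop.
    private final Runnable taskLoop;

    WorkerRunSketch(Runnable taskLoop) {
        this.taskLoop = taskLoop;
    }

    @Override
    public void run() {
        try {
            taskLoop.run();
        } catch (Throwable t) {
            // Report the unexpected error here, at the top level, so it
            // is neither swallowed silently nor ignored mid-task.
            errors.add("unexpected error: " + t);
        }
    }

    public static void main(String[] args) {
        WorkerRunSketch worker = new WorkerRunSketch(() -> {
            throw new NullPointerException("wap");
        });
        worker.run();
        System.out.println(worker.errors);
    }
}
```

Catching Throwable rather than Exception also covers failed asserts and other Errors, which a `catch (Exception e)` would let escape.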
> NPE in Distributed Log Splitting
> --------------------------------
>
> Key: HBASE-3889
> URL: https://issues.apache.org/jira/browse/HBASE-3889
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.92.0
> Environment: Pseudo-distributed on MacOS
> Reporter: Lars George
> Assignee: Lars George
> Fix For: 0.92.0
>
> Attachments: HBASE-3889.patch
>
>
> There is an issue with log splitting under the specific condition of
> edits belonging to a nonexistent region (one that went away after a split,
> for example). The HLogSplitter fails to check for this condition, which is
> handled at a lower level; the log shows it as
> {noformat}
> 2011-05-16 13:56:10,300 INFO
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: This region's
> directory doesn't exist:
> hdfs://localhost:8020/hbase/usertable/30c4d0a47703214845d0676d0c7b36f0. It is
> very likely that it was already split so it's safe to discard those edits.
> {noformat}
> The code returns a null reference which is not checked in
> HLogSplitter.splitLogFileToTemp():
> {code}
> ...
> WriterAndPath wap = (WriterAndPath)o;
> if (wap == null) {
> wap = createWAP(region, entry, rootDir, tmpname, fs, conf);
> if (wap == null) {
> logWriters.put(region, BAD_WRITER);
> } else {
> logWriters.put(region, wap);
> }
> }
> wap.w.append(entry);
> ...
> {code}
> The createWAP does return "null" when the above message is logged based on
> the obsolete region reference in the edit.
> What made this difficult to detect is that the error (and others) are
> silently ignored in SplitLogWorker.grabTask(). I added a catch and error
> logging to see the NPE that was caused by the above.
> {code}
> ...
> break;
> }
> } catch (Exception e) {
> LOG.error("An error occurred.", e);
> } finally {
> if (t > 0) {
> ...
> {code}
> As a side note, there are other errors/asserts triggered that this
> try/finally does not handle. For example:
> {noformat}
> 2011-05-16 13:58:30,647 WARN
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: BADVERSION failed to
> assert ownership for
> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
> BadVersion for
> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
> at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:106)
> at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
> at
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.ownTask(SplitLogWorker.java:329)
> at
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.access$100(SplitLogWorker.java:68)
> at
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$2.progress(SplitLogWorker.java:265)
> at
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:432)
> at
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:354)
> at
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:113)
> at
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:260)
> at
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:191)
> at
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:164)
> at java.lang.Thread.run(Thread.java:680)
> {noformat}
> This should probably be handled - or at least documented - in another issue?
> The NPE made the log split end and the SplitLogManager add an endless number
> of RESCAN entries, as the split never completed.