[ 
https://issues.apache.org/jira/browse/HBASE-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049398#comment-13049398
 ] 

stack commented on HBASE-3889:
------------------------------

@Anirudh Test looks good.  Do you want to include in your patch Lars fix or you 
want me to apply that separate?  Do you want to redo it addressing issues 
raised by Prakash (while considering Lars' follow up?)

> NPE in Distributed Log Splitting
> --------------------------------
>
>                 Key: HBASE-3889
>                 URL: https://issues.apache.org/jira/browse/HBASE-3889
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.92.0
>         Environment: Pseudo-distributed on MacOS
>            Reporter: Lars George
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3889.patch, patch.txt
>
>
> There is an issue with the log splitting under the specific condition of 
> edits belonging to a non existing region (which went away after a split for 
> example). The HLogSplitter fails to check the condition, which is handled on 
> a lower level, logging manifests it as 
> {noformat}
> 2011-05-16 13:56:10,300 INFO 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: This region's 
> directory doesn't exist: 
> hdfs://localhost:8020/hbase/usertable/30c4d0a47703214845d0676d0c7b36f0. It is 
> very likely that it was already split so it's safe to discard those edits.
> {noformat}
> The code returns a null reference which is not check in 
> HLogSplitter.splitLogFileToTemp():
> {code}
> ...
>         WriterAndPath wap = (WriterAndPath)o;
>         if (wap == null) {
>           wap = createWAP(region, entry, rootDir, tmpname, fs, conf);
>           if (wap == null) {
>             logWriters.put(region, BAD_WRITER);
>           } else {
>             logWriters.put(region, wap);
>           }
>         }
>         wap.w.append(entry);
> ...
> {code}
> The createWAP does return "null" when the above message is logged based on 
> the obsolete region reference in the edit.
> What made this difficult to detect is that the error (and others) are 
> silently ignored in SplitLogWorker.grabTask(). I added a catch and error 
> logging to see the NPE that was caused by the above.
> {code}
> ...
>           break;
>       }
>     } catch (Exception e) {
>       LOG.error("An error occurred.", e);
>     } finally {
>       if (t > 0) {
> ...
> {code}
> As a side note, there are other errors/asserts triggered that this 
> try/finally not handles. For example
> {noformat}
> 2011-05-16 13:58:30,647 WARN 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: BADVERSION failed to 
> assert ownership for 
> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = 
> BadVersion for 
> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:106)
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>         at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
>         at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.ownTask(SplitLogWorker.java:329)
>         at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.access$100(SplitLogWorker.java:68)
>         at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$2.progress(SplitLogWorker.java:265)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:432)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:354)
>         at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:113)
>         at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:260)
>         at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:191)
>         at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:164)
>         at java.lang.Thread.run(Thread.java:680)
> {noformat}
> This should probably be handled - or at least documented - in another issue?
> The NPE made the log split end and the SplitLogManager add an endless amount 
> of RESCAN entries as this never came to an end.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to