[ https://issues.apache.org/jira/browse/HBASE-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028548#comment-13028548 ]
Jieshan Bean commented on HBASE-3820:
-------------------------------------

Sorry for the delay of several days on this issue. I will post a new patch today, and I hope to finish it properly. Thanks.

> Splitlog() executed while the namenode was in safemode may cause data-loss
> --------------------------------------------------------------------------
>
>                 Key: HBASE-3820
>                 URL: https://issues.apache.org/jira/browse/HBASE-3820
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.2
>            Reporter: Jieshan Bean
>             Fix For: 0.90.3
>
>         Attachments: HBASE-3820-MFSFix-90.patch
>
>
> I found this problem when the namenode went into safemode for reasons that
> are still unclear.
> There is already one patch related to this problem:
>    try {
>      HLogSplitter splitter = HLogSplitter.createLogSplitter(
>        conf, rootdir, logDir, oldLogDir, this.fs);
>      try {
>        splitter.splitLog();
>      } catch (OrphanHLogAfterSplitException e) {
>        LOG.warn("Retrying splitting because of:", e);
>        // An HLogSplitter instance can only be used once. Get new instance.
>        splitter = HLogSplitter.createLogSplitter(conf, rootdir, logDir,
>          oldLogDir, this.fs);
>        splitter.splitLog();
>      }
>      splitTime = splitter.getTime();
>      splitLogSize = splitter.getSize();
>    } catch (IOException e) {
>      checkFileSystem();
>      LOG.error("Failed splitting " + logDir.toString(), e);
>      master.abort("Shutting down HBase cluster: Failed splitting hlog files...", e);
>    } finally {
>      this.splitLogLock.unlock();
>    }
> That patch does help to some extent when the namenode process has exited or
> been killed, but it does not consider the namenode safemode exception.
> I think the root cause is the checkFileSystem() method.
> It is meant to check whether HDFS is working normally (that is, that both
> reads and writes succeed), and that may be the original purpose of this method.
> This is how the method is implemented:
>    DistributedFileSystem dfs = (DistributedFileSystem) fs;
>    try {
>      if (dfs.exists(new Path("/"))) {
>        return;
>      }
>    } catch (IOException e) {
>      exception = RemoteExceptionHandler.checkIOException(e);
>    }
>
> I have checked the HDFS code and learned that while the namenode is in
> safemode, dfs.exists(new Path("/")) returns true, because the file system
> can still provide read-only service. So this method only checks whether the
> DFS can be read, not whether it can be written. I do not think that is
> reasonable.
>
>
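The gap described above, where a read-only probe passes while writes are rejected, can be illustrated with a small self-contained sketch. This is not HBase or HDFS code: `ProbeFs`, `ToyFs`, the marker path, and both check methods are hypothetical stand-ins invented for illustration, and the toy file system only assumes the behaviour reported in this issue (reads succeed in safemode, writes throw an IOException).

```java
import java.io.IOException;

public class SafeModeProbe {

    /** Hypothetical stand-in for the few file system calls this sketch needs. */
    interface ProbeFs {
        boolean exists(String path) throws IOException;
        void create(String path) throws IOException;
        void delete(String path) throws IOException;
    }

    /** Toy file system: reads always succeed, writes fail while in safemode. */
    static final class ToyFs implements ProbeFs {
        private final boolean safeMode;

        ToyFs(boolean safeMode) {
            this.safeMode = safeMode;
        }

        @Override
        public boolean exists(String path) {
            return true; // read-only service stays available in safemode
        }

        @Override
        public void create(String path) throws IOException {
            rejectIfSafeMode("create " + path);
        }

        @Override
        public void delete(String path) throws IOException {
            rejectIfSafeMode("delete " + path);
        }

        private void rejectIfSafeMode(String op) throws IOException {
            if (safeMode) {
                throw new IOException("Cannot " + op + ": name node is in safe mode");
            }
        }
    }

    /** Read-only probe, the same shape as the exists("/") check quoted above. */
    static boolean readOnlyCheck(ProbeFs fs) {
        try {
            return fs.exists("/");
        } catch (IOException e) {
            return false;
        }
    }

    /** Read-and-write probe: also creates and removes a marker file. */
    static boolean readWriteCheck(ProbeFs fs) {
        String marker = "/.fs-write-probe"; // hypothetical marker path
        try {
            if (!fs.exists("/")) {
                return false;
            }
            fs.create(marker);
            fs.delete(marker);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ProbeFs inSafeMode = new ToyFs(true);
        System.out.println("read-only check:  " + readOnlyCheck(inSafeMode));
        System.out.println("read-write check: " + readWriteCheck(inSafeMode));
    }
}
```

In this toy model only the read-and-write probe notices safemode, which matches the concern raised in the description. Whether a real fix should probe with a write or query the namenode's safemode state directly is a separate design choice that this sketch does not settle.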