[ 
https://issues.apache.org/jira/browse/HBASE-22628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872931#comment-16872931
 ] 

Anoop Sam John commented on HBASE-22628:
----------------------------------------

I also feel this case should not be considered a bug. A clean RS shutdown is 
required before changing the WAL dir for that RS. Changing the WAL dir during 
the restart after an RS was abruptly killed is very error prone, and I don't 
think anyone would do that. Or is there a clear need for it, [~pankaj2461]? 
It would be better to just document this clearly and not add any handling in 
code unless there is a very strong reason to do so.

> Data loss while migrating to custom WAL directory (hbase.wal.dir)
> -----------------------------------------------------------------
>
>                 Key: HBASE-22628
>                 URL: https://issues.apache.org/jira/browse/HBASE-22628
>             Project: HBase
>          Issue Type: Bug
>          Components: Recovery, wal
>            Reporter: Pankaj Kumar
>            Assignee: Pankaj Kumar
>            Priority: Blocker
>
> There is a data loss scenario when migrating to a custom WAL directory.
> Steps to reproduce:
>  # Set up an HBase cluster with the default settings (all WAL files are under 
> the root directory, i.e. /hbase/WALs).
>  # Create table 't1' and insert a few records
>  # Flush meta table (so that table region entries persist in FS)
>  # Forcibly kill HBase processes (HM & RS).
>  # Configure hbase.wal.dir to point outside the root dir (say /hbaseWAL)
>  # Start the HBase servers
>  # Scan 't1'
> Ideally HMaster should submit split tasks for the old RS(s) WAL files (created 
> under /hbase/WALs) so that the old data is replayed.
> But currently, during HM startup, we populate the previously dead servers from 
> the current WAL dir only (hbase.wal.dir -> /hbaseWAL).
> In MasterFileSystem.getFailedServersFromLogFolders(),
> {code:java}
> Set<ServerName> getFailedServersFromLogFolders() {
>   boolean retrySplitting = !conf.getBoolean("hbase.hlog.split.skip.errors",
>       WALSplitter.SPLIT_SKIP_ERRORS_DEFAULT);
>   Set<ServerName> serverNames = new HashSet<ServerName>();
>   Path logsDirPath = new Path(this.walRootDir, HConstants.HREGION_LOGDIR_NAME);
>   do {
>     if (master.isStopped()) {
>       LOG.warn("Master stopped while trying to get failed servers.");
>       break;
>     }
>     try {
>       if (!this.walFs.exists(logsDirPath)) return serverNames;
>       FileStatus[] logFolders = FSUtils.listStatus(this.walFs, logsDirPath, null);
> {code}
> For backward compatibility we should also consider the default WAL directory 
> path.
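
To make the backward-compatibility idea in the description concrete, here is a 
minimal sketch. It is not the actual HBase code or a proposed patch. It uses 
java.nio.file on a local filesystem as a stand-in for the Hadoop FileSystem 
API, and the class and method names (WalDirFallbackSketch, failedServerDirs) 
are hypothetical. The only detail taken from HBase is the "WALs" directory 
name (HConstants.HREGION_LOGDIR_NAME). The point is simply that both the 
configured WAL root and the default root dir are scanned for leftover 
per-server log folders.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

public class WalDirFallbackSketch {
    // Same value as HConstants.HREGION_LOGDIR_NAME in HBase.
    static final String LOG_DIR_NAME = "WALs";

    /**
     * Hypothetical helper: collects the names of per-server log folders found
     * under BOTH the configured WAL root and the default root dir, so that
     * folders left behind in the old location are not missed.
     */
    static Set<String> failedServerDirs(Path walRootDir, Path rootDir) throws IOException {
        // LinkedHashSet drops the duplicate when both settings point at the same dir.
        Set<Path> roots = new LinkedHashSet<>();
        roots.add(walRootDir);
        roots.add(rootDir);

        Set<String> names = new TreeSet<>();
        for (Path root : roots) {
            Path logsDir = root.resolve(LOG_DIR_NAME);
            if (!Files.isDirectory(logsDir)) {
                continue; // nothing to recover from this location
            }
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(logsDir)) {
                for (Path p : stream) {
                    if (Files.isDirectory(p)) {
                        names.add(p.getFileName().toString());
                    }
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Simulate the scenario: old WAL folder under the root dir, new WAL dir empty.
        Path base = Files.createTempDirectory("hbase-sketch");
        Path rootDir = base.resolve("hbase");
        Path walRootDir = base.resolve("hbaseWAL");
        Files.createDirectories(rootDir.resolve(LOG_DIR_NAME).resolve("rs1,16020,1561000000000"));
        Files.createDirectories(walRootDir.resolve(LOG_DIR_NAME));

        System.out.println(failedServerDirs(walRootDir, rootDir));
    }
}
```

Scanning only walRootDir in this setup would return an empty set, which 
corresponds to the reported behaviour where the old WALs are never split.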



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
