[ 
https://issues.apache.org/jira/browse/RATIS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17299706#comment-17299706
 ] 

Janus Chow edited comment on RATIS-1332 at 3/11/21, 4:38 PM:
-------------------------------------------------------------

[~elek] Can I handle this ticket? My proposal would be as follows:
{code:java}
} else if (storageState == StorageState.NOT_FORMATTED) {
  if (!storageDir.isCurrentEmpty()) {
    cleanMetaTmpFile();
  }
  format();
  return StorageState.NORMAL;
}
{code}
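
To make the proposed branch easier to try out in isolation, here is a minimal, self-contained sketch. It is not Ratis code: the class `RecoverySketch`, its helper methods, the `*.tmp` glob, and the empty `raft-meta` marker written by `format` are all assumptions made for illustration; only the branch structure follows the proposal above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the storage handling discussed above; not Ratis code.
final class RecoverySketch {
  enum StorageState { NORMAL, NOT_FORMATTED }

  // True if the directory contains no entries at all.
  static boolean isCurrentEmpty(Path current) throws IOException {
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(current)) {
      return !entries.iterator().hasNext();
    }
  }

  // Deletes leftover "*.tmp" files such as raft-meta.tmp (the glob is an assumption).
  static void cleanMetaTmpFile(Path current) throws IOException {
    List<Path> leftovers = new ArrayList<>();
    try (DirectoryStream<Path> tmpFiles = Files.newDirectoryStream(current, "*.tmp")) {
      tmpFiles.forEach(leftovers::add);
    }
    for (Path tmp : leftovers) {
      Files.delete(tmp);
    }
  }

  // Stand-in for format(): writes an empty marker file named raft-meta.
  static void format(Path current) throws IOException {
    Files.write(current.resolve("raft-meta"), new byte[0]);
  }

  // Mirrors the proposed NOT_FORMATTED branch: clean tmp files first, then format.
  static StorageState recover(Path current, StorageState state) {
    try {
      if (state == StorageState.NOT_FORMATTED) {
        if (!isCurrentEmpty(current)) {
          cleanMetaTmpFile(current);
        }
        format(current);
        return StorageState.NORMAL;
      }
      return state;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Creates a temp directory that contains only a leftover raft-meta.tmp.
  static Path newCurrentDirWithLeftoverTmp() {
    try {
      Path current = Files.createTempDirectory("current");
      Files.createFile(current.resolve("raft-meta.tmp"));
      return current;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Path current = newCurrentDirWithLeftoverTmp();
    StorageState result = recover(current, StorageState.NOT_FORMATTED);
    System.out.println("state=" + result
        + ", tmp left=" + Files.exists(current.resolve("raft-meta.tmp")));
  }
}
```

In this sketch the directory that held only `raft-meta.tmp` ends up formatted and the tmp file is gone, which is the behaviour the proposal aims for.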



> Ratis server couldn't be recovered from failed initialization state
> -------------------------------------------------------------------
>
>                 Key: RATIS-1332
>                 URL: https://issues.apache.org/jira/browse/RATIS-1332
>             Project: Ratis
>          Issue Type: Bug
>            Reporter: Marton Elek
>            Priority: Blocker
>
> I found this problem while testing Ratis 2.0.0-rc3 and earlier versions.
> I noticed that in some cases the Ozone Manager (with Ratis enabled) 
> could not be started any more (see HDDS-4703 for details).
> After some investigation I found the following problem:
>  1. The Ratis server is initialized BEFORE the OM RPC server 
> (OzoneManager.startRpcServer).
>  2. If the RPC server fails to start (due to missing DNS, for example), the 
> Ratis server is stopped during its initialization.
>  3. AtomicOutputStream can leave some tmp files behind (like raft-meta.tmp, 
> if it has not yet been renamed).
>  4. After the DNS problem is fixed, the OM still cannot be started, because 
> RaftStorageImpl.analyzeAndRecoverStorage requires a FORMATTED or empty (!!!) 
> directory. A directory with a leftover tmp file is not empty.
> {code}
>   private StorageState analyzeAndRecoverStorage(boolean toLock) throws IOException {
>     StorageState storageState = storageDir.analyzeStorage(toLock);
>     if (storageState == StorageState.NORMAL) {
>         // ...
>     } else if (storageState == StorageState.NOT_FORMATTED &&
>         storageDir.isCurrentEmpty()) {
>       // Never reached if a .tmp file is left over from a previous attempt
>       format();
>       return StorageState.NORMAL;
>     } else {
>       return storageState;
>     }
>   }
> {code}
> The problem is that `cleanMetaTmpFile();` is called only in the first branch, 
> so leftover tmp files are not removed before the check whether the directory is empty.
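
The failure mode in steps 3 and 4 can be reproduced with plain `java.nio`. The following is a simplified model, assuming a write-to-temp-then-rename pattern; the class `LeftoverTmpSketch`, its methods, and the file contents are hypothetical and do not reflect the actual AtomicOutputStream implementation. The `interrupted` flag stands in for the server being stopped before the rename.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical model of the leftover-tmp-file problem; not Ratis code.
final class LeftoverTmpSketch {
  // Writes to raft-meta.tmp, then renames it to raft-meta.
  // If `interrupted` is true, the rename never happens and the tmp file stays.
  static void writeMeta(Path current, boolean interrupted) {
    try {
      Path tmp = current.resolve("raft-meta.tmp");
      Files.write(tmp, "placeholder".getBytes(StandardCharsets.UTF_8));
      if (interrupted) {
        return; // simulates a shutdown between write and rename
      }
      Files.move(tmp, current.resolve("raft-meta"), StandardCopyOption.ATOMIC_MOVE);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // The emptiness check that gates formatting in the original condition.
  static boolean isEmpty(Path dir) {
    try (Stream<Path> entries = Files.list(dir)) {
      return !entries.findAny().isPresent();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Creates a fresh, empty temp directory for the demonstration.
  static Path newDir() {
    try {
      return Files.createTempDirectory("current");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Path current = newDir();
    System.out.println("empty before write: " + isEmpty(current));
    writeMeta(current, true);
    System.out.println("empty after interrupted write: " + isEmpty(current));
  }
}
```

After the interrupted write the directory is no longer empty, so a condition of the form "NOT_FORMATTED and empty" can never become true again without manual cleanup.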



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
