[ 
https://issues.apache.org/jira/browse/RATIS-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Gui updated RATIS-1375:
----------------------------
    Description: 
When testing Ozone with a bad Ratis volume, we hit the following error:

```
{{2021-05-06 18:19:48,166 [Command processor thread] ERROR org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CreatePipelineCommandHandler: Can't create pipeline RATIS THREE PipelineID=08de41a6-5c9e-48d4-9789-4c09798ecffd
java.io.IOException: Input/output error
        at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.addGroup(XceiverServerRatis.java:805)
        at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CreatePipelineCommandHandler.handle(CreatePipelineCommandHandler.java:92)
        at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CommandDispatcher.handle(CommandDispatcher.java:99)
        at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$initCommandHandlerThread$2(DatanodeStateMachine.java:506)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Input/output error
        at java.io.UnixFileSystem.canonicalize0(Native Method)
        at java.io.UnixFileSystem.canonicalize(UnixFileSystem.java:172)
        at java.io.File.getCanonicalPath(File.java:620)
        at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.analyzeStorage(RaftStorageDirectoryImpl.java:129)
        at org.apache.ratis.server.storage.RaftStorageImpl.analyzeAndRecoverStorage(RaftStorageImpl.java:95)
        at org.apache.ratis.server.storage.RaftStorageImpl.<init>(RaftStorageImpl.java:65)
        at org.apache.ratis.server.storage.RaftStorageImpl.<init>(RaftStorageImpl.java:51)
        at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:112)
        at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:193)
        at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$4(RaftServerProxy.java:266)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        ... 1 more}}
```

RaftServer does not handle the IOException and simply propagates it to the caller.

When multiple storageDirs are configured, we could fall back to one of the other dirs instead.

  was:
When testing Ozone with a bad Ratis volume, we hit the following error:

```
{{2021-05-06 18:19:48,166 [Command processor thread] ERROR org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CreatePipelineCommandHandler: Can't create pipeline RATIS THREE PipelineID=08de41a6-5c9e-48d4-9789-4c09798ecffd
java.io.IOException: Input/output error
        at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.addGroup(XceiverServerRatis.java:805)
        at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CreatePipelineCommandHandler.handle(CreatePipelineCommandHandler.java:92)
        at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CommandDispatcher.handle(CommandDispatcher.java:99)
        at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$initCommandHandlerThread$2(DatanodeStateMachine.java:506)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Input/output error
        at java.io.UnixFileSystem.canonicalize0(Native Method)
        at java.io.UnixFileSystem.canonicalize(UnixFileSystem.java:172)
        at java.io.File.getCanonicalPath(File.java:620)
        at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.analyzeStorage(RaftStorageDirectoryImpl.java:129)
        at org.apache.ratis.server.storage.RaftStorageImpl.analyzeAndRecoverStorage(RaftStorageImpl.java:95)
        at org.apache.ratis.server.storage.RaftStorageImpl.<init>(RaftStorageImpl.java:65)
        at org.apache.ratis.server.storage.RaftStorageImpl.<init>(RaftStorageImpl.java:51)
        at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:112)
        at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:193)
        at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$4(RaftServerProxy.java:266)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        ... 1 more}}
```

RaftServer does not handle the IOException and simply propagates it to the caller.

When multiple storageDirs are configured, we could fall back to one of the other dirs instead.


> Handle bad storage dir due to disk failures
> -------------------------------------------
>
>                 Key: RATIS-1375
>                 URL: https://issues.apache.org/jira/browse/RATIS-1375
>             Project: Ratis
>          Issue Type: Bug
>          Components: server
>            Reporter: Mark Gui
>            Priority: Major
>
> When testing Ozone with a bad Ratis volume, we hit the following error:
> ```
> {{2021-05-06 18:19:48,166 [Command processor thread] ERROR org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CreatePipelineCommandHandler: Can't create pipeline RATIS THREE PipelineID=08de41a6-5c9e-48d4-9789-4c09798ecffd
> java.io.IOException: Input/output error
>         at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.addGroup(XceiverServerRatis.java:805)
>         at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CreatePipelineCommandHandler.handle(CreatePipelineCommandHandler.java:92)
>         at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CommandDispatcher.handle(CommandDispatcher.java:99)
>         at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$initCommandHandlerThread$2(DatanodeStateMachine.java:506)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Input/output error
>         at java.io.UnixFileSystem.canonicalize0(Native Method)
>         at java.io.UnixFileSystem.canonicalize(UnixFileSystem.java:172)
>         at java.io.File.getCanonicalPath(File.java:620)
>         at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.analyzeStorage(RaftStorageDirectoryImpl.java:129)
>         at org.apache.ratis.server.storage.RaftStorageImpl.analyzeAndRecoverStorage(RaftStorageImpl.java:95)
>         at org.apache.ratis.server.storage.RaftStorageImpl.<init>(RaftStorageImpl.java:65)
>         at org.apache.ratis.server.storage.RaftStorageImpl.<init>(RaftStorageImpl.java:51)
>         at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:112)
>         at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:193)
>         at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$4(RaftServerProxy.java:266)
>         at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         ... 1 more}}
> ```
> RaftServer does not handle the IOException and simply propagates it to the caller.
> When multiple storageDirs are configured, we could fall back to one of the other dirs instead.
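
To make the "try other dirs" idea concrete, here is a minimal sketch of how a fallback over several storage dirs might look. It is not the Ratis implementation: the class StorageDirFallback and the method chooseUsableDir are hypothetical names used only for illustration, and the only check performed is the same File.getCanonicalPath() call that fails in the trace above, plus a directory-exists-or-can-be-created test. How Ratis should actually pick a directory is left open.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: hypothetical helper, not part of Ratis. */
final class StorageDirFallback {

  /**
   * Returns the first directory that passes a basic health probe.
   * Directories that fail are skipped instead of aborting the whole call.
   */
  static File chooseUsableDir(List<File> storageDirs) throws IOException {
    final List<IOException> failures = new ArrayList<>();
    for (File dir : storageDirs) {
      try {
        // Same call that raised "Input/output error" in the reported trace.
        dir.getCanonicalPath();
        if (dir.isDirectory() || dir.mkdirs()) {
          return dir;
        }
        failures.add(new IOException("Cannot create directory " + dir));
      } catch (IOException e) {
        // Remember the failure and move on to the next candidate.
        failures.add(e);
      }
    }
    // Every candidate failed: report all causes together.
    final IOException all =
        new IOException("No usable storage dir among " + storageDirs);
    failures.forEach(all::addSuppressed);
    throw all;
  }

  public static void main(String[] args) throws IOException {
    final List<File> dirs = new ArrayList<>();
    for (String arg : args) {
      dirs.add(new File(arg));
    }
    System.out.println("Using storage dir: " + chooseUsableDir(dirs));
  }
}
```

With this shape, a single bad volume only causes an error when every configured dir is unusable, and the suppressed exceptions keep the per-directory causes available for logging.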



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
