[ 
https://issues.apache.org/jira/browse/HDDS-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679699#comment-16679699
 ] 

Mukul Kumar Singh commented on HDDS-806:
----------------------------------------

Looked into this by adding more detailed logging.

The create container future is added to the queue.
{code}
2018-11-08 11:54:29 INFO  ContainerStateMachine:355 - cmd type:CreateContainer 
log index:22
2018-11-08 11:54:29 INFO  ContainerStateMachine:338 - create container id:92
{code}

The write chunk request is enqueued after the create container future.
{code}
2018-11-08 11:54:31 INFO  ContainerStateMachine:319 - writeChunk 
writeStateMachineData : blockId containerID: 92
localID: 101035413819228251
 logIndex 44 chunkName 
bea707c170764fb53142165e4e235952_stream_e8281cb6-6cc0-47ca-89a2-0bd95c2829fd_chunk_1extcutor:java.util.concurrent.ThreadPoolExecutor@32c56310[Running,
 pool size = 60, active threads = 0, queued tasks = 0, completed tasks = 11]   
num pendig:2 cc duture:java.util.concurrent.CompletableFuture@460d18bb[Not 
completed, 1 dependents]
{code}

Because the create container never finishes, the write chunk keeps timing out.
{code}
2018-11-08 11:57:41 WARN  RaftLogWorker:76 - Timeout 18/~
org.apache.ratis.protocol.TimeoutIOException: Timeout 10 s: WriteLog:44: (t:1, 
i:44), STATEMACHINELOGENTRY, client-CD400B4257FD, cid=12-writeStateMachineData
        at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87)
        at 
org.apache.ratis.server.storage.RaftLogWorker$StateMachineDataPolicy.getFromFuture(RaftLogWorker.java:73)
        at 
org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:338)
        at 
org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:210)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
        at 
java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
        at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
        at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79)
        ... 4 more
{code}
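
The dependency described above can be reproduced in isolation. This is only a minimal sketch of the pattern, not the actual ContainerStateMachine code: the class name, method name, pool size and timeout are all made up for illustration. A write future is chained behind a create future that never completes, so the executor stays idle and a timed get on the write fails with TimeoutException, as in the RaftLogWorker trace.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-alone illustration; none of these names come from Ozone or Ratis.
public class DependentFutureTimeout {

    /**
     * Chains a "write chunk" task behind a "create container" future and
     * reports whether a timed get on the write future times out.
     */
    static boolean writeTimesOut(boolean completeCreate, long timeoutMillis) {
        ExecutorService chunkExecutor = Executors.newFixedThreadPool(2);
        try {
            // Stand-in for the create container future.
            CompletableFuture<Void> createContainer = new CompletableFuture<>();

            // The write is a dependent of the create future, so it is not
            // submitted to the executor until the create future completes.
            CompletableFuture<String> writeChunk =
                createContainer.thenApplyAsync(v -> "chunk written", chunkExecutor);

            if (completeCreate) {
                createContainer.complete(null);
            }

            try {
                writeChunk.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return false; // the write finished in time
            } catch (TimeoutException e) {
                return true;  // the write never ran because the create is pending
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            chunkExecutor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // The create future never completes, so the write times out.
        System.out.println("write timed out: " + writeTimesOut(false, 200)); // prints "write timed out: true"
    }
}
```

Calling writeTimesOut(true, ...) completes the create future first, in which case the chained write runs on the executor and the timed get returns normally.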

> writeStateMachineData times out because chunk executors are not scheduled
> -------------------------------------------------------------------------
>
>                 Key: HDDS-806
>                 URL: https://issues.apache.org/jira/browse/HDDS-806
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>    Affects Versions: 0.3.0
>            Reporter: Nilotpal Nandi
>            Assignee: Tsz Wo Nicholas Sze
>            Priority: Blocker
>             Fix For: 0.3.0
>
>         Attachments: HDDS-806.001.patch, HDDS-806.002.patch, 
> HDDS-806_20181107.patch, all-node-ozone-logs-1540979056.tar.gz
>
>
> datanode stopped due to following error :
> datanode.log
> {noformat}
> 2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: 
> [9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, 
> ce0084c2-97cd-4c97-9378-e5175daad18b:172.27.15.139:9858, 
> f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169
> 2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: 
> Terminating with exit status 1: 
> 9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed.
> org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, 
> i:182), STATEMACHINELOGENTRY, client-611073BBFA46, 
> cid=127-writeStateMachineData
>  at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87)
>  at 
> org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310)
>  at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182)
>  at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.TimeoutException
>  at 
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
>  at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79)
>  ... 3 more{noformat}



