[ 
https://issues.apache.org/jira/browse/HDDS-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963813#comment-16963813
 ] 

Shashikant Banerjee edited comment on HDDS-2372 at 10/31/19 9:52 AM:
---------------------------------------------------------------------

Thanks [~elek]. I agree that there is no synchronisation between 
readStateMachineData and applyTransaction, which may lead to the 
NoSuchFileException as you suggested. However, the appendRequest will be 
retried in the leader, and the system should recover once the commit of 
writeChunk completes.
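
To illustrate the race and the recovery described above, here is a minimal sketch. It is not the Ozone code. It assumes the chunk is first written to a temporary file that is renamed when writeChunk commits (suggested by the .tmp suffix in the reported path, but not confirmed here). The class name, the method names readChunk and commitChunk, and the retry parameters are all hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ChunkReadRetrySketch {

    // Hypothetical stand-in for the read path: try the temporary chunk file,
    // then the committed file, and retry a few times if neither is visible yet.
    static byte[] readChunk(Path tmpFile, Path committedFile, int maxAttempts)
            throws IOException, InterruptedException {
        NoSuchFileException last = new NoSuchFileException(tmpFile.toString());
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            for (Path candidate : new Path[] {tmpFile, committedFile}) {
                try {
                    return Files.readAllBytes(candidate);
                } catch (NoSuchFileException e) {
                    // The file may have been renamed by a concurrent commit.
                    last = e;
                }
            }
            if (attempt < maxAttempts) {
                Thread.sleep(10L * attempt); // simple backoff before retrying
            }
        }
        throw last;
    }

    // Hypothetical stand-in for the commit step: rename tmp file to final name.
    static void commitChunk(Path tmpFile, Path committedFile) throws IOException {
        Files.move(tmpFile, committedFile, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("chunks");
        Path tmp = dir.resolve("chunk_1.tmp");
        Path committed = dir.resolve("chunk_1");
        Files.write(tmp, "data".getBytes(StandardCharsets.UTF_8));

        // The commit wins the race: the tmp file is gone before the read starts.
        commitChunk(tmp, committed);

        // Reading the tmp path alone would throw NoSuchFileException;
        // the fallback to the committed file lets the read succeed.
        byte[] bytes = readChunk(tmp, committed, 3);
        System.out.println(new String(bytes, StandardCharsets.UTF_8)); // prints "data"
    }
}
```

In this sketch a read that loses the race fails only transiently, which matches the expectation that a retried request succeeds once the commit has completed.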

In my teragen testing I ran into the same issue as well, but the test did 
complete. Can you share the logs for this?


was (Author: shashikant):
Thanks [~elek]. I agree that there is no synchronisation between 
readStateMachineData and applyTransaction, which may lead to the 
NoSuchFileException as you suggested. However, the appendRequest will be 
retried in the leader, and the system should recover once the commit of 
writeChunk completes.

In my teragen testing I ran into the same issue as well, but the test did 
complete. Can you share the logs/test to reproduce this?

> Datanode pipeline is failing with NoSuchFileException
> -----------------------------------------------------
>
>                 Key: HDDS-2372
>                 URL: https://issues.apache.org/jira/browse/HDDS-2372
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Marton Elek
>            Priority: Critical
>
> Found it on a k8s-based test cluster using a simple 3-node cluster and the 
> HDDS-2327 freon test. After a while the StateMachine becomes unhealthy after 
> this error:
> {code:java}
> datanode-0 datanode java.util.concurrent.ExecutionException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  java.nio.file.NoSuchFileException: 
> /data/storage/hdds/2a77fab9-9dc5-4f73-9501-b5347ac6145c/current/containerDir0/1/chunks/gGYYgiTTeg_testdata_chunk_13931.tmp.2.20830
>  {code}
> Can be reproduced.



