[ 
https://issues.apache.org/jira/browse/HDDS-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962153#comment-16962153
 ] 

Marton Elek commented on HDDS-2372:
-----------------------------------

Thanks for the help, [~szetszwo]
 # I found it on only one datanode. It's hard to reproduce, though: usually I
need to write a lot of chunks before it happens.
 # Yes, the test writes chunks to one Ratis pipeline without using any real
block ID or container ID. The test is uploaded to HDDS-2327 (apply the patch,
then run {{ozone freon dcg -n 100000}}).
 # Yes, this is the logic in ChunkManagerImpl.readChunk, but I can't see any
lock or synchronization between checking which file exists and reading it.
IMHO the chunk can be committed in the middle of the read, after the existence
checks but before the tmp chunk file is opened:

{code:java}

if (containerData.getLayOutVersion() == ChunkLayOutVersion
    .getLatestVersion().getVersion()) {
  File chunkFile = ChunkUtils.getChunkFile(containerData, info);

  // In case the chunk file does not exist but tmp chunk file exist,
  // read from tmp chunk file if readFromTmpFile is set to true
  if (!chunkFile.exists() && dispatcherContext != null
      && dispatcherContext.isReadFromTmpFile()) {

    // WHAT IF THE CHUNK IS COMMITTED AT THIS POINT?

    chunkFile = getTmpChunkFile(chunkFile, dispatcherContext);
  }
  data = ChunkUtils.readData(chunkFile, info, volumeIOStats);
{code}
 
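A minimal, self-contained sketch of the suspected race, using plain java.nio instead of Ozone classes. It assumes that committing a chunk amounts to renaming the tmp chunk file to the final chunk file. {{ChunkReadRaceSketch}}, {{racyRead}}, {{tolerantRead}} and {{commit}} are hypothetical names, and the {{hook}} that runs between the existence check and the read stands in for a concurrent commit. {{tolerantRead}} shows one possible mitigation (fall back to the final chunk file if the tmp file has vanished); it is not the ChunkManagerImpl logic shown above.

{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class ChunkReadRaceSketch {

    // Check-then-act, like the readChunk excerpt: the existence check and the
    // read are separate steps. The hook runs exactly where the excerpt asks
    // "WHAT IF THE CHUNK IS COMMITTED AT THIS POINT?".
    static byte[] racyRead(Path chunkFile, Path tmpFile, Runnable hook) throws IOException {
        Path toRead = chunkFile;
        if (!Files.exists(chunkFile)) {
            hook.run();
            toRead = tmpFile;
        }
        return Files.readAllBytes(toRead);
    }

    // Hypothetical mitigation: if the tmp file vanished between the check and
    // the read, assume it was committed and read the final chunk file instead.
    static byte[] tolerantRead(Path chunkFile, Path tmpFile, Runnable hook) throws IOException {
        if (Files.exists(chunkFile)) {
            return Files.readAllBytes(chunkFile);
        }
        hook.run();
        try {
            return Files.readAllBytes(tmpFile);
        } catch (NoSuchFileException e) {
            return Files.readAllBytes(chunkFile);
        }
    }

    // Simulated commit (assumption): rename the tmp chunk file to its final name.
    static Runnable commit(Path tmpFile, Path chunkFile) {
        return () -> {
            try {
                Files.move(tmpFile, chunkFile);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("chunks");
        Path chunkFile = dir.resolve("testdata_chunk_1");
        Path tmpFile = dir.resolve("testdata_chunk_1.tmp");

        Files.write(tmpFile, new byte[] {1, 2, 3});
        try {
            racyRead(chunkFile, tmpFile, commit(tmpFile, chunkFile));
            System.out.println("racy read succeeded");
        } catch (NoSuchFileException e) {
            System.out.println("racy read failed: NoSuchFileException");
        }

        // Reset: remove the committed file and write a fresh tmp file.
        Files.delete(chunkFile);
        Files.write(tmpFile, new byte[] {1, 2, 3});
        byte[] data = tolerantRead(chunkFile, tmpFile, commit(tmpFile, chunkFile));
        System.out.println("tolerant read got " + data.length + " bytes");
    }
}
{code}

Running {{main}} should print {{racy read failed: NoSuchFileException}} and then {{tolerant read got 3 bytes}}. The fallback only holds if a vanished tmp file always means a completed commit; if a tmp file can disappear for other reasons, the read would instead need a lock shared with the commit path.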

> Datanode pipeline is failing with NoSuchFileException
> -----------------------------------------------------
>
>                 Key: HDDS-2372
>                 URL: https://issues.apache.org/jira/browse/HDDS-2372
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Marton Elek
>            Priority: Critical
>
> Found it on a k8s-based test cluster (a simple 3-node cluster) running the
> HDDS-2327 freon test. After a while the StateMachine became unhealthy after
> this error:
> {code:java}
> datanode-0 datanode java.util.concurrent.ExecutionException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  java.nio.file.NoSuchFileException: 
> /data/storage/hdds/2a77fab9-9dc5-4f73-9501-b5347ac6145c/current/containerDir0/1/chunks/gGYYgiTTeg_testdata_chunk_13931.tmp.2.20830
>  {code}
> Can be reproduced.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
