[
https://issues.apache.org/jira/browse/HDDS-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962153#comment-16962153
]
Marton Elek commented on HDDS-2372:
-----------------------------------
Thanks the help [~szetszwo]
# I found it only at one datanode. But it's hard to reproduce, usually I need
to write a lot of write chunks
# Yes, the test writes chunks to one ratis pipeline without using any real
block id / container id. It's uploaded in HDDS-2327 (Use patch + ozone freon
dcg -n 100000)
# Yes, this is the logic in ChunkManagerImpl.readChunk but I can't see any
lock / sync between checking the files. Chunk can be committed in the middle of
the read / tests (IMHO)
{code:java}
if (containerData.getLayOutVersion() == ChunkLayOutVersion
.getLatestVersion().getVersion()) {
File chunkFile = ChunkUtils.getChunkFile(containerData, info);
// In case the chunk file does not exist but tmp chunk file exist,
// read from tmp chunk file if readFromTmpFile is set to true
if (!chunkFile.exists() && dispatcherContext != null
&& dispatcherContext.isReadFromTmpFile()) {
//WHAT IF CHUNK IS COMMITTED AT THIS POINT?
chunkFile = getTmpChunkFile(chunkFile, dispatcherContext);
}
data = ChunkUtils.readData(chunkFile, info, volumeIOStats); {code}
> Datanode pipeline is failing with NoSuchFileException
> -----------------------------------------------------
>
> Key: HDDS-2372
> URL: https://issues.apache.org/jira/browse/HDDS-2372
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Marton Elek
> Priority: Critical
>
> Found it on a k8s based test cluster using a simple 3 node cluster and
> HDDS-2327 freon test. After a while the StateMachine become unhealthy after
> this error:
> {code:java}
> datanode-0 datanode java.util.concurrent.ExecutionException:
> java.util.concurrent.ExecutionException:
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
> java.nio.file.NoSuchFileException:
> /data/storage/hdds/2a77fab9-9dc5-4f73-9501-b5347ac6145c/current/containerDir0/1/chunks/gGYYgiTTeg_testdata_chunk_13931.tmp.2.20830
> {code}
> Can be reproduced.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]