[
https://issues.apache.org/jira/browse/HDDS-6614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark Gui updated HDDS-6614:
---------------------------
Description:
While stress-testing an Ozone cluster with heavy EC writes and reads, we got
exactly one exception, from a single DN:
{code:java}
2022-04-14 15:02:04,422 [ChunkReader-46] INFO
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler: Operation:
WriteChunk , Trace ID: , Message: Chunk file offset 1048576 does not match
blockFile length 0 , Result: CHUNK_FILE_INCONSISTENCY ,
StorageContainerException Occurred.
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
Chunk file offset 1048576 does not match blockFile length 0
at
org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.validateChunkSize(ChunkUtils.java:391)
at
org.apache.hadoop.ozone.container.keyvalue.impl.FilePerBlockStrategy.writeChunk(FilePerBlockStrategy.java:140)
at
org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerDispatcher.writeChunk(ChunkManagerDispatcher.java:73)
at
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleWriteChunk(KeyValueHandler.java:705)
at
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.dispatchRequest(KeyValueHandler.java:224)
at
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:188)
at
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:307)
at
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.lambda$dispatch$14(HddsDispatcher.java:169)
at
org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
at
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:168)
at
org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:57)
at
org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:50)
at
org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:255)
at
org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
at
org.apache.hadoop.hdds.tracing.GrpcServerInterceptor$1.onMessage(GrpcServerInterceptor.java:49)
at
org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:309)
at
org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:292)
at
org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:782)
at
org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at
org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2022-04-14 15:02:04,441 [ChunkReader-46] WARN
org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer: Moving container
/data20/ozone/hdds/CID-ea6bec06-3f65-4721-a884-b3b9293f6a55/current/containerDir1/614
to state UNHEALTHY from state:UNHEALTHY
Trace:java.lang.Thread.getStackTrace(Thread.java:1559) {code}
And this results in an UNHEALTHY container replica:
{code:java}
[ozoneadmin@TENCENT64 ~/ozone-1.3.0-SNAPSHOT]$ ./bin/ozone admin container info
614
Container id: 614
Pipeline id: bddd5494-af98-46b2-a4c3-6757f129d3d2
Container State: CLOSED
Datanodes: [...]
Replicas: [State: CLOSED; ReplicaIndex: 7; Origin:
d4a6d2fb-3cc3-4e31-a9a6-7c3cda56c7ad; Location:
d4a6d2fb-3cc3-4e31-a9a6-7c3cda56c7ad/9.27.141.78,
State: CLOSED; ReplicaIndex: 10; Origin: 45784b6a-06d7-4d76-b53e-1bc10a7e22ea;
Location: 45784b6a-06d7-4d76-b53e-1bc10a7e22ea/9.27.142.97,
State: CLOSED; ReplicaIndex: 9; Origin: 65024339-4831-43c0-9fc0-752f8999d9c4;
Location: 65024339-4831-43c0-9fc0-752f8999d9c4/9.27.140.135,
State: CLOSED; ReplicaIndex: 8; Origin: e963db9e-44ea-425c-92e3-bdb5305c3049;
Location: e963db9e-44ea-425c-92e3-bdb5305c3049/9.27.142.51,
State: CLOSED; ReplicaIndex: 6; Origin: b6115b17-827c-484f-b61a-5f5849c5dbdb;
Location: b6115b17-827c-484f-b61a-5f5849c5dbdb/9.27.141.121,
State: UNHEALTHY; ReplicaIndex: 12; Origin:
ae31f944-b6c9-4e18-b677-0b0edb2a8e13; Location:
ae31f944-b6c9-4e18-b677-0b0edb2a8e13/9.27.141.78,
State: CLOSED; ReplicaIndex: 2; Origin: 4a6af5ae-097a-4daa-9390-b0fbe87446f1;
Location: 4a6af5ae-097a-4daa-9390-b0fbe87446f1/9.27.144.159,
State: CLOSED; ReplicaIndex: 3; Origin: 3c3bc79e-3b5f-4f73-b502-7778a7993db6;
Location: 3c3bc79e-3b5f-4f73-b502-7778a7993db6/9.27.140.38,
State: CLOSED; ReplicaIndex: 14; Origin: 37db964c-0663-4b7c-915c-eaffccc32f55;
Location: 37db964c-0663-4b7c-915c-eaffccc32f55/9.27.142.51,
State: CLOSED; ReplicaIndex: 11; Origin: dc262c7c-184e-4f66-af98-2ece21deeab1;
Location: dc262c7c-184e-4f66-af98-2ece21deeab1/9.27.140.222,
State: CLOSED; ReplicaIndex: 13; Origin: e7ea47c7-6d33-490a-9d3a-3c8231c81e30;
Location: e7ea47c7-6d33-490a-9d3a-3c8231c81e30/9.27.144.169,
State: CLOSED; ReplicaIndex: 1; Origin: 8333d554-2e39-4ef6-b73a-01539e82a9d1;
Location: 8333d554-2e39-4ef6-b73a-01539e82a9d1/9.27.144.226,
State: CLOSED; ReplicaIndex: 4; Origin: 674aa6fb-4f27-4358-b7a7-af2bbf2683ae;
Location: 674aa6fb-4f27-4358-b7a7-af2bbf2683ae/9.27.141.230,
State: CLOSED; ReplicaIndex: 5; Origin: 0d989a0e-3bb4-49ea-bacc-d468729bd9fa;
Location: 0d989a0e-3bb4-49ea-bacc-d468729bd9fa/9.27.140.135] {code}
From the log we can see that a block file received a write request at offset
1048576 while the block file had length 0.
This should be impossible, because the EC client writes stripes one by one: it
does not send the next stripe's chunks to the DNs before it gets success
responses from the DNs for the previous stripe.
So I suspect that the problem lies in the check on the DN itself:
{code:java}
public static void validateChunkSize(File chunkFile, ChunkInfo chunkInfo)
    throws StorageContainerException {
  long offset = chunkInfo.getOffset();
  long len = chunkFile.length();
  if (len != offset) {
    throw new StorageContainerException(
        "Chunk file offset " + offset + " does not match blockFile length " +
            len, CHUNK_FILE_INCONSISTENCY);
  }
} {code}
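Note that {{File.length()}} is specified to return 0 not only for an empty file but also for a file that does not exist at all, so a block file that has not yet appeared on disk is indistinguishable from a zero-length one in this check. A minimal demonstration (the file name below is made up; any non-existent path behaves the same):

```java
import java.io.File;

public class LengthDemo {
    // File.length() is specified to return 0L when the file does not exist,
    // the same value an existing empty file would report.
    static long lengthOfMissing() {
        // Hypothetical path; the nanoTime suffix just guarantees it is absent.
        File missing = new File("no-such-block-file-" + System.nanoTime() + ".data");
        return missing.length();
    }

    public static void main(String[] args) {
        System.out.println(lengthOfMissing()); // prints 0
    }
}
```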
Here we get the length of the existing block file with `chunkFile.length()`,
which is suspicious, since we write to block files through FileChannels. I
think we should get the file size from the channel as well, because the
channel may not have created the block file yet at the moment we call
`chunkFile.length()`, in which case we get 0. This may be the cause of the
problem.
It is hard to reproduce this problem with simple tests; it only shows up under
heavy load. But I did a small experiment to show the difference between
`fileChannel.size()` and `file.length()`; the small program can be seen in the
attachments.
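A minimal sketch of the direction suggested above (class and method names here are hypothetical, not the actual Ozone code): compare the expected offset against {{FileChannel.size()}}, so the answer comes from the same open handle that performs the writes instead of a separate filesystem metadata lookup.

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChannelSizeCheck {
    // Hypothetical variant of validateChunkSize: ask the open channel for
    // the current file size rather than calling File.length().
    static void validateOffset(FileChannel channel, long offset)
            throws Exception {
        long len = channel.size();
        if (len != offset) {
            throw new IllegalStateException("Chunk file offset " + offset
                + " does not match blockFile length " + len);
        }
    }

    // Writes one 1 MiB chunk at offset 0 and returns the sizes reported by
    // the channel and by the File metadata afterwards.
    static long[] writeOneChunk() throws Exception {
        Path p = Files.createTempFile("block", ".data");
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.allocate(1024 * 1024), 0);
            validateOffset(ch, 1024 * 1024); // next chunk starts at 1 MiB
            return new long[] {ch.size(), p.toFile().length()};
        } finally {
            Files.deleteIfExists(p);
        }
    }

    public static void main(String[] args) throws Exception {
        long[] s = writeOneChunk();
        System.out.println("channel.size()=" + s[0]
            + " file.length()=" + s[1]);
    }
}
```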
> EC: Fix Datanode block file INCONSISTENCY during heavy load.
> ------------------------------------------------------------
>
> Key: HDDS-6614
> URL: https://issues.apache.org/jira/browse/HDDS-6614
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Mark Gui
> Assignee: Mark Gui
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.7#820007)