[
https://issues.apache.org/jira/browse/RATIS-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17256778#comment-17256778
]
runzhiwang edited comment on RATIS-1277 at 12/31/20, 1:50 AM:
--------------------------------------------------------------
Start 3 servers:
{code:java}
BIN=ratis-examples/src/main/bin
PEERS=n0:ip1:6000:7000,n1:ip2:6001:7001,n2:ip3:6002:7002
nohup ${BIN}/server.sh filestore server --id n0 --storage /data/ratis/n0 \
  --storage /data1/ratis/n0 --storage /data2/ratis/n0 --storage /data3/ratis/n0 \
  --storage /data4/ratis/n0 --storage /data5/ratis/n0 --storage /data6/ratis/n0 \
  --storage /data7/ratis/n0 --storage /data8/ratis/n0 --storage /data9/ratis/n0 \
  --storage /data10/ratis/n0 --storage /data11/ratis/n0 --peers ${PEERS} \
  --writeThreadNum 100 --readThreadNum 100 --commitThreadNum 20 \
  --deleteThreadNum 20 >> n0.log 2>&1 &
nohup ${BIN}/server.sh filestore server --id n1 --storage /data/ratis/n1 \
  --storage /data1/ratis/n1 --storage /data2/ratis/n1 --storage /data3/ratis/n1 \
  --storage /data4/ratis/n1 --storage /data5/ratis/n1 --storage /data6/ratis/n1 \
  --storage /data7/ratis/n1 --storage /data8/ratis/n1 --storage /data9/ratis/n1 \
  --storage /data10/ratis/n1 --storage /data11/ratis/n1 --peers ${PEERS} \
  --writeThreadNum 100 --readThreadNum 100 --commitThreadNum 20 \
  --deleteThreadNum 20 >> n1.log 2>&1 &
nohup ${BIN}/server.sh filestore server --id n2 --storage /data/ratis/n2 \
  --storage /data1/ratis/n2 --storage /data2/ratis/n2 --storage /data3/ratis/n2 \
  --storage /data4/ratis/n2 --storage /data5/ratis/n2 --storage /data6/ratis/n2 \
  --storage /data7/ratis/n2 --storage /data8/ratis/n2 --storage /data9/ratis/n2 \
  --storage /data10/ratis/n2 --storage /data11/ratis/n2 --peers ${PEERS} \
  --writeThreadNum 100 --readThreadNum 100 --commitThreadNum 20 \
  --deleteThreadNum 20 >> n2.log 2>&1 &
{code}
Start 3 clients on 3 machines:
{code:java}
${BIN}/client.sh filestore datastream --size 128000000 --numFiles 600 \
  --bufferSize 1000000 --syncSize 0 --type DirectByteBuffer --peers ${PEERS} \
  --storage /data/ratis/n2 --storage /data1/ratis/n2 --storage /data2/ratis/n2 \
  --storage /data3/ratis/n2 --storage /data4/ratis/n2 --storage /data5/ratis/n2 \
  --storage /data6/ratis/n2 --storage /data7/ratis/n2 --storage /data8/ratis/n2 \
  --storage /data9/ratis/n2 --storage /data10/ratis/n2 --storage /data11/ratis/n2
{code}
The failure then reproduces. It did not happen in my previous test, and I am not sure why.
was (Author: yjxxtd): the same text as above, without the {code} markup.
> FileStore write failed because out of order
> -------------------------------------------
>
> Key: RATIS-1277
> URL: https://issues.apache.org/jira/browse/RATIS-1277
> Project: Ratis
> Issue Type: Sub-task
> Reporter: runzhiwang
> Assignee: runzhiwang
> Priority: Major
> Attachments: screenshot-2.png, screenshot-3.png
>
>
> !screenshot-3.png!
> As the following image and code show, the check compares the bytesWritten of the
> STREAM_HEADER reply, i.e. 0, against the expected 10000, so it fails.
> !screenshot-2.png!
> {code:java}
> static boolean checkSuccessRemoteWrite(
>     List<CompletableFuture<DataStreamReply>> replyFutures, long bytesWritten) {
>   for (CompletableFuture<DataStreamReply> replyFuture : replyFutures) {
>     final DataStreamReply reply = replyFuture.join();
>     if (!reply.isSuccess() || reply.getBytesWritten() != bytesWritten) {
> +     System.err.println("succ:" + reply.isSuccess()
> +         + " reply written:" + reply.getBytesWritten()
> +         + " expected:" + bytesWritten + " clientId:" + reply.getClientId()
> +         + ",type:" + reply.getType() + ",streamId" + reply.getStreamId()
> +         + ",offset:" + reply.getStreamOffset()
> +         + ",datalength:" + reply.getDataLength());
>       return false;
>     }
>   }
>   return true;
> }
> {code}
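The mismatch described in the issue can be illustrated with a minimal, self-contained sketch. It does not use the Ratis API: `Reply`, `Type`, `done` and `checkAll` are hypothetical stand-ins that only mirror the shape of the quoted check, and the value 10000 is taken from the description above. It shows that a successful STREAM_HEADER reply reporting 0 bytes written fails a check that expects 10000 from every reply.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class HeaderReplyCheckSketch {
  // Hypothetical stand-in for the packet type; not the real Ratis enum.
  public enum Type { STREAM_HEADER, STREAM_DATA }

  // Hypothetical stand-in for DataStreamReply with only the fields the check reads.
  public static final class Reply {
    final Type type;
    final boolean success;
    final long bytesWritten;

    Reply(Type type, boolean success, long bytesWritten) {
      this.type = type;
      this.success = success;
      this.bytesWritten = bytesWritten;
    }
  }

  // Builds an already-completed, successful reply future.
  public static CompletableFuture<Reply> done(Type type, long bytesWritten) {
    return CompletableFuture.completedFuture(new Reply(type, true, bytesWritten));
  }

  // Same shape as the quoted check: every reply must succeed and
  // report exactly the expected number of bytes.
  public static boolean checkAll(List<CompletableFuture<Reply>> futures, long expected) {
    for (CompletableFuture<Reply> future : futures) {
      final Reply reply = future.join();
      if (!reply.success || reply.bytesWritten != expected) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    final long expected = 10000;
    // Only data replies: each one reports the expected byte count.
    System.out.println(checkAll(Arrays.asList(
        done(Type.STREAM_DATA, expected),
        done(Type.STREAM_DATA, expected)), expected)); // prints "true"
    // A header reply (0 bytes written) ends up in the same list: the check fails.
    System.out.println(checkAll(Arrays.asList(
        done(Type.STREAM_HEADER, 0),
        done(Type.STREAM_DATA, expected)), expected)); // prints "false"
  }
}
```

The sketch only reproduces the comparison. It says nothing about why the header reply is matched against a data write in the real code path.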
--
This message was sent by Atlassian Jira
(v8.3.4#803005)