[ 
https://issues.apache.org/jira/browse/HDDS-3907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17401690#comment-17401690
 ] 

Attila Doroszlai edited comment on HDDS-3907 at 8/19/21, 1:36 PM:
------------------------------------------------------------------

This is still happening; see https://github.com/elek/ozone-build-results/tree/master/2021/08/19/9810/acceptance-secure for logs.

{code:title=https://github.com/apache/ozone/runs/3368353893#step:5:126}
Start freon testing                                                   | FAIL |
{code}

{code:title=robot log.html}
07:19:23.258    INFO    Running command 'ozone freon randomkeys --num-of-volumes 5 --num-of-buckets 5 --num-of-keys 5 --num-of-threads 1 --replication-type RATIS --factor THREE --validate-writes 2>&1'.
07:24:23.225    FAIL    Test timeout 5 minutes exceeded.
{code}
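To rerun the same freon step outside Robot Framework, a minimal sketch could look like the one below. It assumes an `ozone` CLI on the PATH of a running (secure) cluster; the flags are copied from the robot log, and the 300 s timeout mirrors the "Test timeout 5 minutes" reported above (07:19:23.258 to 07:24:23.225 is just under 300 s). `FREON_ARGS` and `run_freon` are hypothetical helper names, not part of Ozone.

```python
import subprocess

# Arguments copied verbatim from the robot log; --factor THREE forces a
# 3-node RATIS pipeline, which is where the writes get stuck.
FREON_ARGS = [
    "ozone", "freon", "randomkeys",
    "--num-of-volumes", "5",
    "--num-of-buckets", "5",
    "--num-of-keys", "5",
    "--num-of-threads", "1",
    "--replication-type", "RATIS",
    "--factor", "THREE",
    "--validate-writes",
]


def run_freon(timeout_s: float = 300.0) -> subprocess.CompletedProcess:
    """Run freon once; raises subprocess.TimeoutExpired like the 5-minute robot limit."""
    # stderr=STDOUT is the subprocess equivalent of the shell's 2>&1.
    return subprocess.run(
        FREON_ARGS,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        timeout=timeout_s,
    )
```

Calling `run_freon()` in a loop and catching `subprocess.TimeoutExpired` is one way to estimate how often the hang reproduces.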

{code}
datanode_3  | 2021-08-19 05:20:09,598 [java.util.concurrent.ThreadPoolExecutor$Worker@5f5ccab7[State = -1, empty queue]] WARN server.GrpcLogAppender: 1c7f86b2-ded3-441b-9f20-84ba3ff60d2d@group-74FBCD15D899->25dd9de7-1caa-448d-a35a-2b29afced1cc-GrpcLogAppender: appendEntries Timeout, request=AppendEntriesRequest:cid=8,entriesCount=1,lastEntry=(t:3, i:0)
...
datanode_3  | 2021-08-19 05:23:56,577 [Thread-181] INFO client.GrpcClientProtocolService: Failed RaftClientRequest:client-14C4D4C86555->1c7f86b2-ded3-441b-9f20-84ba3ff60d2d@group-74FBCD15D899, cid=102, seq=0, Watch-ALL_COMMITTED(131), Message:<EMPTY>, reply=RaftClientReply:client-14C4D4C86555->1c7f86b2-ded3-441b-9f20-84ba3ff60d2d@group-74FBCD15D899, cid=102, FAILED org.apache.ratis.protocol.exceptions.NotReplicatedException: Request with call Id 102 and log index 131 is not yet replicated to ALL_COMMITTED, logIndex=131, commits[1c7f86b2-ded3-441b-9f20-84ba3ff60d2d:c132, 64230e6f-d613-4ced-8084-22c404c29d15:c132, 25dd9de7-1caa-448d-a35a-2b29afced1cc:c127]
{code}
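Reading the `commits[...]` list: two peers are at commit index 132, but 25dd9de7 (the same follower whose appendEntries timed out earlier) is only at 127, so a watch for index 131 cannot reach ALL_COMMITTED. The sketch below models that check under a simplifying assumption (ALL_COMMITTED needs every peer's commit index >= the log index, MAJORITY_COMMITTED needs a quorum); it is not the Ratis implementation, and the helper names are hypothetical.

```python
import re

# Hypothetical model, not Ratis source: each peer reports a commit index.
# ALL_COMMITTED(N) needs every peer >= N; MAJORITY_COMMITTED(N) needs a quorum >= N.
COMMIT_RE = re.compile(r"([0-9a-f-]{36}):c(\d+)")


def parse_commits(text: str) -> dict[str, int]:
    """Map peer id -> commit index from a 'commits[...]' log fragment."""
    return {peer: int(index) for peer, index in COMMIT_RE.findall(text)}


def all_committed(commits: dict[str, int], log_index: int) -> bool:
    return min(commits.values()) >= log_index


def majority_committed(commits: dict[str, int], log_index: int) -> bool:
    # After sorting descending, the value at position n // 2 has been
    # reached by at least a majority of the n peers.
    ordered = sorted(commits.values(), reverse=True)
    return ordered[len(ordered) // 2] >= log_index


def lagging_peers(commits: dict[str, int], log_index: int) -> list[str]:
    return sorted(peer for peer, index in commits.items() if index < log_index)


fragment = (
    "commits[1c7f86b2-ded3-441b-9f20-84ba3ff60d2d:c132, "
    "64230e6f-d613-4ced-8084-22c404c29d15:c132, "
    "25dd9de7-1caa-448d-a35a-2b29afced1cc:c127]"
)
commits = parse_commits(fragment)
print(all_committed(commits, 131))       # False
print(majority_committed(commits, 131))  # True
print(lagging_peers(commits, 131))       # ['25dd9de7-1caa-448d-a35a-2b29afced1cc']
```

Under this model the write is durable on a majority, and only the ALL_COMMITTED watch keeps failing while the slow follower stays behind.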

{code}
datanode_2  | 2021-08-19 05:18:42,242 [Command processor thread] WARN commandhandler.CreatePipelineCommandHandler: Add group failed for 1c7f86b2-ded3-441b-9f20-84ba3ff60d2d{ip: 172.18.0.9, host: ozonesecure_datanode_3.ozonesecure_default, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default-rack, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}
datanode_2  | java.io.IOException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: Network closed for unknown reason
{code}



> Intermittent failure in writing data in acceptance test
> -------------------------------------------------------
>
>                 Key: HDDS-3907
>                 URL: https://issues.apache.org/jira/browse/HDDS-3907
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Marton Elek
>            Priority: Blocker
>
> Examples:
> https://github.com/elek/ozone-build-results/tree/master/2020/06/30/1318/acceptance
> https://github.com/elek/ozone-build-results/tree/master/2020/06/30/1321/acceptance
> https://github.com/elek/ozone-build-results/tree/master/2020/06/30/1334/acceptance
> Some strange errors:
> {code}
> scm_1         | 2020-06-30 19:17:50,787 [RatisPipelineUtilsThread] ERROR pipeline.SCMPipelineManager: Failed to create pipeline of type RATIS and factor ONE. Exception: Cannot create pipeline of factor 1 using 0 nodes. Used 6 nodes. Healthy nodes 6
> scm_1         | 2020-06-30 19:17:50,788 [RatisPipelineUtilsThread] ERROR pipeline.SCMPipelineManager: Failed to create pipeline of type RATIS and factor THREE. Exception: Pipeline creation failed because nodes are engaged in other pipelines and every node can only be engaged in max 2 pipelines. Required 3. Found 0
> {code}
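The factor THREE error follows from the per-node limit stated in the message: with 6 healthy datanodes that are each already engaged in 2 pipelines, no node is eligible, so 3 are required and 0 are found. The sketch below is a simplified, hypothetical model of that placement check (the `Node` class, `eligible`, and `create_pipeline` are illustrative names, not SCM's actual code), useful for reasoning about why SCM cannot create new pipelines once every node hits the limit.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a datanode as seen by SCM."""
    name: str
    healthy: bool
    engaged: int  # number of pipelines this node already participates in


class PipelineCreationError(Exception):
    pass


def eligible(nodes: list[Node], max_per_node: int) -> list[Node]:
    # Only healthy nodes below the per-node pipeline limit can join a new pipeline.
    return [n for n in nodes if n.healthy and n.engaged < max_per_node]


def create_pipeline(nodes: list[Node], factor: int, max_per_node: int = 2) -> list[str]:
    candidates = eligible(nodes, max_per_node)
    if len(candidates) < factor:
        raise PipelineCreationError(
            f"every node can only be engaged in max {max_per_node} pipelines. "
            f"Required {factor}. Found {len(candidates)}"
        )
    chosen = candidates[:factor]
    for n in chosen:
        n.engaged += 1  # the new pipeline counts against each member's limit
    return [n.name for n in chosen]


# Six healthy datanodes, each already at the limit of 2 pipelines.
cluster = [Node(f"dn{i}", True, 2) for i in range(1, 7)]
try:
    create_pipeline(cluster, factor=3)
except PipelineCreationError as e:
    print(e)  # ... Required 3. Found 0
```

In this model the failure is expected steady-state behavior rather than a bug by itself; it only matters if existing pipelines are unhealthy and never get closed, which would leave writes with nowhere to go.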



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
