[ https://issues.apache.org/jira/browse/HDDS-12103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921525#comment-17921525 ]
Tsz-wo Sze edited comment on HDDS-12103 at 1/27/25 8:14 PM:
------------------------------------------------------------
The docker-compose files in ozonesecure and ozonesecure-mr are slightly
different. I suspect the MR acceptance test is configured incorrectly.
{code}
$pwd
/Users/szetszwo/ozone/ozone-fork/hadoop-ozone/dist/src/main/compose
stw2019mac:compose$diff ozonesecure/docker-compose.yaml ozonesecure-mr/docker-compose.yaml
28d27
< hostname: kms
31c30
< - 9600:9600
---
> - 9600:9600
33,35c32
< - ./docker-config
< environment:
< HADOOP_CONF_DIR: /opt/hadoop/etc/hadoop
---
> - ./docker-config
36a34
> - ./krb5.conf:/etc/krb5.conf
41d38
< hostname: dn
67c64
< OZONE_OPTS: -Dcom.sun.net.ssl.checkRevocation=false
---
> OZONE_OPTS:
71,86d67
< httpfs:
< image: ${OZONE_RUNNER_IMAGE}:${OZONE_RUNNER_VERSION}
< hostname: httpfs
< dns_search: .
< volumes:
< - ../..:/opt/hadoop
< - ../_keytabs:/etc/security/keytabs
< - ./krb5.conf:/etc/krb5.conf
< ports:
< - 14000:14000
< env_file:
< - ./docker-config
< command: [ "/opt/hadoop/bin/ozone","httpfs" ]
< environment:
< OZONE-SITE.XML_hdds.scm.safemode.min.datanode: ${OZONE_SAFEMODE_MIN_DATANODES:-1}
< OZONE_OPTS:
99d79
< command: ["/opt/hadoop/bin/ozone","s3g", "-Dozone.om.transport.class=${OZONE_S3_OM_TRANSPORT:-org.apache.hadoop.ozone.om.protocolPB.GrpcOmTransportFactory}"]
102,116c82
< recon:
< image: ${OZONE_RUNNER_IMAGE}:${OZONE_RUNNER_VERSION}
< hostname: recon
< dns_search: .
< volumes:
< - ../..:/opt/hadoop
< - ../_keytabs:/etc/security/keytabs
< - ./krb5.conf:/etc/krb5.conf
< ports:
< - 9888:9888
< env_file:
< - ./docker-config
< environment:
< OZONE_OPTS:
< command: ["/opt/hadoop/bin/ozone","recon"]
---
> command: ["/opt/hadoop/bin/ozone","s3g"]
{code}
[~adoroszlai], do you know if the docker-compose files are supposed to be the
same?
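To see at a glance which services one compose file declares and the other lacks, here is a small JDK-only sketch. It is a hypothetical helper (`ComposeServices` is not part of Ozone), and it is not a YAML parser: it assumes each service is declared as a bare `name:` key one level under the top-level `services:` key, which is how the `httpfs:` and `recon:` entries appear in the diff above.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: compares the service names declared in two docker-compose files. */
public class ComposeServices {

  /** Service names declared directly under the top-level "services:" key (line-based heuristic). */
  static Set<String> serviceNames(String yaml) {
    Set<String> names = new LinkedHashSet<>();
    boolean inServices = false;
    int serviceIndent = -1; // indentation of the first key under "services:"
    for (String line : yaml.split("\\R")) {
      String trimmed = line.trim();
      if (trimmed.isEmpty() || trimmed.startsWith("#")) {
        continue; // skip blank lines and comments
      }
      int indent = line.length() - line.stripLeading().length();
      if (indent == 0) {
        // Any top-level key opens or closes the "services:" section.
        inServices = trimmed.equals("services:");
        serviceIndent = -1;
        continue;
      }
      if (!inServices) {
        continue;
      }
      if (serviceIndent < 0) {
        serviceIndent = indent; // the first nested key fixes the service level (2 or 3 spaces, etc.)
      }
      if (indent == serviceIndent && trimmed.endsWith(":")) {
        names.add(trimmed.substring(0, trimmed.length() - 1));
      }
    }
    return names;
  }

  /** Services declared in {@code a} but missing from {@code b}, sorted by name. */
  static Set<String> missingFrom(String a, String b) {
    Set<String> result = new TreeSet<>(serviceNames(a));
    result.removeAll(serviceNames(b));
    return result;
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 2) {
      System.err.println("usage: java ComposeServices.java <compose-a.yaml> <compose-b.yaml>");
      return;
    }
    String a = Files.readString(Path.of(args[0]));
    String b = Files.readString(Path.of(args[1]));
    System.out.println("only in " + args[0] + ": " + missingFrom(a, b));
    System.out.println("only in " + args[1] + ": " + missingFrom(b, a));
  }
}
{code}

With Java 11+ it can be launched as a single source file, e.g. `java ComposeServices.java ozonesecure/docker-compose.yaml ozonesecure-mr/docker-compose.yaml`. It only reports whole services; differences inside a service (such as `OZONE_OPTS`, `hostname` or mounted volumes) still need the full diff.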
> PutBlock timeouts in MapReduce test with Ratis 3.1.3
> ----------------------------------------------------
>
> Key: HDDS-12103
> URL: https://issues.apache.org/jira/browse/HDDS-12103
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Attila Doroszlai
> Priority: Major
>
> MapReduce tests are much slower with Ratis 3.1.3 and frequently hit the test
> timeout (even after it was increased from 4 to 6 minutes). Even successful runs
> are much slower. Other tests do not show similar slowness.
> MapReduce job log shows some PutBlock request timeouts:
> {code}
> 2025-01-18 12:59:31 ERROR OrderedAsync:215 - client-6A858158D10F: Failed* RaftClientRequest:client-6A858158D10F->6113f37b-d1d0-4ce0-803d-1921ebe30b67@group-10CFDA178973, cid=12, seq=9, RW, cmdType: PutBlock
> traceID: ""
> containerID: 1
> datanodeUuid: "d05682f8-babd-4570-8aec-e536a6edcb1d"
> putBlock {
> blockData {
> blockID {
> containerID: 1
> localID: 115816896921600024
> blockCommitSequenceId: 0
> }
> metadata {
> key: "TYPE"
> value: "KEY"
> }
> chunks {
> chunkName: "115816896921600024_chunk_1"
> offset: 0
> len: 179924
> checksumData {
> type: CRC32
> bytesPerChecksum: 16384
> checksums: ...
> }
> }
> }
> eof: true
> }
> version: 3
> , data.size=0
> java.util.concurrent.CompletionException: org.apache.ratis.protocol.exceptions.TimeoutIOException: client-6A858158D10F->6113f37b-d1d0-4ce0-803d-1921ebe30b67 request #12 timeout 60s
> at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
> at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
> at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:647)
> at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632)
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
> at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.lambda$timeoutCheck$5(GrpcClientProtocolClient.java:376)
> at java.util.Optional.ifPresent(Optional.java:159)
> at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.handleReplyFuture(GrpcClientProtocolClient.java:381)
> at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.timeoutCheck(GrpcClientProtocolClient.java:376)
> at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.lambda$onNext$3(GrpcClientProtocolClient.java:369)
> at org.apache.ratis.util.TimeoutTimer.lambda$onTimeout$2(TimeoutTimer.java:101)
> at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:38)
> at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:78)
> at org.apache.ratis.util.TimeoutTimer$Task.run(TimeoutTimer.java:55)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> Caused by: org.apache.ratis.protocol.exceptions.TimeoutIOException: client-6A858158D10F->6113f37b-d1d0-4ce0-803d-1921ebe30b67 request #12 timeout 60s
> at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.lambda$timeoutCheck$5(GrpcClientProtocolClient.java:377)
> ... 10 more
> {code}
> CC [~szetszwo]
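In the trace above, the Ratis `TimeoutTimer` thread fails the pending reply future (`completeExceptionally`), and a `thenAccept`-style dependent stage (`UniAccept.tryFire`, `encodeThrowable`) re-wraps that failure. That is why the log shows a `CompletionException` whose cause is `TimeoutIOException`. The JDK-only sketch below reproduces only that propagation shape and is not Ratis code: `java.util.concurrent.TimeoutException` stands in for Ratis' `TimeoutIOException`, a short delay stands in for the 60s request timeout, and `ReplyTimeoutSketch`/`failAfter` are hypothetical names. It shows how the error surfaces, not why PutBlock is slow with Ratis 3.1.3.

{code:java}
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeoutException;

/** Sketch (not Ratis code): a timer fails a pending reply; a dependent stage sees the wrapped cause. */
public class ReplyTimeoutSketch {

  /** Fails a pending reply after delayMillis and returns what the dependent stage observes. */
  static Throwable failAfter(long delayMillis) {
    CompletableFuture<String> reply = new CompletableFuture<>();
    // Dependent stage registered on the reply, matching the UniAccept frames in the trace.
    CompletableFuture<Void> dependent = reply.thenAccept(r -> { });

    // Daemon timer standing in for the Ratis TimeoutTimer running on a java.util.TimerThread.
    Timer timer = new Timer("timeout-timer", true);
    timer.schedule(new TimerTask() {
      @Override
      public void run() {
        // TimeoutException is a stand-in for Ratis' TimeoutIOException.
        reply.completeExceptionally(
            new TimeoutException("request #12 timeout " + delayMillis + "ms"));
      }
    }, delayMillis);

    try {
      dependent.join(); // blocks until the timer fails the reply
      return null;
    } catch (CompletionException e) {
      return e; // the dependent stage wrapped the original cause in a CompletionException
    } finally {
      timer.cancel();
    }
  }

  public static void main(String[] args) {
    Throwable t = failAfter(100);
    System.out.println(t);
    System.out.println("Caused by: " + t.getCause());
  }
}
{code}

Running `main` prints a `CompletionException` whose message is the cause's `toString()`, followed by the `TimeoutException` cause, the same pair as the `CompletionException ... Caused by: ... TimeoutIOException` lines in the job log.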
--
This message was sent by Atlassian Jira
(v8.20.10#820010)