Sammi Chen created HDDS-5548:
--------------------------------
Summary: Keep downloaded .tar.gz container file for debugging purposes
Key: HDDS-5548
URL: https://issues.apache.org/jira/browse/HDDS-5548
Project: Apache Ozone
Issue Type: Improvement
Reporter: Sammi Chen
Assignee: Sammi Chen
There are many container import failures logged in production, for example:
2021-08-03 21:48:12,311 [ContainerReplicationThread-9] INFO
org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator:
Starting replication of container 66315 from
[4e613295-6d55-4bf9-bdc9-1668fd24741c{ip: 11.61.44.244, host: 11.61.44.244,
ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9858, RATIS_SERVER=9858,
STANDALONE=9859], networkLocation: /rack582702, certSerialId: null,
persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0},
7694e208-c887-4d8e-b249-28a176b4d7b7{ip: 11.61.45.38, host: 11.61.45.38, ports:
[REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9858, RATIS_SERVER=9858,
STANDALONE=9859], networkLocation: /rack582788, certSerialId: null,
persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}]
2021-08-03 21:48:17,462 [grpc-default-executor-12557] INFO
org.apache.hadoop.ozone.container.replication.GrpcReplicationClient: Container
66315 is downloaded to
/data/ozoneadmin/ozoneenv/ozone-temp/container-66315.tar.gz
2021-08-03 21:48:17,462 [ContainerReplicationThread-9] INFO
org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator:
Container 66315 is downloaded with size 6154503, starting to import.
2021-08-03 21:48:17,582 [ContainerReplicationThread-9] ERROR
org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator:
Container 66315 replication was unsuccessful.
java.io.IOException: Container descriptor is missing from the container archive.
    at org.apache.hadoop.ozone.container.keyvalue.TarContainerPacker.unpackContainerDescriptor(TarContainerPacker.java:190)
    at org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator.importContainer(DownloadAndImportReplicator.java:76)
    at org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator.replicate(DownloadAndImportReplicator.java:125)
    at org.apache.hadoop.ozone.container.replication.MeasuredReplicator.replicate(MeasuredReplicator.java:69)
    at org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:139)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2021-08-03 21:48:17,582 [ContainerReplicationThread-9] ERROR
org.apache.hadoop.ozone.container.replication.ReplicationSupervisor: Container
66315 can't be downloaded from any of the datanodes.
In the above case, container 66315 on the source datanode does have its
container descriptor on disk, so the root cause of this error is unclear.
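Given a copy of the downloaded archive, a first check is whether a descriptor entry is present at all and whether the archive is complete. The sketch below lists entry names using only the JDK and flags a stream that ends before the tar end-of-archive marker. `TarGzEntryLister` and `Listing` are hypothetical names, not Ozone classes. It assumes plain ustar headers with octal sizes; PAX and GNU long-name records are listed under their raw header names rather than decoded. The exact descriptor entry name that TarContainerPacker expects is not given in this issue, so confirm it in the Ozone source before reading the output.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

/**
 * Hypothetical debugging helper, not part of Ozone: lists entry names in a
 * .tar.gz using only the JDK and reports whether the archive ended early.
 * Handles plain ustar headers with octal sizes; PAX and GNU long-name records
 * are listed under their raw header names instead of being decoded.
 */
public final class TarGzEntryLister {

  private static final int BLOCK = 512;

  /** Entry names in archive order, plus a flag for a cut-short stream. */
  public static final class Listing {
    public final List<String> names = new ArrayList<>();
    public boolean truncated;
  }

  private TarGzEntryLister() {
  }

  public static Listing list(Path tarGz) throws IOException {
    Listing result = new Listing();
    try (InputStream raw = Files.newInputStream(tarGz);
         DataInputStream in = new DataInputStream(new GZIPInputStream(raw))) {
      byte[] header = new byte[BLOCK];
      while (true) {
        in.readFully(header);
        if (isAllZero(header)) {
          break; // start of the end-of-archive marker
        }
        String name = field(header, 0, 100);
        // The 155-byte path prefix only exists in POSIX ustar headers.
        boolean ustar = field(header, 257, 6).equals("ustar");
        String prefix = ustar ? field(header, 345, 155) : "";
        result.names.add(prefix.isEmpty() ? name : prefix + "/" + name);
        String sizeField = field(header, 124, 12).trim();
        long size = sizeField.isEmpty() ? 0 : Long.parseLong(sizeField, 8);
        // Entry data is padded to a whole number of 512-byte blocks.
        skipFully(in, (size + BLOCK - 1) / BLOCK * BLOCK);
      }
    } catch (EOFException e) {
      // The gzip or tar stream ended before the end-of-archive marker.
      result.truncated = true;
    }
    return result;
  }

  private static boolean isAllZero(byte[] block) {
    for (byte b : block) {
      if (b != 0) {
        return false;
      }
    }
    return true;
  }

  /** Reads a NUL-terminated ASCII field from a tar header. */
  private static String field(byte[] header, int offset, int length) {
    int end = offset;
    while (end < offset + length && header[end] != 0) {
      end++;
    }
    return new String(header, offset, end - offset, StandardCharsets.US_ASCII);
  }

  private static void skipFully(DataInputStream in, long n) throws IOException {
    byte[] buf = new byte[BLOCK];
    while (n > 0) {
      int chunk = (int) Math.min(buf.length, n);
      in.readFully(buf, 0, chunk);
      n -= chunk;
    }
  }
}
```

Run against a retained copy of a file such as container-66315.tar.gz, a `truncated` result would suggest looking at the transfer, while a complete archive without a descriptor entry would point back at how the source packed it. Neither outcome is established by this issue.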
This task is to keep the downloaded tar file for investigation purposes, at
the cost of extra storage space.
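A minimal sketch of one possible retention policy, assuming the replicator knows whether the import succeeded: delete the archive on success, move it aside on failure. `DownloadedArchiveRetention`, `finish`, `keepOnFailure` and `debugDir` are hypothetical names, not existing Ozone APIs or configuration keys, and keeping only failed archives is a design choice made here; this issue does not say which archives should be kept.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;

/**
 * Hypothetical helper, not an Ozone class: decides what happens to a
 * downloaded container archive once the import attempt has finished.
 */
public final class DownloadedArchiveRetention {

  private DownloadedArchiveRetention() {
  }

  /**
   * @param archive         the downloaded file, e.g. container-66315.tar.gz
   * @param importSucceeded whether the import finished without an exception
   * @param keepOnFailure   hypothetical switch: retain failed archives
   * @param debugDir        hypothetical directory for retained archives
   * @return where the archive was kept, or empty if it was deleted
   */
  public static Optional<Path> finish(Path archive, boolean importSucceeded,
      boolean keepOnFailure, Path debugDir) throws IOException {
    if (importSucceeded || !keepOnFailure) {
      // Nothing to investigate (or retention disabled): reclaim the space.
      Files.deleteIfExists(archive);
      return Optional.empty();
    }
    // Move the archive out of the download directory so the next attempt
    // for the same container cannot overwrite it. The millisecond prefix
    // keeps copies from repeated failures apart and preserves .tar.gz.
    Files.createDirectories(debugDir);
    Path target = debugDir.resolve(
        System.currentTimeMillis() + "-" + archive.getFileName());
    Files.move(archive, target, StandardCopyOption.REPLACE_EXISTING);
    return Optional.of(target);
  }
}
```

Keeping only failed archives bounds the storage cost mentioned above to the failures themselves; in the log, the single failed download of container 66315 was 6154503 bytes (about 5.9 MiB), and each retained copy stays in `debugDir` until someone removes it.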
--
This message was sent by Atlassian Jira
(v8.3.4#803005)