[ https://issues.apache.org/jira/browse/HDDS-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sammi Chen updated HDDS-5548:
-----------------------------
Summary: Keep downloaded container .gz.tar file for debug purpose (was:
Keep downloaded .gz.tar container file for debug purpose)
> Keep downloaded container .gz.tar file for debug purpose
> --------------------------------------------------------
>
> Key: HDDS-5548
> URL: https://issues.apache.org/jira/browse/HDDS-5548
> Project: Apache Ozone
> Issue Type: Improvement
> Reporter: Sammi Chen
> Assignee: Sammi Chen
> Priority: Major
>
> There are many container import failure logs in production, for example:
> 2021-08-03 21:48:12,311 [ContainerReplicationThread-9] INFO
> org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator:
> Starting replication of container 66315 from
> [4e613295-6d55-4bf9-bdc9-1668fd24741c{ip: 11.61.44.244, host: 11.61.44.244,
> ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9858, RATIS_SERVER=9858,
> STANDALONE=9859], networkLocation: /rack582702, certSerialId: null,
> persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0},
> 7694e208-c887-4d8e-b249-28a176b4d7b7{ip: 11.61.45.38, host: 11.61.45.38,
> ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9858, RATIS_SERVER=9858,
> STANDALONE=9859], networkLocation: /rack582788, certSerialId: null,
> persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}]
> 2021-08-03 21:48:17,462 [grpc-default-executor-12557] INFO
> org.apache.hadoop.ozone.container.replication.GrpcReplicationClient:
> Container 66315 is downloaded to
> /data/ozoneadmin/ozoneenv/ozone-temp/container-66315.tar.gz
> 2021-08-03 21:48:17,462 [ContainerReplicationThread-9] INFO
> org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator:
> Container 66315 is downloaded with size 6154503, starting to import.
> 2021-08-03 21:48:17,582 [ContainerReplicationThread-9] ERROR
> org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator:
> Container 66315 replication was unsuccessful.
> java.io.IOException: Container descriptor is missing from the container archive.
> at org.apache.hadoop.ozone.container.keyvalue.TarContainerPacker.unpackContainerDescriptor(TarContainerPacker.java:190)
> at org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator.importContainer(DownloadAndImportReplicator.java:76)
> at org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator.replicate(DownloadAndImportReplicator.java:125)
> at org.apache.hadoop.ozone.container.replication.MeasuredReplicator.replicate(MeasuredReplicator.java:69)
> at org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:139)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2021-08-03 21:48:17,582 [ContainerReplicationThread-9] ERROR
> org.apache.hadoop.ozone.container.replication.ReplicationSupervisor:
> Container 66315 can't be downloaded from any of the datanodes.
> In the above case, container 66315 on the source datanode actually has the
> container descriptor on disk, so the root cause of this error is still
> unclear.
> This task is to keep the downloaded tar file for investigation purposes, at
> the cost of extra storage space.
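One way the proposal could look is the minimal sketch below. It is a hypothetical helper, not code from Ozone: the class `DebugArchiveRetention`, the `Importer` interface, `importOrKeep`, the debug directory and the `maxKept` cap are all assumptions made for illustration. When the import step throws, the downloaded archive is moved into a debug directory instead of being deleted, the original exception is preserved as the cause, and the number of kept archives is capped to bound the storage cost.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

/**
 * Hypothetical sketch, not Ozone code: run an import step on a downloaded
 * container archive and, if it fails, keep the archive in a debug directory
 * (at most maxKept files) instead of deleting it.
 */
public final class DebugArchiveRetention {

  /** Hypothetical stand-in for the real import step; throws IOException on failure. */
  @FunctionalInterface
  public interface Importer {
    void importContainer(long containerId, Path archive) throws IOException;
  }

  private DebugArchiveRetention() {
  }

  public static void importOrKeep(long containerId, Path archive, Path debugDir,
      int maxKept, Importer importer) throws IOException {
    try {
      importer.importContainer(containerId, archive);
    } catch (IOException importFailure) {
      String where;
      try {
        where = keepOrDiscard(archive, debugDir, maxKept);
      } catch (IOException keepFailure) {
        // Keeping the file is best effort; never hide the original error.
        importFailure.addSuppressed(keepFailure);
        throw importFailure;
      }
      // Rethrow so the caller still records the replication as failed.
      throw new IOException("Container " + containerId
          + " import failed; " + where, importFailure);
    }
    // Success path: the temporary archive is no longer needed.
    Files.deleteIfExists(archive);
  }

  /** Moves the archive into debugDir unless it already holds maxKept entries. */
  private static String keepOrDiscard(Path archive, Path debugDir, int maxKept)
      throws IOException {
    Files.createDirectories(debugDir);
    long alreadyKept;
    try (Stream<Path> entries = Files.list(debugDir)) {
      alreadyKept = entries.count();
    }
    if (alreadyKept >= maxKept) {
      Files.deleteIfExists(archive);
      return "archive discarded (debug directory already holds "
          + alreadyKept + " entries)";
    }
    // createTempFile reserves a unique name, so repeated failures of the
    // same container do not overwrite each other's archives.
    Path kept = Files.createTempFile(debugDir, archive.getFileName() + ".", "");
    Files.move(archive, kept, StandardCopyOption.REPLACE_EXISTING);
    return "archive kept at " + kept;
  }
}
```

The count-based cap is only one way to bound the storage cost; an age-based cleanup or an on/off switch would be equally plausible choices for a real change.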
--
This message was sent by Atlassian Jira
(v8.3.4#803005)