[
https://issues.apache.org/jira/browse/HDDS-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17505112#comment-17505112
]
Ethan Rose commented on HDDS-5548:
----------------------------------
Hi [~Sammi], the issues reported in these log messages have been fixed in
HDDS-6235, although we are still not keeping the failed imports for debugging.
If you think keeping the failed imports is still important, we can leave this
Jira open; otherwise we can close it in favor of HDDS-2635.
> Keep downloaded container .gz.tar file for debug purpose
> --------------------------------------------------------
>
> Key: HDDS-5548
> URL: https://issues.apache.org/jira/browse/HDDS-5548
> Project: Apache Ozone
> Issue Type: Improvement
> Reporter: Sammi Chen
> Assignee: Sammi Chen
> Priority: Major
>
> There are many container import failure logs in production, such as:
> 2021-08-03 21:48:12,311 [ContainerReplicationThread-9] INFO
> org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator:
> Starting replication of container 66315 from
> [4e613295-6d55-4bf9-bdc9-1668fd24741c{ip: 11.61.44.244, host: 11.61.44.244,
> ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9858, RATIS_SERVER=9858,
> STANDALONE=9859], networkLocation: /rack582702, certSerialId: null,
> persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0},
> 7694e208-c887-4d8e-b249-28a176b4d7b7{ip: 11.61.45.38, host: 11.61.45.38,
> ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9858, RATIS_SERVER=9858,
> STANDALONE=9859], networkLocation: /rack582788, certSerialId: null,
> persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}]
> 2021-08-03 21:48:17,462 [grpc-default-executor-12557] INFO
> org.apache.hadoop.ozone.container.replication.GrpcReplicationClient:
> Container 66315 is downloaded to
> /data/ozoneadmin/ozoneenv/ozone-temp/container-66315.tar.gz
> 2021-08-03 21:48:17,462 [ContainerReplicationThread-9] INFO
> org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator:
> Container 66315 is downloaded with size 6154503, starting to import.
> 2021-08-03 21:48:17,582 [ContainerReplicationThread-9] ERROR
> org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator:
> Container 66315 replication was unsuccessful.
> java.io.IOException: Container descriptor is missing from the container
> archive.
> at
> org.apache.hadoop.ozone.container.keyvalue.TarContainerPacker.unpackContainerDescriptor(TarContainerPacker.java:190)
> at
> org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator.importContainer(DownloadAndImportReplicator.java:76)
> at
> org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator.replicate(DownloadAndImportReplicator.java:125)
> at
> org.apache.hadoop.ozone.container.replication.MeasuredReplicator.replicate(MeasuredReplicator.java:69)
> at
> org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:139)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2021-08-03 21:48:17,582 [ContainerReplicationThread-9] ERROR
> org.apache.hadoop.ozone.container.replication.ReplicationSupervisor:
> Container 66315 can't be downloaded from any of the datanodes.
> In the above case, container 66315 on the source datanode actually has the
> container descriptor on disk, so the root cause of this error is unclear.
> This task is to keep the downloaded tar file for investigation, at the
> cost of storage space.
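The behavior requested above could look roughly like the sketch below. It is not Ozone's actual code: the class `FailedImportKeeper`, the `Importer` interface, and the quarantine directory are all hypothetical names introduced for illustration. The idea is only that a downloaded archive is deleted after a successful import and moved aside, rather than deleted, when the import throws.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper, not part of Ozone: keeps archives whose import failed. */
final class FailedImportKeeper {

  /** Stand-in for the real import step (assumption: it throws IOException on failure). */
  @FunctionalInterface
  interface Importer {
    void importContainer(long containerId, Path tarball) throws IOException;
  }

  // Hypothetical directory where failed archives are kept for investigation.
  private final Path quarantineDir;

  FailedImportKeeper(Path quarantineDir) {
    this.quarantineDir = quarantineDir;
  }

  /**
   * Runs the import. On success the downloaded archive is deleted.
   * On failure the archive is moved into the quarantine directory
   * and the original exception is rethrown to the caller.
   */
  void importOrKeep(long containerId, Path tarball, Importer importer)
      throws IOException {
    try {
      importer.importContainer(containerId, tarball);
    } catch (IOException importError) {
      try {
        Files.createDirectories(quarantineDir);
        Files.move(tarball, quarantineDir.resolve(tarball.getFileName()),
            StandardCopyOption.REPLACE_EXISTING);
      } catch (IOException moveError) {
        // Do not hide the import failure if the archive cannot be moved.
        importError.addSuppressed(moveError);
      }
      throw importError;
    }
    Files.deleteIfExists(tarball);
  }
}
```

Kept archives consume disk space until someone removes them, which is the storage cost mentioned above; a real change would probably need a size or age limit on the quarantine directory.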
--
This message was sent by Atlassian Jira
(v8.20.1#820001)