[
https://issues.apache.org/jira/browse/HDDS-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uma Maheswara Rao G resolved HDDS-5679.
---------------------------------------
Fix Version/s: 1.2.1
Resolution: Fixed
> Use more defensive sizeRequired for replication manager for container
> replication.
> -----------------------------------------------------------------------------------
>
> Key: HDDS-5679
> URL: https://issues.apache.org/jira/browse/HDDS-5679
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Mark Gui
> Assignee: Mark Gui
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.2.1
>
>
> We hit a bug when replicating a container whose actual size (about 2GB) is
> smaller than the configured container size (5GB):
> {code:java}
> 2021-08-25 19:12:31,945 [ContainerReplicationThread-4] ERROR org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator: Container 73446 replication was unsuccessful.
> org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Out of space: The volume with the most available space (=2580881408 B) is less than the container size (=5368709120 B).
>     at org.apache.hadoop.ozone.container.common.volume.RoundRobinVolumeChoosingPolicy.chooseVolume(RoundRobinVolumeChoosingPolicy.java:77)
>     at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.populateContainerPathFields(KeyValueHandler.java:290)
>     at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.importContainer(KeyValueHandler.java:907)
>     at org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.importContainer(ContainerController.java:139)
>     at org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator.importContainer(DownloadAndImportReplicator.java:90)
>     at org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator.replicate(DownloadAndImportReplicator.java:135)
>     at org.apache.hadoop.ozone.container.replication.MeasuredReplicator.replicate(MeasuredReplicator.java:69)
>     at org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:139)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> 2021-08-25 19:12:31,946 [ContainerReplicationThread-4] ERROR org.apache.hadoop.ozone.container.replication.ReplicationSupervisor: Container 73446 can't be downloaded from any of the datanodes.
> {code}
> ReplicationManager places the container replica on a datanode that has enough
> space for the replica's actual size. However, when the datanode creates the
> container replica, it checks whether at least 5GB (the configured container
> size) is available. So even though there is enough space for a 2GB container,
> the import fails with an out-of-space exception. In this case, RM should not
> schedule the replica to this datanode.
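
The mismatch described above can be illustrated with a small, self-contained
sketch. It is not the actual Ozone code: the class SizeRequiredSketch, its
methods, and the node names dn1/dn2 are hypothetical, and the "defensive"
rule shown (request at least the configured container size) is one plausible
reading of the issue title, not a confirmed description of the fix. The byte
values for the available space and the container size are taken from the log
above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; names are illustrative and are not Ozone APIs.
final class SizeRequiredSketch {

  // Space to request when placing a replica. Assumption: the target datanode
  // refuses an import unless it has room for a full-size container, so the
  // request should never be smaller than the configured container size.
  static long sizeRequired(long usedBytes, long maxContainerSize) {
    return Math.max(usedBytes, maxContainerSize);
  }

  // Returns the first node (in map order) whose available space covers the request.
  static Optional<String> chooseTarget(Map<String, Long> availableByNode, long required) {
    return availableByNode.entrySet().stream()
        .filter(e -> e.getValue() >= required)
        .map(Map.Entry::getKey)
        .findFirst();
  }

  public static void main(String[] args) {
    long maxContainerSize = 5368709120L;       // 5GB, as reported in the log
    long usedBytes = 2L * 1024 * 1024 * 1024;  // about 2GB of actual data

    Map<String, Long> nodes = new LinkedHashMap<>();
    nodes.put("dn1", 2580881408L);             // available space from the log
    nodes.put("dn2", 8L * 1024 * 1024 * 1024); // hypothetical node with 8GB free

    // Sizing by actual data picks dn1, which would later reject the import.
    System.out.println("by used bytes: " + chooseTarget(nodes, usedBytes));

    // Sizing defensively skips dn1 and picks a node that can accept the import.
    System.out.println("defensive:     "
        + chooseTarget(nodes, sizeRequired(usedBytes, maxContainerSize)));
  }
}
```

With the defensive size, a datanode that cannot pass its own free-space check
is never selected, at the cost of skipping nodes that could physically hold
the smaller replica.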