[
https://issues.apache.org/jira/browse/HDDS-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ritesh Shukla reassigned HDDS-10652:
------------------------------------
Assignee: Hemant Kumar
> [Upgrade][EC] Reconstruction failing with "java.io.IOException: None of the
> block data have checksum"
> -----------------------------------------------------------------------------------------------------
>
> Key: HDDS-10652
> URL: https://issues.apache.org/jira/browse/HDDS-10652
> Project: Apache Ozone
> Issue Type: Bug
> Components: EC, ECOfflineRecovery
> Reporter: Pratyush Bhatt
> Assignee: Hemant Kumar
> Priority: Major
>
> {color:#172b4d}*Upgrade versions:*
> Pre upgrade hash:
> [https://github.com/apache/ozone/commit/6ee6c357678676661ebb3181a56622c79b487bc1]
> Post upgrade Hash:
> [https://github.com/apache/ozone/commit/46b6f3def1d84ca769affb4d3f0d84dece6e8567]
> {color}{color:#172b4d}*Scenario:*
> Write an EC file (5 GB) with the RS-3-2-1024K policy (in this case) before
> the upgrade. After the upgrade, shut down either 2 parity nodes (this case)
> or 2 data nodes, since the policy tolerates 2 DN failures. Check whether
> reconstruction happens after some time.
> *Observed Behavior:*
> 1. Data was successfully written pre-upgrade using Freon.
> File name:
> _o3://ozone1711558189/ec-construct-vol/ec-construct-buck/ec-construction/0_
> 2. Post-upgrade, stop two of the DNs, in this case the parity nodes
> obtained from one of the containers storing the above file's
> data.{color}
> {code:java}
> ozone admin container info 1004 --json
> 2024-03-27 21:35:15,065|INFO|MainThread|machine.py:232 -
> run()||GUID=183f2d10-e3a7-407f-adb5-b87f3e3af53b|Exit Code: 0
> 2024-03-27 21:35:15,098|INFO|MainThread|ozone.py:723 -
> find_ec_data_parity_hosts()|parity hosts: ['DN-4', 'DN-3']
> 2024-03-27 21:35:15,098|INFO|MainThread|ozone.py:724 -
> find_ec_data_parity_hosts()|data hosts: ['DN-8', 'DN-5', 'DN-1'] {code}
> {code:java}
> 2024-03-27 21:35:15,311|INFO|MainThread|cm_apilib.py:1214 -
> stopComponent()|Initiating stop of OZONE_DATANODE at host DN-4
> 2024-03-27 21:35:15,349|INFO|MainThread|cm_apilib.py:1218 -
> stopComponent()|Command name = Stop , ID = 2860
> 2024-03-27 21:35:15,580|INFO|MainThread|cm_apilib.py:1214 -
> stopComponent()|Initiating stop of OZONE_DATANODE at host DN-3
> 2024-03-27 21:35:15,609|INFO|MainThread|cm_apilib.py:1218 -
> stopComponent()|Command name = Stop , ID = 2862 {code}
> {color:#172b4d}Nodes DN-3 and DN-4 are stopped.
> 3. Read the file's data (online reconstruction) and compute the checksum;
> it matched.
> 4. Wait for reconstruction to happen. The test waited 20 minutes, but
> only 3 DNs were present even after that:{color}
> {code:java}
> ['DN-5', 'DN-1', 'DN-8']{code}
> In fact, even after 10 hours (at the time of writing), there are still
> only 3 DNs:
> {code:java}
> date
> Thu Mar 28 08:39:16 UTC 2024
> ozone admin container info 1004 --json
> {
> "containerInfo" : {
> "state" : "CLOSED",
> "stateEnterTime" : "2024-03-27T18:43:51.934Z",
> "replicationConfig" : {
> "data" : 3,
> "parity" : 2,
> "ecChunkSize" : 1048576,
> "codec" : "RS",
> "requiredNodes" : 5,
> "replicationType" : "EC"
> },
> "usedBytes" : 1342177280,
> "numberOfKeys" : 5,
> "lastUsed" : "2024-03-28T08:39:24.535189Z",
> "owner" : "om1",
> "containerID" : 1004,
> "deleteTransactionId" : 0,
> "sequenceId" : 0,
> "deleted" : false,
> "open" : false
> },
> "pipeline" : {
> "id" : {
> "id" : "73532c14-40ac-4924-9353-2f18ab0d63f2"
> },
> "replicationConfig" : {
> "data" : 3,
> "parity" : 2,
> "ecChunkSize" : 1048576,
> "codec" : "RS",
> "requiredNodes" : 5,
> "replicationType" : "EC"
> },
> "nodesInOrder" : [ {
> "level" : 0,
> "cost" : 0,
> "uuid" : "6179347f-5824-41d4-b722-f1dbc5f14880",
> "uuidString" : "6179347f-5824-41d4-b722-f1dbc5f14880",
> "ipAddress" : "10.140.37.12",
> "hostName" : "DN-5",
> "ports" : [ {
> "name" : "HTTPS",
> "value" : 9883
> }, {
> "name" : "CLIENT_RPC",
> "value" : 9864
> }, {
> "name" : "REPLICATION",
> "value" : 9886
> }, {
> "name" : "RATIS",
> "value" : 9858
> }, {
> "name" : "RATIS_ADMIN",
> "value" : 9857
> }, {
> "name" : "RATIS_SERVER",
> "value" : 9856
> }, {
> "name" : "STANDALONE",
> "value" : 9859
> } ],
> "setupTime" : 0,
> "persistedOpState" : "IN_SERVICE",
> "persistedOpStateExpiryEpochSec" : 0,
> "initialVersion" : 0,
> "currentVersion" : 1,
> "decommissioned" : false,
> "maintenance" : false,
> "signature" : -662262523,
> "networkLocation" : "/default",
> "networkName" : "6179347f-5824-41d4-b722-f1dbc5f14880",
> "networkFullPath" : "/default/6179347f-5824-41d4-b722-f1dbc5f14880",
> "numOfLeaves" : 1
> }, {
> "level" : 0,
> "cost" : 0,
> "uuid" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
> "uuidString" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
> "ipAddress" : "10.140.40.9",
> "hostName" : "DN-1",
> "ports" : [ {
> "name" : "HTTPS",
> "value" : 9883
> }, {
> "name" : "CLIENT_RPC",
> "value" : 9864
> }, {
> "name" : "REPLICATION",
> "value" : 9886
> }, {
> "name" : "RATIS",
> "value" : 9858
> }, {
> "name" : "RATIS_ADMIN",
> "value" : 9857
> }, {
> "name" : "RATIS_SERVER",
> "value" : 9856
> }, {
> "name" : "STANDALONE",
> "value" : 9859
> } ],
> "setupTime" : 0,
> "persistedOpState" : "IN_SERVICE",
> "persistedOpStateExpiryEpochSec" : 0,
> "initialVersion" : 0,
> "currentVersion" : 1,
> "decommissioned" : false,
> "maintenance" : false,
> "signature" : -1387859873,
> "networkLocation" : "/default",
> "networkName" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
> "networkFullPath" : "/default/d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
> "numOfLeaves" : 1
> }, {
> "level" : 0,
> "cost" : 0,
> "uuid" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
> "uuidString" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
> "ipAddress" : "10.140.137.128",
> "hostName" : "DN-8",
> "ports" : [ {
> "name" : "HTTPS",
> "value" : 9883
> }, {
> "name" : "CLIENT_RPC",
> "value" : 9864
> }, {
> "name" : "REPLICATION",
> "value" : 9886
> }, {
> "name" : "RATIS",
> "value" : 9858
> }, {
> "name" : "RATIS_ADMIN",
> "value" : 9857
> }, {
> "name" : "RATIS_SERVER",
> "value" : 9856
> }, {
> "name" : "STANDALONE",
> "value" : 9859
> } ],
> "setupTime" : 0,
> "persistedOpState" : "IN_SERVICE",
> "persistedOpStateExpiryEpochSec" : 0,
> "initialVersion" : 0,
> "currentVersion" : 1,
> "decommissioned" : false,
> "maintenance" : false,
> "signature" : 1098159392,
> "networkLocation" : "/default",
> "networkName" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
> "networkFullPath" : "/default/ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
> "numOfLeaves" : 1
> } ],
> "creationTimestamp" : "2024-03-28T08:39:24.480Z",
> "stateEnterTime" : "2024-03-28T08:39:24.545517Z",
> "leaderNode" : {
> "level" : 0,
> "cost" : 0,
> "uuid" : "6179347f-5824-41d4-b722-f1dbc5f14880",
> "uuidString" : "6179347f-5824-41d4-b722-f1dbc5f14880",
> "ipAddress" : "10.140.37.12",
> "hostName" : "DN-5",
> "ports" : [ {
> "name" : "HTTPS",
> "value" : 9883
> }, {
> "name" : "CLIENT_RPC",
> "value" : 9864
> }, {
> "name" : "REPLICATION",
> "value" : 9886
> }, {
> "name" : "RATIS",
> "value" : 9858
> }, {
> "name" : "RATIS_ADMIN",
> "value" : 9857
> }, {
> "name" : "RATIS_SERVER",
> "value" : 9856
> }, {
> "name" : "STANDALONE",
> "value" : 9859
> } ],
> "setupTime" : 0,
> "persistedOpState" : "IN_SERVICE",
> "persistedOpStateExpiryEpochSec" : 0,
> "initialVersion" : 0,
> "currentVersion" : 1,
> "decommissioned" : false,
> "maintenance" : false,
> "signature" : -662262523,
> "networkLocation" : "/default",
> "networkName" : "6179347f-5824-41d4-b722-f1dbc5f14880",
> "networkFullPath" : "/default/6179347f-5824-41d4-b722-f1dbc5f14880",
> "numOfLeaves" : 1
> },
> "firstNode" : {
> "level" : 0,
> "cost" : 0,
> "uuid" : "6179347f-5824-41d4-b722-f1dbc5f14880",
> "uuidString" : "6179347f-5824-41d4-b722-f1dbc5f14880",
> "ipAddress" : "10.140.37.12",
> "hostName" : "DN-5",
> "ports" : [ {
> "name" : "HTTPS",
> "value" : 9883
> }, {
> "name" : "CLIENT_RPC",
> "value" : 9864
> }, {
> "name" : "REPLICATION",
> "value" : 9886
> }, {
> "name" : "RATIS",
> "value" : 9858
> }, {
> "name" : "RATIS_ADMIN",
> "value" : 9857
> }, {
> "name" : "RATIS_SERVER",
> "value" : 9856
> }, {
> "name" : "STANDALONE",
> "value" : 9859
> } ],
> "setupTime" : 0,
> "persistedOpState" : "IN_SERVICE",
> "persistedOpStateExpiryEpochSec" : 0,
> "initialVersion" : 0,
> "currentVersion" : 1,
> "decommissioned" : false,
> "maintenance" : false,
> "signature" : -662262523,
> "networkLocation" : "/default",
> "networkName" : "6179347f-5824-41d4-b722-f1dbc5f14880",
> "networkFullPath" : "/default/6179347f-5824-41d4-b722-f1dbc5f14880",
> "numOfLeaves" : 1
> },
> "closestNode" : {
> "level" : 0,
> "cost" : 0,
> "uuid" : "6179347f-5824-41d4-b722-f1dbc5f14880",
> "uuidString" : "6179347f-5824-41d4-b722-f1dbc5f14880",
> "ipAddress" : "10.140.37.12",
> "hostName" : "DN-5",
> "ports" : [ {
> "name" : "HTTPS",
> "value" : 9883
> }, {
> "name" : "CLIENT_RPC",
> "value" : 9864
> }, {
> "name" : "REPLICATION",
> "value" : 9886
> }, {
> "name" : "RATIS",
> "value" : 9858
> }, {
> "name" : "RATIS_ADMIN",
> "value" : 9857
> }, {
> "name" : "RATIS_SERVER",
> "value" : 9856
> }, {
> "name" : "STANDALONE",
> "value" : 9859
> } ],
> "setupTime" : 0,
> "persistedOpState" : "IN_SERVICE",
> "persistedOpStateExpiryEpochSec" : 0,
> "initialVersion" : 0,
> "currentVersion" : 1,
> "decommissioned" : false,
> "maintenance" : false,
> "signature" : -662262523,
> "networkLocation" : "/default",
> "networkName" : "6179347f-5824-41d4-b722-f1dbc5f14880",
> "networkFullPath" : "/default/6179347f-5824-41d4-b722-f1dbc5f14880",
> "numOfLeaves" : 1
> },
> "allocationTimeout" : false,
> "healthy" : true,
> "pipelineState" : "ALLOCATED",
> "nodes" : [ {
> "level" : 0,
> "cost" : 0,
> "uuid" : "6179347f-5824-41d4-b722-f1dbc5f14880",
> "uuidString" : "6179347f-5824-41d4-b722-f1dbc5f14880",
> "ipAddress" : "10.140.37.12",
> "hostName" : "DN-5",
> "ports" : [ {
> "name" : "HTTPS",
> "value" : 9883
> }, {
> "name" : "CLIENT_RPC",
> "value" : 9864
> }, {
> "name" : "REPLICATION",
> "value" : 9886
> }, {
> "name" : "RATIS",
> "value" : 9858
> }, {
> "name" : "RATIS_ADMIN",
> "value" : 9857
> }, {
> "name" : "RATIS_SERVER",
> "value" : 9856
> }, {
> "name" : "STANDALONE",
> "value" : 9859
> } ],
> "setupTime" : 0,
> "persistedOpState" : "IN_SERVICE",
> "persistedOpStateExpiryEpochSec" : 0,
> "initialVersion" : 0,
> "currentVersion" : 1,
> "decommissioned" : false,
> "maintenance" : false,
> "signature" : -662262523,
> "networkLocation" : "/default",
> "networkName" : "6179347f-5824-41d4-b722-f1dbc5f14880",
> "networkFullPath" : "/default/6179347f-5824-41d4-b722-f1dbc5f14880",
> "numOfLeaves" : 1
> }, {
> "level" : 0,
> "cost" : 0,
> "uuid" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
> "uuidString" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
> "ipAddress" : "10.140.40.9",
> "hostName" : "DN-1",
> "ports" : [ {
> "name" : "HTTPS",
> "value" : 9883
> }, {
> "name" : "CLIENT_RPC",
> "value" : 9864
> }, {
> "name" : "REPLICATION",
> "value" : 9886
> }, {
> "name" : "RATIS",
> "value" : 9858
> }, {
> "name" : "RATIS_ADMIN",
> "value" : 9857
> }, {
> "name" : "RATIS_SERVER",
> "value" : 9856
> }, {
> "name" : "STANDALONE",
> "value" : 9859
> } ],
> "setupTime" : 0,
> "persistedOpState" : "IN_SERVICE",
> "persistedOpStateExpiryEpochSec" : 0,
> "initialVersion" : 0,
> "currentVersion" : 1,
> "decommissioned" : false,
> "maintenance" : false,
> "signature" : -1387859873,
> "networkLocation" : "/default",
> "networkName" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
> "networkFullPath" : "/default/d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
> "numOfLeaves" : 1
> }, {
> "level" : 0,
> "cost" : 0,
> "uuid" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
> "uuidString" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
> "ipAddress" : "10.140.137.128",
> "hostName" : "DN-8",
> "ports" : [ {
> "name" : "HTTPS",
> "value" : 9883
> }, {
> "name" : "CLIENT_RPC",
> "value" : 9864
> }, {
> "name" : "REPLICATION",
> "value" : 9886
> }, {
> "name" : "RATIS",
> "value" : 9858
> }, {
> "name" : "RATIS_ADMIN",
> "value" : 9857
> }, {
> "name" : "RATIS_SERVER",
> "value" : 9856
> }, {
> "name" : "STANDALONE",
> "value" : 9859
> } ],
> "setupTime" : 0,
> "persistedOpState" : "IN_SERVICE",
> "persistedOpStateExpiryEpochSec" : 0,
> "initialVersion" : 0,
> "currentVersion" : 1,
> "decommissioned" : false,
> "maintenance" : false,
> "signature" : 1098159392,
> "networkLocation" : "/default",
> "networkName" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
> "networkFullPath" : "/default/ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
> "numOfLeaves" : 1
> } ],
> "empty" : false,
> "type" : "EC"
> },
> "replicas" : [ {
> "containerID" : 1004,
> "state" : "CLOSED",
> "datanodeDetails" : {
> "level" : 0,
> "cost" : 0,
> "uuid" : "6179347f-5824-41d4-b722-f1dbc5f14880",
> "uuidString" : "6179347f-5824-41d4-b722-f1dbc5f14880",
> "ipAddress" : "10.140.37.12",
> "hostName" : "DN-5",
> "ports" : [ {
> "name" : "HTTPS",
> "value" : 9883
> }, {
> "name" : "CLIENT_RPC",
> "value" : 9864
> }, {
> "name" : "REPLICATION",
> "value" : 9886
> }, {
> "name" : "RATIS",
> "value" : 9858
> }, {
> "name" : "RATIS_ADMIN",
> "value" : 9857
> }, {
> "name" : "RATIS_SERVER",
> "value" : 9856
> }, {
> "name" : "STANDALONE",
> "value" : 9859
> } ],
> "setupTime" : 0,
> "persistedOpState" : "IN_SERVICE",
> "persistedOpStateExpiryEpochSec" : 0,
> "initialVersion" : 0,
> "currentVersion" : 1,
> "decommissioned" : false,
> "maintenance" : false,
> "signature" : -662262523,
> "networkLocation" : "/default",
> "networkName" : "6179347f-5824-41d4-b722-f1dbc5f14880",
> "networkFullPath" : "/default/6179347f-5824-41d4-b722-f1dbc5f14880",
> "numOfLeaves" : 1
> },
> "placeOfBirth" : "6179347f-5824-41d4-b722-f1dbc5f14880",
> "sequenceId" : 0,
> "keyCount" : 5,
> "bytesUsed" : 1342177280,
> "replicaIndex" : 2
> }, {
> "containerID" : 1004,
> "state" : "CLOSED",
> "datanodeDetails" : {
> "level" : 0,
> "cost" : 0,
> "uuid" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
> "uuidString" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
> "ipAddress" : "10.140.40.9",
> "hostName" : "DN-1",
> "ports" : [ {
> "name" : "HTTPS",
> "value" : 9883
> }, {
> "name" : "CLIENT_RPC",
> "value" : 9864
> }, {
> "name" : "REPLICATION",
> "value" : 9886
> }, {
> "name" : "RATIS",
> "value" : 9858
> }, {
> "name" : "RATIS_ADMIN",
> "value" : 9857
> }, {
> "name" : "RATIS_SERVER",
> "value" : 9856
> }, {
> "name" : "STANDALONE",
> "value" : 9859
> } ],
> "setupTime" : 0,
> "persistedOpState" : "IN_SERVICE",
> "persistedOpStateExpiryEpochSec" : 0,
> "initialVersion" : 0,
> "currentVersion" : 1,
> "decommissioned" : false,
> "maintenance" : false,
> "signature" : -1387859873,
> "networkLocation" : "/default",
> "networkName" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
> "networkFullPath" : "/default/d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
> "numOfLeaves" : 1
> },
> "placeOfBirth" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
> "sequenceId" : 0,
> "keyCount" : 5,
> "bytesUsed" : 1342177280,
> "replicaIndex" : 3
> }, {
> "containerID" : 1004,
> "state" : "CLOSED",
> "datanodeDetails" : {
> "level" : 0,
> "cost" : 0,
> "uuid" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
> "uuidString" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
> "ipAddress" : "10.140.137.128",
> "hostName" : "DN-8",
> "ports" : [ {
> "name" : "HTTPS",
> "value" : 9883
> }, {
> "name" : "CLIENT_RPC",
> "value" : 9864
> }, {
> "name" : "REPLICATION",
> "value" : 9886
> }, {
> "name" : "RATIS",
> "value" : 9858
> }, {
> "name" : "RATIS_ADMIN",
> "value" : 9857
> }, {
> "name" : "RATIS_SERVER",
> "value" : 9856
> }, {
> "name" : "STANDALONE",
> "value" : 9859
> } ],
> "setupTime" : 0,
> "persistedOpState" : "IN_SERVICE",
> "persistedOpStateExpiryEpochSec" : 0,
> "initialVersion" : 0,
> "currentVersion" : 1,
> "decommissioned" : false,
> "maintenance" : false,
> "signature" : 1098159392,
> "networkLocation" : "/default",
> "networkName" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
> "networkFullPath" : "/default/ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
> "numOfLeaves" : 1
> },
> "placeOfBirth" : "711656cf-a99e-4b2c-8c35-f015ee94889c",
> "sequenceId" : 0,
> "keyCount" : 5,
> "bytesUsed" : 1342177280,
> "replicaIndex" : 1
> } ]
> } {code}
> Checked the SCM logs; SCM is still sending reconstructECContainersCommand:
> {code:java}
> 2024-03-28 08:36:56,748 INFO [Under Replicated
> Processor]-org.apache.hadoop.hdds.scm.container.replication.ReplicationManager:
> Sending command [reconstructECContainersCommand: containerID: 1004,
> replicationConfig: EC{rs-3-2-1024k}, sources:
> [ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e(DN-8/10.140.137.128) replicaIndex: 1,
> 6179347f-5824-41d4-b722-f1dbc5f14880(DN-5/10.140.37.12) replicaIndex: 2,
> d8afb52b-5f4c-4d94-9286-7c3cfd6c315c(DN-1/10.140.40.9) replicaIndex: 3],
> targets: [572ed33d-a834-4d80-be35-7b1b19c8bd74(DN-7/10.140.234.130),
> 711656cf-a99e-4b2c-8c35-f015ee94889c(DN-2/10.140.45.129)], missingIndexes:
> [4, 5]] for container ContainerInfo{id=#1004, state=CLOSED,
> stateEnterTime=2024-03-27T18:43:51.934Z,
> pipelineID=PipelineID=53f5587f-9e6c-465d-a0cb-b82d10c227d3, owner=om1} to
> 572ed33d-a834-4d80-be35-7b1b19c8bd74(DN-7/10.140.234.130) with datanode
> deadline 1711615886747 and scm deadline 1711615916747 {code}
> Checked one of the target DNs, DN-7; it is throwing the warnings below.
> {code:java}
> 2024-03-28 08:37:14,982 WARN
> [ContainerReplicationThread-5]-org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask:
> FAILED reconstructECContainersCommand: containerID=1004,
> replication=rs-3-2-1024k, missingIndexes=[4, 5],
> sources={1=ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e(DN-8/10.140.137.128),
> 2=6179347f-5824-41d4-b722-f1dbc5f14880(DN-5/10.140.37.12),
> 3=d8afb52b-5f4c-4d94-9286-7c3cfd6c315c(DN-1/10.140.40.9)},
> targets={4=572ed33d-a834-4d80-be35-7b1b19c8bd74(DN-7/10.140.234.130),
> 5=711656cf-a99e-4b2c-8c35-f015ee94889c(DN-2/10.140.45.129)} after 10639 ms
> java.io.IOException: None of the block data have checksum which means
> 2(parity)+1 blocks are not present
> at
> org.apache.hadoop.hdds.scm.storage.ECBlockOutputStream.executePutBlock(ECBlockOutputStream.java:156)
> at
> org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECBlockGroup(ECReconstructionCoordinator.java:325)
> at
> org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECContainerGroup(ECReconstructionCoordinator.java:171)
> at
> org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask.runTask(ECReconstructionCoordinatorTask.java:68)
> at
> org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:359)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
> 2024-03-28 08:37:14,982 WARN
> [ContainerReplicationThread-5]-org.apache.hadoop.ozone.container.replication.ReplicationSupervisor:
> Failed FAILED reconstructECContainersCommand: containerID=1004,
> replication=rs-3-2-1024k, missingIndexes=[4, 5],
> sources={1=ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e(DN-8/10.140.137.128),
> 2=6179347f-5824-41d4-b722-f1dbc5f14880(DN-5/10.140.37.12),
> 3=d8afb52b-5f4c-4d94-9286-7c3cfd6c315c(DN-1/10.140.40.9)},
> targets={4=572ed33d-a834-4d80-be35-7b1b19c8bd74(DN-7/10.140.234.130),
> 5=711656cf-a99e-4b2c-8c35-f015ee94889c(DN-2/10.140.45.129)} {code}
> *Expected Behavior:* Reconstruction should have happened.
> Note: This reproduces fairly consistently.
> cc: [~siddhant]
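
The missing indexes that SCM reports (4 and 5) can be cross-checked against the `ozone admin container info 1004 --json` output quoted above. The following is a minimal sketch, not part of Ozone or of the reporter's test harness: the helper names `missing_replica_indexes` and `is_recoverable` are hypothetical, and `SAMPLE` is a hand-trimmed copy of the quoted JSON that keeps only the fields the helpers read. It assumes replica indexes run from 1 to data + parity, which matches the logs above.

```python
import json

# Hand-trimmed from the container info JSON quoted in the report;
# only the fields used below are kept (assumption: the full output
# has the same shape).
SAMPLE = json.loads("""
{
  "containerInfo": {
    "containerID": 1004,
    "replicationConfig": {"data": 3, "parity": 2, "codec": "RS"}
  },
  "replicas": [
    {"replicaIndex": 2},
    {"replicaIndex": 3},
    {"replicaIndex": 1}
  ]
}
""")


def present_replica_indexes(info):
    """Return the set of replica indexes that currently have a replica."""
    return {replica["replicaIndex"] for replica in info["replicas"]}


def missing_replica_indexes(info):
    """Return the sorted replica indexes (1-based) that have no replica."""
    cfg = info["containerInfo"]["replicationConfig"]
    required = cfg["data"] + cfg["parity"]
    present = present_replica_indexes(info)
    return [i for i in range(1, required + 1) if i not in present]


def is_recoverable(info):
    """An EC container can be rebuilt if at least `data` distinct indexes survive."""
    cfg = info["containerInfo"]["replicationConfig"]
    return len(present_replica_indexes(info)) >= cfg["data"]


if __name__ == "__main__":
    print(missing_replica_indexes(SAMPLE))  # [4, 5]
    print(is_recoverable(SAMPLE))           # True
```

With three of the five indexes present, the container is within what RS-3-2 tolerates, so the reconstruction failure points at the checksum handling in the stack trace rather than at insufficient replicas.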