[ https://issues.apache.org/jira/browse/HDDS-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Attila Doroszlai resolved HDDS-10652.
-------------------------------------
    Fix Version/s: 1.5.0
       Resolution: Fixed

> [Upgrade][EC] Reconstruction failing with "java.io.IOException: None of the 
> block data have checksum"
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HDDS-10652
>                 URL: https://issues.apache.org/jira/browse/HDDS-10652
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: EC, ECOfflineRecovery
>            Reporter: Pratyush Bhatt
>            Assignee: Siddhant Sangwan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.5.0
>
>
> {color:#172b4d}*Upgrade versions:*
> Pre upgrade hash: 
> [https://github.com/apache/ozone/commit/6ee6c357678676661ebb3181a56622c79b487bc1]
> Post upgrade Hash:
> [https://github.com/apache/ozone/commit/46b6f3def1d84ca769affb4d3f0d84dece6e8567]
> {color}{color:#172b4d}*Scenario:*
> Write an EC file (5 GB) with the RS-3-2-1024K policy (in this case) before the
> upgrade. After the upgrade, shut down either 2 parity nodes (as in this case)
> or 2 data nodes, since the policy tolerates 2 DN failures. Check whether
> reconstruction happens after some time.
> *Observed Behavior:*
> 1. Data was successfully written pre-upgrade using Freon. 
> File name: 
> _o3://ozone1711558189/ec-construct-vol/ec-construct-buck/ec-construction/0_
> 2. After the upgrade, stop two of the DNs; in this case, the parity nodes
> obtained from one of the containers storing the above file's data.{color}
> {code:java}
> ozone admin container info 1004 --json
> 2024-03-27 21:35:15,065|INFO|MainThread|machine.py:232 - 
> run()||GUID=183f2d10-e3a7-407f-adb5-b87f3e3af53b|Exit Code: 0
> 2024-03-27 21:35:15,098|INFO|MainThread|ozone.py:723 - 
> find_ec_data_parity_hosts()|parity hosts: ['DN-4', 'DN-3']
> 2024-03-27 21:35:15,098|INFO|MainThread|ozone.py:724 - 
> find_ec_data_parity_hosts()|data hosts: ['DN-8', 'DN-5', 'DN-1'] {code}
> {code:java}
> 2024-03-27 21:35:15,311|INFO|MainThread|cm_apilib.py:1214 - 
> stopComponent()|Initiating stop of OZONE_DATANODE at host DN-4
> 2024-03-27 21:35:15,349|INFO|MainThread|cm_apilib.py:1218 - 
> stopComponent()|Command name = Stop , ID = 2860  
> 2024-03-27 21:35:15,580|INFO|MainThread|cm_apilib.py:1214 - 
> stopComponent()|Initiating stop of OZONE_DATANODE at host DN-3
> 2024-03-27 21:35:15,609|INFO|MainThread|cm_apilib.py:1218 - 
> stopComponent()|Command name = Stop , ID = 2862  {code}
> {color:#172b4d}Nodes DN-3 and DN-4 are stopped.
> 3. Read the file's data (online reconstruction) and compute its checksum; it
> matched.
> 4. Wait for reconstruction to happen. The test waited for 20 minutes, but
> only 3 DNs were still present after 20 minutes:{color}
> {code:java}
> ['DN-5', 'DN-1', 'DN-8']{code}
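A minimal sketch of the scenario's arithmetic, assuming replica indexes 1..data hold data and data+1..data+parity hold parity. That assumption is consistent with the SCM log further down, which reports missingIndexes [4, 5] after the two parity nodes were stopped. The class and method names (EcLayout, isParity, canReconstruct) are hypothetical and not Ozone APIs. With RS-3-2, any 3 distinct replicas are enough to decode, so the 3 surviving replicas should have been sufficient sources.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper modelling an RS(data, parity) EC container group. */
class EcLayout {
  private final int data;
  private final int parity;

  EcLayout(int data, int parity) {
    this.data = data;
    this.parity = parity;
  }

  /** Assumes indexes 1..data are data replicas and the remaining ones are parity. */
  boolean isParity(int replicaIndex) {
    if (replicaIndex < 1 || replicaIndex > data + parity) {
      throw new IllegalArgumentException("bad replica index " + replicaIndex);
    }
    return replicaIndex > data;
  }

  /** Reed-Solomon can decode from any `data` distinct replicas. */
  boolean canReconstruct(Set<Integer> availableIndexes) {
    return availableIndexes.size() >= data;
  }

  public static void main(String[] args) {
    EcLayout rs32 = new EcLayout(3, 2);
    // Replica indexes still reported for container 1004 (DN-8, DN-5, DN-1).
    Set<Integer> available = new TreeSet<>(List.of(1, 2, 3));
    System.out.println("index 4 parity? " + rs32.isParity(4));
    System.out.println("reconstructable? " + rs32.canReconstruct(available));
  }
}
```

Using a Set keeps the count to distinct indexes, so a duplicated replica would not be mistaken for an extra source.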
> In fact, even after 10 hours (at the time of writing), there are still only 3
> DNs:
> {code:java}
> date
> Thu Mar 28 08:39:16 UTC 2024
> ozone admin container info 1004 --json
> {
>   "containerInfo" : {
>     "state" : "CLOSED",
>     "stateEnterTime" : "2024-03-27T18:43:51.934Z",
>     "replicationConfig" : {
>       "data" : 3,
>       "parity" : 2,
>       "ecChunkSize" : 1048576,
>       "codec" : "RS",
>       "requiredNodes" : 5,
>       "replicationType" : "EC"
>     },
>     "usedBytes" : 1342177280,
>     "numberOfKeys" : 5,
>     "lastUsed" : "2024-03-28T08:39:24.535189Z",
>     "owner" : "om1",
>     "containerID" : 1004,
>     "deleteTransactionId" : 0,
>     "sequenceId" : 0,
>     "deleted" : false,
>     "open" : false
>   },
>   "pipeline" : {
>     "id" : {
>       "id" : "73532c14-40ac-4924-9353-2f18ab0d63f2"
>     },
>     "replicationConfig" : {
>       "data" : 3,
>       "parity" : 2,
>       "ecChunkSize" : 1048576,
>       "codec" : "RS",
>       "requiredNodes" : 5,
>       "replicationType" : "EC"
>     },
>     "nodesInOrder" : [ {
>       "level" : 0,
>       "cost" : 0,
>       "uuid" : "6179347f-5824-41d4-b722-f1dbc5f14880",
>       "uuidString" : "6179347f-5824-41d4-b722-f1dbc5f14880",
>       "ipAddress" : "10.140.37.12",
>       "hostName" : "DN-5",
>       "ports" : [ {
>         "name" : "HTTPS",
>         "value" : 9883
>       }, {
>         "name" : "CLIENT_RPC",
>         "value" : 9864
>       }, {
>         "name" : "REPLICATION",
>         "value" : 9886
>       }, {
>         "name" : "RATIS",
>         "value" : 9858
>       }, {
>         "name" : "RATIS_ADMIN",
>         "value" : 9857
>       }, {
>         "name" : "RATIS_SERVER",
>         "value" : 9856
>       }, {
>         "name" : "STANDALONE",
>         "value" : 9859
>       } ],
>       "setupTime" : 0,
>       "persistedOpState" : "IN_SERVICE",
>       "persistedOpStateExpiryEpochSec" : 0,
>       "initialVersion" : 0,
>       "currentVersion" : 1,
>       "decommissioned" : false,
>       "maintenance" : false,
>       "signature" : -662262523,
>       "networkLocation" : "/default",
>       "networkName" : "6179347f-5824-41d4-b722-f1dbc5f14880",
>       "networkFullPath" : "/default/6179347f-5824-41d4-b722-f1dbc5f14880",
>       "numOfLeaves" : 1
>     }, {
>       "level" : 0,
>       "cost" : 0,
>       "uuid" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
>       "uuidString" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
>       "ipAddress" : "10.140.40.9",
>       "hostName" : "DN-1",
>       "ports" : [ {
>         "name" : "HTTPS",
>         "value" : 9883
>       }, {
>         "name" : "CLIENT_RPC",
>         "value" : 9864
>       }, {
>         "name" : "REPLICATION",
>         "value" : 9886
>       }, {
>         "name" : "RATIS",
>         "value" : 9858
>       }, {
>         "name" : "RATIS_ADMIN",
>         "value" : 9857
>       }, {
>         "name" : "RATIS_SERVER",
>         "value" : 9856
>       }, {
>         "name" : "STANDALONE",
>         "value" : 9859
>       } ],
>       "setupTime" : 0,
>       "persistedOpState" : "IN_SERVICE",
>       "persistedOpStateExpiryEpochSec" : 0,
>       "initialVersion" : 0,
>       "currentVersion" : 1,
>       "decommissioned" : false,
>       "maintenance" : false,
>       "signature" : -1387859873,
>       "networkLocation" : "/default",
>       "networkName" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
>       "networkFullPath" : "/default/d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
>       "numOfLeaves" : 1
>     }, {
>       "level" : 0,
>       "cost" : 0,
>       "uuid" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
>       "uuidString" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
>       "ipAddress" : "10.140.137.128",
>       "hostName" : "DN-8",
>       "ports" : [ {
>         "name" : "HTTPS",
>         "value" : 9883
>       }, {
>         "name" : "CLIENT_RPC",
>         "value" : 9864
>       }, {
>         "name" : "REPLICATION",
>         "value" : 9886
>       }, {
>         "name" : "RATIS",
>         "value" : 9858
>       }, {
>         "name" : "RATIS_ADMIN",
>         "value" : 9857
>       }, {
>         "name" : "RATIS_SERVER",
>         "value" : 9856
>       }, {
>         "name" : "STANDALONE",
>         "value" : 9859
>       } ],
>       "setupTime" : 0,
>       "persistedOpState" : "IN_SERVICE",
>       "persistedOpStateExpiryEpochSec" : 0,
>       "initialVersion" : 0,
>       "currentVersion" : 1,
>       "decommissioned" : false,
>       "maintenance" : false,
>       "signature" : 1098159392,
>       "networkLocation" : "/default",
>       "networkName" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
>       "networkFullPath" : "/default/ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
>       "numOfLeaves" : 1
>     } ],
>     "creationTimestamp" : "2024-03-28T08:39:24.480Z",
>     "stateEnterTime" : "2024-03-28T08:39:24.545517Z",
>     "leaderNode" : {
>       "level" : 0,
>       "cost" : 0,
>       "uuid" : "6179347f-5824-41d4-b722-f1dbc5f14880",
>       "uuidString" : "6179347f-5824-41d4-b722-f1dbc5f14880",
>       "ipAddress" : "10.140.37.12",
>       "hostName" : "DN-5",
>       "ports" : [ {
>         "name" : "HTTPS",
>         "value" : 9883
>       }, {
>         "name" : "CLIENT_RPC",
>         "value" : 9864
>       }, {
>         "name" : "REPLICATION",
>         "value" : 9886
>       }, {
>         "name" : "RATIS",
>         "value" : 9858
>       }, {
>         "name" : "RATIS_ADMIN",
>         "value" : 9857
>       }, {
>         "name" : "RATIS_SERVER",
>         "value" : 9856
>       }, {
>         "name" : "STANDALONE",
>         "value" : 9859
>       } ],
>       "setupTime" : 0,
>       "persistedOpState" : "IN_SERVICE",
>       "persistedOpStateExpiryEpochSec" : 0,
>       "initialVersion" : 0,
>       "currentVersion" : 1,
>       "decommissioned" : false,
>       "maintenance" : false,
>       "signature" : -662262523,
>       "networkLocation" : "/default",
>       "networkName" : "6179347f-5824-41d4-b722-f1dbc5f14880",
>       "networkFullPath" : "/default/6179347f-5824-41d4-b722-f1dbc5f14880",
>       "numOfLeaves" : 1
>     },
>     "firstNode" : {
>       "level" : 0,
>       "cost" : 0,
>       "uuid" : "6179347f-5824-41d4-b722-f1dbc5f14880",
>       "uuidString" : "6179347f-5824-41d4-b722-f1dbc5f14880",
>       "ipAddress" : "10.140.37.12",
>       "hostName" : "DN-5",
>       "ports" : [ {
>         "name" : "HTTPS",
>         "value" : 9883
>       }, {
>         "name" : "CLIENT_RPC",
>         "value" : 9864
>       }, {
>         "name" : "REPLICATION",
>         "value" : 9886
>       }, {
>         "name" : "RATIS",
>         "value" : 9858
>       }, {
>         "name" : "RATIS_ADMIN",
>         "value" : 9857
>       }, {
>         "name" : "RATIS_SERVER",
>         "value" : 9856
>       }, {
>         "name" : "STANDALONE",
>         "value" : 9859
>       } ],
>       "setupTime" : 0,
>       "persistedOpState" : "IN_SERVICE",
>       "persistedOpStateExpiryEpochSec" : 0,
>       "initialVersion" : 0,
>       "currentVersion" : 1,
>       "decommissioned" : false,
>       "maintenance" : false,
>       "signature" : -662262523,
>       "networkLocation" : "/default",
>       "networkName" : "6179347f-5824-41d4-b722-f1dbc5f14880",
>       "networkFullPath" : "/default/6179347f-5824-41d4-b722-f1dbc5f14880",
>       "numOfLeaves" : 1
>     },
>     "closestNode" : {
>       "level" : 0,
>       "cost" : 0,
>       "uuid" : "6179347f-5824-41d4-b722-f1dbc5f14880",
>       "uuidString" : "6179347f-5824-41d4-b722-f1dbc5f14880",
>       "ipAddress" : "10.140.37.12",
>       "hostName" : "DN-5",
>       "ports" : [ {
>         "name" : "HTTPS",
>         "value" : 9883
>       }, {
>         "name" : "CLIENT_RPC",
>         "value" : 9864
>       }, {
>         "name" : "REPLICATION",
>         "value" : 9886
>       }, {
>         "name" : "RATIS",
>         "value" : 9858
>       }, {
>         "name" : "RATIS_ADMIN",
>         "value" : 9857
>       }, {
>         "name" : "RATIS_SERVER",
>         "value" : 9856
>       }, {
>         "name" : "STANDALONE",
>         "value" : 9859
>       } ],
>       "setupTime" : 0,
>       "persistedOpState" : "IN_SERVICE",
>       "persistedOpStateExpiryEpochSec" : 0,
>       "initialVersion" : 0,
>       "currentVersion" : 1,
>       "decommissioned" : false,
>       "maintenance" : false,
>       "signature" : -662262523,
>       "networkLocation" : "/default",
>       "networkName" : "6179347f-5824-41d4-b722-f1dbc5f14880",
>       "networkFullPath" : "/default/6179347f-5824-41d4-b722-f1dbc5f14880",
>       "numOfLeaves" : 1
>     },
>     "allocationTimeout" : false,
>     "healthy" : true,
>     "pipelineState" : "ALLOCATED",
>     "nodes" : [ {
>       "level" : 0,
>       "cost" : 0,
>       "uuid" : "6179347f-5824-41d4-b722-f1dbc5f14880",
>       "uuidString" : "6179347f-5824-41d4-b722-f1dbc5f14880",
>       "ipAddress" : "10.140.37.12",
>       "hostName" : "DN-5",
>       "ports" : [ {
>         "name" : "HTTPS",
>         "value" : 9883
>       }, {
>         "name" : "CLIENT_RPC",
>         "value" : 9864
>       }, {
>         "name" : "REPLICATION",
>         "value" : 9886
>       }, {
>         "name" : "RATIS",
>         "value" : 9858
>       }, {
>         "name" : "RATIS_ADMIN",
>         "value" : 9857
>       }, {
>         "name" : "RATIS_SERVER",
>         "value" : 9856
>       }, {
>         "name" : "STANDALONE",
>         "value" : 9859
>       } ],
>       "setupTime" : 0,
>       "persistedOpState" : "IN_SERVICE",
>       "persistedOpStateExpiryEpochSec" : 0,
>       "initialVersion" : 0,
>       "currentVersion" : 1,
>       "decommissioned" : false,
>       "maintenance" : false,
>       "signature" : -662262523,
>       "networkLocation" : "/default",
>       "networkName" : "6179347f-5824-41d4-b722-f1dbc5f14880",
>       "networkFullPath" : "/default/6179347f-5824-41d4-b722-f1dbc5f14880",
>       "numOfLeaves" : 1
>     }, {
>       "level" : 0,
>       "cost" : 0,
>       "uuid" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
>       "uuidString" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
>       "ipAddress" : "10.140.40.9",
>       "hostName" : "DN-1",
>       "ports" : [ {
>         "name" : "HTTPS",
>         "value" : 9883
>       }, {
>         "name" : "CLIENT_RPC",
>         "value" : 9864
>       }, {
>         "name" : "REPLICATION",
>         "value" : 9886
>       }, {
>         "name" : "RATIS",
>         "value" : 9858
>       }, {
>         "name" : "RATIS_ADMIN",
>         "value" : 9857
>       }, {
>         "name" : "RATIS_SERVER",
>         "value" : 9856
>       }, {
>         "name" : "STANDALONE",
>         "value" : 9859
>       } ],
>       "setupTime" : 0,
>       "persistedOpState" : "IN_SERVICE",
>       "persistedOpStateExpiryEpochSec" : 0,
>       "initialVersion" : 0,
>       "currentVersion" : 1,
>       "decommissioned" : false,
>       "maintenance" : false,
>       "signature" : -1387859873,
>       "networkLocation" : "/default",
>       "networkName" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
>       "networkFullPath" : "/default/d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
>       "numOfLeaves" : 1
>     }, {
>       "level" : 0,
>       "cost" : 0,
>       "uuid" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
>       "uuidString" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
>       "ipAddress" : "10.140.137.128",
>       "hostName" : "DN-8",
>       "ports" : [ {
>         "name" : "HTTPS",
>         "value" : 9883
>       }, {
>         "name" : "CLIENT_RPC",
>         "value" : 9864
>       }, {
>         "name" : "REPLICATION",
>         "value" : 9886
>       }, {
>         "name" : "RATIS",
>         "value" : 9858
>       }, {
>         "name" : "RATIS_ADMIN",
>         "value" : 9857
>       }, {
>         "name" : "RATIS_SERVER",
>         "value" : 9856
>       }, {
>         "name" : "STANDALONE",
>         "value" : 9859
>       } ],
>       "setupTime" : 0,
>       "persistedOpState" : "IN_SERVICE",
>       "persistedOpStateExpiryEpochSec" : 0,
>       "initialVersion" : 0,
>       "currentVersion" : 1,
>       "decommissioned" : false,
>       "maintenance" : false,
>       "signature" : 1098159392,
>       "networkLocation" : "/default",
>       "networkName" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
>       "networkFullPath" : "/default/ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
>       "numOfLeaves" : 1
>     } ],
>     "empty" : false,
>     "type" : "EC"
>   },
>   "replicas" : [ {
>     "containerID" : 1004,
>     "state" : "CLOSED",
>     "datanodeDetails" : {
>       "level" : 0,
>       "cost" : 0,
>       "uuid" : "6179347f-5824-41d4-b722-f1dbc5f14880",
>       "uuidString" : "6179347f-5824-41d4-b722-f1dbc5f14880",
>       "ipAddress" : "10.140.37.12",
>       "hostName" : "DN-5",
>       "ports" : [ {
>         "name" : "HTTPS",
>         "value" : 9883
>       }, {
>         "name" : "CLIENT_RPC",
>         "value" : 9864
>       }, {
>         "name" : "REPLICATION",
>         "value" : 9886
>       }, {
>         "name" : "RATIS",
>         "value" : 9858
>       }, {
>         "name" : "RATIS_ADMIN",
>         "value" : 9857
>       }, {
>         "name" : "RATIS_SERVER",
>         "value" : 9856
>       }, {
>         "name" : "STANDALONE",
>         "value" : 9859
>       } ],
>       "setupTime" : 0,
>       "persistedOpState" : "IN_SERVICE",
>       "persistedOpStateExpiryEpochSec" : 0,
>       "initialVersion" : 0,
>       "currentVersion" : 1,
>       "decommissioned" : false,
>       "maintenance" : false,
>       "signature" : -662262523,
>       "networkLocation" : "/default",
>       "networkName" : "6179347f-5824-41d4-b722-f1dbc5f14880",
>       "networkFullPath" : "/default/6179347f-5824-41d4-b722-f1dbc5f14880",
>       "numOfLeaves" : 1
>     },
>     "placeOfBirth" : "6179347f-5824-41d4-b722-f1dbc5f14880",
>     "sequenceId" : 0,
>     "keyCount" : 5,
>     "bytesUsed" : 1342177280,
>     "replicaIndex" : 2
>   }, {
>     "containerID" : 1004,
>     "state" : "CLOSED",
>     "datanodeDetails" : {
>       "level" : 0,
>       "cost" : 0,
>       "uuid" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
>       "uuidString" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
>       "ipAddress" : "10.140.40.9",
>       "hostName" : "DN-1",
>       "ports" : [ {
>         "name" : "HTTPS",
>         "value" : 9883
>       }, {
>         "name" : "CLIENT_RPC",
>         "value" : 9864
>       }, {
>         "name" : "REPLICATION",
>         "value" : 9886
>       }, {
>         "name" : "RATIS",
>         "value" : 9858
>       }, {
>         "name" : "RATIS_ADMIN",
>         "value" : 9857
>       }, {
>         "name" : "RATIS_SERVER",
>         "value" : 9856
>       }, {
>         "name" : "STANDALONE",
>         "value" : 9859
>       } ],
>       "setupTime" : 0,
>       "persistedOpState" : "IN_SERVICE",
>       "persistedOpStateExpiryEpochSec" : 0,
>       "initialVersion" : 0,
>       "currentVersion" : 1,
>       "decommissioned" : false,
>       "maintenance" : false,
>       "signature" : -1387859873,
>       "networkLocation" : "/default",
>       "networkName" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
>       "networkFullPath" : "/default/d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
>       "numOfLeaves" : 1
>     },
>     "placeOfBirth" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
>     "sequenceId" : 0,
>     "keyCount" : 5,
>     "bytesUsed" : 1342177280,
>     "replicaIndex" : 3
>   }, {
>     "containerID" : 1004,
>     "state" : "CLOSED",
>     "datanodeDetails" : {
>       "level" : 0,
>       "cost" : 0,
>       "uuid" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
>       "uuidString" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
>       "ipAddress" : "10.140.137.128",
>       "hostName" : "DN-8",
>       "ports" : [ {
>         "name" : "HTTPS",
>         "value" : 9883
>       }, {
>         "name" : "CLIENT_RPC",
>         "value" : 9864
>       }, {
>         "name" : "REPLICATION",
>         "value" : 9886
>       }, {
>         "name" : "RATIS",
>         "value" : 9858
>       }, {
>         "name" : "RATIS_ADMIN",
>         "value" : 9857
>       }, {
>         "name" : "RATIS_SERVER",
>         "value" : 9856
>       }, {
>         "name" : "STANDALONE",
>         "value" : 9859
>       } ],
>       "setupTime" : 0,
>       "persistedOpState" : "IN_SERVICE",
>       "persistedOpStateExpiryEpochSec" : 0,
>       "initialVersion" : 0,
>       "currentVersion" : 1,
>       "decommissioned" : false,
>       "maintenance" : false,
>       "signature" : 1098159392,
>       "networkLocation" : "/default",
>       "networkName" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
>       "networkFullPath" : "/default/ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
>       "numOfLeaves" : 1
>     },
>     "placeOfBirth" : "711656cf-a99e-4b2c-8c35-f015ee94889c",
>     "sequenceId" : 0,
>     "keyCount" : 5,
>     "bytesUsed" : 1342177280,
>     "replicaIndex" : 1
>   } ]
> } {code}
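A minimal sketch of how the missing indexes follow from this output, assuming the replicaIndex values have already been pulled out of the replicas array (a real tool would use a JSON library; that parsing is omitted). With requiredNodes = 5 and reported indexes 2 (DN-5), 3 (DN-1) and 1 (DN-8), the gaps are 4 and 5, which matches the missingIndexes that SCM reports in the log below. The class name MissingIndexes is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class MissingIndexes {
  /** Returns the replica indexes in 1..requiredNodes that no replica reports. */
  static List<Integer> missing(int requiredNodes, Set<Integer> reported) {
    List<Integer> result = new ArrayList<>();
    for (int i = 1; i <= requiredNodes; i++) {
      if (!reported.contains(i)) {
        result.add(i);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // replicaIndex values from the JSON: DN-5 -> 2, DN-1 -> 3, DN-8 -> 1.
    Set<Integer> reported = Set.of(2, 3, 1);
    System.out.println(missing(5, reported)); // [4, 5]
  }
}
```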
> The SCM logs show that SCM is still sending reconstructECContainersCommand:
> {code:java}
> 2024-03-28 08:36:56,748 INFO [Under Replicated 
> Processor]-org.apache.hadoop.hdds.scm.container.replication.ReplicationManager:
>  Sending command [reconstructECContainersCommand: containerID: 1004, 
> replicationConfig: EC{rs-3-2-1024k}, sources: 
> [ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e(DN-8/10.140.137.128) replicaIndex: 1, 
> 6179347f-5824-41d4-b722-f1dbc5f14880(DN-5/10.140.37.12) replicaIndex: 2, 
> d8afb52b-5f4c-4d94-9286-7c3cfd6c315c(DN-1/10.140.40.9) replicaIndex: 3], 
> targets: [572ed33d-a834-4d80-be35-7b1b19c8bd74(DN-7/10.140.234.130), 
> 711656cf-a99e-4b2c-8c35-f015ee94889c(DN-2/10.140.45.129)], missingIndexes: 
> [4, 5]] for container ContainerInfo{id=#1004, state=CLOSED, 
> stateEnterTime=2024-03-27T18:43:51.934Z, 
> pipelineID=PipelineID=53f5587f-9e6c-465d-a0cb-b82d10c227d3, owner=om1} to 
> 572ed33d-a834-4d80-be35-7b1b19c8bd74(DN-7/10.140.234.130) with datanode 
> deadline 1711615886747 and scm deadline 1711615916747 {code}
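The two deadlines in this log line look like Unix epoch milliseconds (13-digit values); that reading is an assumption. A minimal sketch converting them with java.time shows the datanode deadline at 2024-03-28T08:51:26.747Z and the SCM deadline 30 seconds later. If the log's own timestamp is also UTC (the report does not say), the datanode deadline is roughly 14.5 minutes after the command was sent.

```java
import java.time.Duration;
import java.time.Instant;

class CommandDeadlines {
  public static void main(String[] args) {
    // Values copied from the SCM log line above, read as epoch milliseconds.
    Instant datanodeDeadline = Instant.ofEpochMilli(1711615886747L);
    Instant scmDeadline = Instant.ofEpochMilli(1711615916747L);
    System.out.println("datanode deadline: " + datanodeDeadline); // 2024-03-28T08:51:26.747Z
    System.out.println("scm deadline:      " + scmDeadline);      // 2024-03-28T08:51:56.747Z
    System.out.println("gap: " + Duration.between(datanodeDeadline, scmDeadline)); // PT30S
  }
}
```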
> One of the target DNs, DN-7, is logging the following warnings:
> {code:java}
> 2024-03-28 08:37:14,982 WARN 
> [ContainerReplicationThread-5]-org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask:
>  FAILED reconstructECContainersCommand: containerID=1004, 
> replication=rs-3-2-1024k, missingIndexes=[4, 5], 
> sources={1=ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e(DN-8/10.140.137.128), 
> 2=6179347f-5824-41d4-b722-f1dbc5f14880(DN-5/10.140.37.12), 
> 3=d8afb52b-5f4c-4d94-9286-7c3cfd6c315c(DN-1/10.140.40.9)}, 
> targets={4=572ed33d-a834-4d80-be35-7b1b19c8bd74(DN-7/10.140.234.130), 
> 5=711656cf-a99e-4b2c-8c35-f015ee94889c(DN-2/10.140.45.129)} after 10639 ms
> java.io.IOException: None of the block data have checksum which means 
> 2(parity)+1 blocks are not present
>         at 
> org.apache.hadoop.hdds.scm.storage.ECBlockOutputStream.executePutBlock(ECBlockOutputStream.java:156)
>         at 
> org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECBlockGroup(ECReconstructionCoordinator.java:325)
>         at 
> org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECContainerGroup(ECReconstructionCoordinator.java:171)
>         at 
> org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask.runTask(ECReconstructionCoordinatorTask.java:68)
>         at 
> org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:359)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at java.base/java.lang.Thread.run(Thread.java:834)
> 2024-03-28 08:37:14,982 WARN 
> [ContainerReplicationThread-5]-org.apache.hadoop.ozone.container.replication.ReplicationSupervisor:
>  Failed FAILED reconstructECContainersCommand: containerID=1004, 
> replication=rs-3-2-1024k, missingIndexes=[4, 5], 
> sources={1=ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e(DN-8/10.140.137.128), 
> 2=6179347f-5824-41d4-b722-f1dbc5f14880(DN-5/10.140.37.12), 
> 3=d8afb52b-5f4c-4d94-9286-7c3cfd6c315c(DN-1/10.140.40.9)}, 
> targets={4=572ed33d-a834-4d80-be35-7b1b19c8bd74(DN-7/10.140.234.130), 
> 5=711656cf-a99e-4b2c-8c35-f015ee94889c(DN-2/10.140.45.129)} {code}
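The exception message implies that reconstruction requires a checksum in the block data of at least one surviving source replica and found none; the report does not explain why none was present for data written before the upgrade. A simplified, hypothetical model of that precondition follows. It is not Ozone's ECBlockOutputStream.executePutBlock code, and ReplicaBlockData and findChecksum are illustrative names only.

```java
import java.io.IOException;
import java.util.List;
import java.util.Optional;

class ChecksumCheck {
  /** Hypothetical stand-in for one source replica's block metadata. */
  record ReplicaBlockData(int replicaIndex, Optional<String> checksum) {}

  /** Returns the first checksum found; fails like the DN-7 log (message shortened). */
  static String findChecksum(List<ReplicaBlockData> blocks) throws IOException {
    for (ReplicaBlockData b : blocks) {
      if (b.checksum().isPresent()) {
        return b.checksum().get();
      }
    }
    throw new IOException("None of the block data have checksum");
  }

  public static void main(String[] args) {
    // Sources 1..3 as in the failed command, modelled without any checksum.
    List<ReplicaBlockData> sources = List.of(
        new ReplicaBlockData(1, Optional.empty()),
        new ReplicaBlockData(2, Optional.empty()),
        new ReplicaBlockData(3, Optional.empty()));
    try {
      findChecksum(sources);
    } catch (IOException e) {
      System.out.println("reconstruction aborted: " + e.getMessage());
    }
  }
}
```

In this model the command fails as a whole, so SCM retrying the same command cannot succeed until the precondition itself changes, which matches the repeated commands seen in the SCM log.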
> *Expected Behavior:* Reconstruction should have happened.
> Note: This reproduces fairly consistently.
> cc: [~siddhant] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
