Kodali Bhavya Sree created HDDS-14697:
-----------------------------------------

             Summary: Data integrity is missing for the snapshot after 
de-commissioning Leader OM node. Below are the steps followed:
                 Key: HDDS-14697
                 URL: https://issues.apache.org/jira/browse/HDDS-14697
             Project: Apache Ozone
          Issue Type: Bug
          Components: Ozone Manager
    Affects Versions: 2.0.0
            Reporter: Kodali Bhavya Sree


Snapshot data integrity validation fails after decommissioning the leader OM 
node. The steps followed are:
{code:java}
1. Create a volume and a bucket with different parameters. Generate keys in the 
bucket.
2. Calculate the checksums of all the files.
3. Create two snapshots, then delete one of them.
4. Decommission the leader OM node.
5. Validate the checksums of the files.
6. Create a snapshot after decommissioning and compute the snapdiff against the 
snapshot created before decommissioning.{code}
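The checksum steps (2 and 5) can be sketched roughly as below. This is only an illustration, assuming the keys have already been copied to local directories; the class {{ChecksumValidator}}, its method names, and the choice of SHA-256 are hypothetical and are not taken from the test framework used in this report.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Ozone or of the test framework in this report.
public final class ChecksumValidator {
    private ChecksumValidator() { }

    /** Maps each regular file under root (by relative path) to its SHA-256 hex digest. */
    public static Map<String, String> checksums(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        Map<String, String> result = new TreeMap<>();
        for (Path file : files) {
            result.put(root.relativize(file).toString(), sha256(file));
        }
        return result;
    }

    /** Streams the file through SHA-256 so large keys are not loaded into memory. */
    static String sha256(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Returns the relative paths whose checksum is missing or different in actual. */
    public static List<String> mismatches(Map<String, String> expected,
                                          Map<String, String> actual) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, String> entry : expected.entrySet()) {
            if (!entry.getValue().equals(actual.get(entry.getKey()))) {
                bad.add(entry.getKey());
            }
        }
        return bad;
    }
}
```

In this sketch, {{checksums}} would be run once on the data before decommissioning and once on the copy fetched from the snapshot afterwards; an empty result from {{mismatches}} means every file recorded before decommissioning is present with the same digest.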
Checksum validation of the snapshots fails after decommissioning. The failure 
occurs after the following step, which copies all objects under snapshot 
{{snap-j0vp8}} of that bucket to the local filesystem:
{code:java}
2026-02-22 09:46:51,203|INFO|MainThread|machine.py:190 - 
run()||GUID=c7d2cbe1-f9e1-4074-a0fe-00b4d295bbbe|RUNNING: ssh -l root -i 
/tmp/hw-qe-keypair.pem -q -o StrictHostKeyChecking=no -o 
UserKnownHostsFile=/dev/null quasar-nntvzt-3.vpc.cloudera.com "export 
KRB5CCNAME=/hwqe/hadoopqe/artifacts/kerberosTickets/hrt_qa.kerberos.ticket; 
export OZONE_LOGLEVEL=INFO;/opt/cloudera/parcels/CDH/bin/ozone fs -get 
ofs://ozone1771736427/vol-test-workload-om-decommission-recommission-1771752191/buck-test-workload-om-decommission-recommission-1771752191/.snapshot/snap-j0vp8/*
 /test_master_node_decommissioning_om_workload/workload_local1771752225"{code}
The following NullPointerException is then seen:
{code:java}
2026-02-22 09:46:53,571|INFO|MainThread|machine.py:205 - 
run()||GUID=c7d2cbe1-f9e1-4074-a0fe-00b4d295bbbe|26/02/22 01:46:53 INFO 
retry.RetryInvocationHandler: com.google.protobuf.ServiceException: 
org.apache.hadoop.ipc_.RemoteException(java.lang.IllegalStateException): 
java.lang.NullPointerException: Cannot invoke 
"org.apache.hadoop.ozone.om.snapshot.OmSnapshotLocalDataManager$SnapshotVersionsMeta.getVersion()"
 because the return value of 
"org.apache.hadoop.ozone.om.snapshot.OmSnapshotLocalDataManager$ReadableOmSnapshotLocalDataMetaProvider.getMeta()"
 is nul{code}
 

The definition of {{getMeta()}} is:
 
{code:java}
public synchronized SnapshotVersionsMeta getMeta() throws IOException {
  if (closed) {
    throw new IOException("Resource has already been closed.");
  }
  return meta;
}
{code}
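The failure mode in the stack trace can be illustrated with a minimal sketch: {{getMeta()}} only checks the closed flag, so a caller that chains {{getVersion()}} on the result hits a NullPointerException whenever {{meta}} was never populated. The classes below are simplified, hypothetical stand-ins and not the actual Ozone implementation; the guarded variant is one possible way to surface the condition, not a proposed fix, since this report does not establish why {{meta}} is null.

```java
import java.io.IOException;

// Hypothetical, simplified stand-ins for the Ozone classes named in the trace.
public final class MetaProviderSketch {
    private MetaProviderSketch() { }

    /** Stand-in for SnapshotVersionsMeta. */
    public static final class VersionsMeta {
        private final int version;

        public VersionsMeta(int version) {
            this.version = version;
        }

        public int getVersion() {
            return version;
        }
    }

    /** Stand-in for the provider; meta may be null if it was never loaded. */
    public static final class Provider implements AutoCloseable {
        private final VersionsMeta meta;
        private boolean closed;

        public Provider(VersionsMeta meta) {
            this.meta = meta;
        }

        public synchronized VersionsMeta getMeta() throws IOException {
            if (closed) {
                throw new IOException("Resource has already been closed.");
            }
            return meta; // no null check, mirroring the method quoted above
        }

        @Override
        public synchronized void close() {
            closed = true;
        }
    }

    /** Unguarded call chain: throws NullPointerException when meta is null. */
    public static int versionUnguarded(Provider provider) throws IOException {
        return provider.getMeta().getVersion();
    }

    /** Guarded variant: reports missing metadata as a descriptive IOException. */
    public static int versionGuarded(Provider provider) throws IOException {
        VersionsMeta meta = provider.getMeta();
        if (meta == null) {
            throw new IOException("Snapshot local data metadata is not available.");
        }
        return meta.getVersion();
    }
}
```

With a provider constructed as {{new Provider(null)}}, {{versionUnguarded}} reproduces the NullPointerException pattern, while {{versionGuarded}} raises an IOException that names the missing metadata.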
 

Upstream PR where these changes went in:
[+https://github.infra.cloudera.com/CDH/ozone/commit/1dda9abfa979f3282d3355e154ce025f76d48665+]


