adoroszlai opened a new pull request #1605: HDDS-2259. Container Data Scrubber computes wrong checksum
URL: https://github.com/apache/hadoop/pull/1605
 
 
   ## What changes were proposed in this pull request?
   
   Compute the checksum in the container scrubber only over the number of bytes 
actually read.  Otherwise, if the actual chunk size is not an integer multiple 
of the number of bytes per checksum (i.e. the buffer size), the last read fills 
only part of the buffer, and the leftover data in the rest of the buffer is 
included in the checksum.  This results in a wrong checksum and in containers 
being marked unhealthy.
   
   ```
   Corruption detected in container: [1] Exception: [Inconsistent read for chunk=102914246583189504_chunk_1 len=671 expected checksum [0, 0, 0, 0, -14, -102, -99, -51] actual checksum [0, 0, 0, 0, 23, -23, 53, -79] for block conID: 1 locID: 102914246583189504 bcsId: 3]
   ```
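
The failure mode described above can be reproduced outside Ozone with a minimal sketch. This is not the scrubber's actual code: the class and method names are hypothetical, `java.util.zip.CRC32` stands in for whatever checksum type the container uses, and the 256 bytes per checksum is an assumed value. Only the chunk length of 671 is taken from the log message. The sketch reads a chunk slice by slice into one reused buffer and checksums either the whole buffer or only the bytes read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical illustration, not the Ozone implementation.
class ScrubberChecksumSketch {

    // Writer side: one CRC32 per slice, computed over exactly the slice's bytes.
    static List<Long> expectedChecksums(byte[] chunk, int bytesPerChecksum) {
        List<Long> result = new ArrayList<>();
        for (int offset = 0; offset < chunk.length; offset += bytesPerChecksum) {
            int len = Math.min(bytesPerChecksum, chunk.length - offset);
            CRC32 crc = new CRC32();
            crc.update(chunk, offset, len);
            result.add(crc.getValue());
        }
        return result;
    }

    // Scanner side: each slice is copied into one reused buffer, as a read would do.
    // onlyBytesRead=false checksums the whole buffer (the behaviour described as buggy);
    // onlyBytesRead=true checksums just the bytes of the current read.
    static List<Long> scannedChecksums(byte[] chunk, int bytesPerChecksum,
                                       boolean onlyBytesRead) {
        List<Long> result = new ArrayList<>();
        byte[] buffer = new byte[bytesPerChecksum];
        for (int offset = 0; offset < chunk.length; offset += bytesPerChecksum) {
            int read = Math.min(bytesPerChecksum, chunk.length - offset);
            // After a short read, buffer[read..] still holds data from the previous slice.
            System.arraycopy(chunk, offset, buffer, 0, read);
            CRC32 crc = new CRC32();
            crc.update(buffer, 0, onlyBytesRead ? read : buffer.length);
            result.add(crc.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] chunk = new byte[671];          // length taken from the log message
        for (int i = 0; i < chunk.length; i++) {
            chunk[i] = (byte) (i * 31 + 7);    // arbitrary deterministic content
        }
        int bytesPerChecksum = 256;            // assumed value, not from the PR
        List<Long> expected = expectedChecksums(chunk, bytesPerChecksum);
        System.out.println("whole buffer matches: "
            + expected.equals(scannedChecksums(chunk, bytesPerChecksum, false)));
        System.out.println("bytes read matches:   "
            + expected.equals(scannedChecksums(chunk, bytesPerChecksum, true)));
    }
}
```

With these sizes the first two slices fill the buffer completely, so only the checksum of the last, partial slice is affected when the whole buffer is used.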
   
   https://issues.apache.org/jira/browse/HDDS-2259
   
   ## How was this patch tested?
   
   1. Changed unit test to reproduce the problem by making sure that "bytes per 
checksum" and "chunk size" are different.
   2. Tested manually
      1. Created and closed containers with small (<1KB), medium (~7MB) and 
large (100MB) files.
      2. Verified that container scanner does not mark any of these unhealthy.
      3. Appended some garbage data to one of the chunk files.
      4. Verified that container scanner marks the corrupted container as 
unhealthy.
   
   ```
   ozone sh volume create vol1
   ozone sh bucket create vol1/bucket1
   ozone sh key put vol1/bucket1/small /etc/passwd
   ozone scmcli container close 1
   ozone sh key put vol1/bucket1/medium /opt/hadoop/share/ozone/lib/hadoop-hdfs-client-3.2.0.jar
   ozone scmcli container close 2
   ozone sh key put vol1/bucket1/large /opt/hadoop/share/ozone/lib/hadoop-ozone-filesystem-lib-legacy-0.5.0-SNAPSHOT.jar
   ozone scmcli container close 3
   # later
   echo asdfasdf >> /data/hdds/hdds/*/current/containerDir0/2/chunks/*_chunk_1 
   ```
   
   Log:
   
   ```
   Completed an iteration of container data scrubber in 1 minutes. Number of iterations (since the data-node restart) : 16, Number of containers scanned in this iteration : 3, Number of unhealthy containers found in this iteration : 0
   ...
   Corruption detected in container: [2] Exception: [Inconsistent read for chunk=102914295727980545_chunk_1 len=5023516 expected checksum [0, 0, 0, 0, 21, 105, -33, 7] actual checksum [0, 0, 0, 0, -103, -121, 23, -96] for block conID: 2 locID: 102914295727980545 bcsId: 9]
   Completed an iteration of container data scrubber in 1 minutes. Number of iterations (since the data-node restart) : 19, Number of containers scanned in this iteration : 3, Number of unhealthy containers found in this iteration : 1
   ```
