[https://issues.apache.org/jira/browse/HDDS-2259?focusedWorklogId=324024&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-324024]
ASF GitHub Bot logged work on HDDS-2259:
----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Oct/19 08:20
Start Date: 06/Oct/19 08:20
Worklog Time Spent: 10m
Work Description: adoroszlai commented on pull request #1605: HDDS-2259.
Container Data Scrubber computes wrong checksum
URL: https://github.com/apache/hadoop/pull/1605
## What changes were proposed in this pull request?
Compute the checksum in the container scrubber only for the actual length of
data read. Otherwise, if the actual chunk size is not an integer multiple of
the number of bytes per checksum (i.e. the buffer size), leftover data in the
buffer results in a wrong checksum and the container being marked unhealthy.
```
Corruption detected in container: [1] Exception: [Inconsistent read for
chunk=102914246583189504_chunk_1 len=671 expected checksum [0, 0, 0, 0, -14,
-102, -99, -51] actual checksum [0, 0, 0, 0, 23, -23, 53, -79] for block conID:
1 locID: 102914246583189504 bcsId: 3]
```
https://issues.apache.org/jira/browse/HDDS-2259
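A minimal sketch of the proposed behavior, reusing the names from the snippet
quoted in the issue below (the actual patch may differ; `Arrays.copyOf` is
just one way to limit the checksum to the bytes actually read):
```
byte[] buffer = new byte[cData.getBytesPerChecksum()];
int v;
while ((v = fs.read(buffer)) != -1) {
  bytesRead += v;
  // The final read of a chunk may fill only part of the buffer, so the
  // checksum must cover exactly the v bytes read; anything beyond that is
  // leftover data from a previous iteration.
  ByteString actual = cal.computeChecksum(Arrays.copyOf(buffer, v))
      .getChecksums().get(0);
  // ... compare `actual` against the expected checksum for this range ...
}
```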
## How was this patch tested?
1. Changed the unit test to reproduce the problem by making sure that "bytes
per checksum" and "chunk size" are different. (A standalone illustration of
the mechanism follows the log below.)
2. Tested manually:
1. Created and closed containers with small (<1KB), medium (~7MB) and
large (100MB) files.
2. Verified that the container scanner does not mark any of these
unhealthy.
3. Appended some garbage data to one of the chunk files.
4. Verified that the container scanner marks the corrupted container as
unhealthy.
```
ozone sh volume create vol1
ozone sh bucket create vol1/bucket1
ozone sh key put vol1/bucket1/small /etc/passwd
ozone scmcli container close 1
ozone sh key put vol1/bucket1/medium /opt/hadoop/share/ozone/lib/hadoop-hdfs-client-3.2.0.jar
ozone scmcli container close 2
ozone sh key put vol1/bucket1/large /opt/hadoop/share/ozone/lib/hadoop-ozone-filesystem-lib-legacy-0.5.0-SNAPSHOT.jar
ozone scmcli container close 3
# later
echo asdfasdf >> /data/hdds/hdds/*/current/containerDir0/2/chunks/*_chunk_1
```
Log:
```
Completed an iteration of container data scrubber in 1 minutes. Number of
iterations (since the data-node restart) : 16, Number of containers scanned in
this iteration : 3, Number of unhealthy containers found in this iteration : 0
...
Corruption detected in container: [2] Exception: [Inconsistent read for
chunk=102914295727980545_chunk_1 len=5023516 expected checksum [0, 0, 0, 0, 21,
105, -33, 7] actual checksum [0, 0, 0, 0, -103, -121, 23, -96] for block conID:
2 locID: 102914295727980545 bcsId: 9]
Completed an iteration of container data scrubber in 1 minutes. Number of
iterations (since the data-node restart) : 19, Number of containers scanned in
this iteration : 3, Number of unhealthy containers found in this iteration : 1
```
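As a standalone illustration of the mechanism (plain `java.util.zip.CRC32`
stands in for Ozone's checksum calculator; the class and values here are
illustrative only): when the last read fills only part of the buffer,
checksumming the whole buffer includes leftover bytes and yields a different
value than checksumming just the bytes read.
```
import java.util.Arrays;
import java.util.zip.CRC32;

public class LeftoverChecksumDemo {
  public static void main(String[] args) {
    int bytesPerChecksum = 16;
    byte[] buffer = new byte[bytesPerChecksum];
    Arrays.fill(buffer, (byte) 0x55);   // leftover data from a previous read

    byte[] chunkTail = {1, 2, 3, 4, 5}; // final read returns only 5 bytes
    System.arraycopy(chunkTail, 0, buffer, 0, chunkTail.length);
    int v = chunkTail.length;

    CRC32 whole = new CRC32();
    whole.update(buffer);               // buggy: checksums all 16 bytes

    CRC32 actual = new CRC32();
    actual.update(buffer, 0, v);        // correct: checksums only 5 bytes

    // The two values differ, which is exactly the mismatch reported by the
    // scrubber in the log above.
    System.out.printf("whole buffer: %x, bytes read: %x%n",
        whole.getValue(), actual.getValue());
  }
}
```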
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 324024)
Remaining Estimate: 0h
Time Spent: 10m
> Container Data Scrubber computes wrong checksum
> -----------------------------------------------
>
> Key: HDDS-2259
> URL: https://issues.apache.org/jira/browse/HDDS-2259
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: Ozone Datanode
> Affects Versions: 0.5.0
> Reporter: Attila Doroszlai
> Assignee: Attila Doroszlai
> Priority: Critical
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Chunk checksum verification fails for (almost) any file. This is caused by
> computing the checksum for the entire buffer, regardless of the actual
> size of the chunk.
> {code:title=https://github.com/apache/hadoop/blob/55c5436f39120da0d7dabf43d7e5e6404307123b/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java#L259-L273}
> byte[] buffer = new byte[cData.getBytesPerChecksum()];
> ...
> v = fs.read(buffer);
> ...
> bytesRead += v;
> ...
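> // bug: the checksum covers the entire buffer even when v < buffer.length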
> ByteString actual = cal.computeChecksum(buffer)
> .getChecksums().get(0);
> {code}
> This results in marking all closed containers as unhealthy.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)