[https://issues.apache.org/jira/browse/HDDS-2259?focusedWorklogId=324024&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-324024]
ASF GitHub Bot logged work on HDDS-2259:
----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Oct/19 08:20
Start Date: 06/Oct/19 08:20
Worklog Time Spent: 10m
Work Description: adoroszlai commented on pull request #1605: HDDS-2259.
Container Data Scrubber computes wrong checksum
URL: https://github.com/apache/hadoop/pull/1605
## What changes were proposed in this pull request?
Compute the checksum in the container scrubber only for the actual length of
data read. Otherwise, if the actual chunk size is not an integer multiple of
the number of bytes per checksum (i.e. the buffer size), leftover data in the
buffer results in a wrong checksum and the container being marked unhealthy.
```
Corruption detected in container: [1] Exception: [Inconsistent read for
chunk=102914246583189504_chunk_1 len=671 expected checksum [0, 0, 0, 0, -14,
-102, -99, -51] actual checksum [0, 0, 0, 0, 23, -23, 53, -79] for block conID:
1 locID: 102914246583189504 bcsId: 3]
```
https://issues.apache.org/jira/browse/HDDS-2259
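A minimal sketch of the proposed behavior, reusing the names from the snippet
quoted in the issue below (the actual patch may differ; `Arrays.copyOf` is
just one way to limit the checksum to the bytes actually read):
```
byte[] buffer = new byte[cData.getBytesPerChecksum()];
int v;
while ((v = fs.read(buffer)) != -1) {
  bytesRead += v;
  // The final read of a chunk may fill only part of the buffer, so the
  // checksum must cover exactly the v bytes read; anything beyond that is
  // leftover data from a previous iteration.
  ByteString actual = cal.computeChecksum(Arrays.copyOf(buffer, v))
      .getChecksums().get(0);
  // ... compare `actual` against the expected checksum for this range ...
}
```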
## How was this patch tested?
1. Changed the unit test to reproduce the problem by making sure that "bytes
per checksum" and "chunk size" are different. (A standalone illustration of
the mechanism follows the log below.)
2. Tested manually:
1. Created and closed containers with small (<1KB), medium (~7MB) and
large (100MB) files.
2. Verified that the container scanner does not mark any of these
unhealthy.
3. Appended some garbage data to one of the chunk files.
4. Verified that the container scanner marks the corrupted container as
unhealthy.
```
ozone sh volume create vol1
ozone sh bucket create vol1/bucket1
ozone sh key put vol1/bucket1/small /etc/passwd
ozone scmcli container close 1
ozone sh key put vol1/bucket1/medium /opt/hadoop/share/ozone/lib/hadoop-hdfs-client-3.2.0.jar
ozone scmcli container close 2
ozone sh key put vol1/bucket1/large /opt/hadoop/share/ozone/lib/hadoop-ozone-filesystem-lib-legacy-0.5.0-SNAPSHOT.jar
ozone scmcli container close 3
# later
echo asdfasdf >> /data/hdds/hdds/*/current/containerDir0/2/chunks/*_chunk_1
```
Log:
```
Completed an iteration of container data scrubber in 1 minutes. Number of
iterations (since the data-node restart) : 16, Number of containers scanned in
this iteration : 3, Number of unhealthy containers found in this iteration : 0
...
Corruption detected in container: [2] Exception: [Inconsistent read for
chunk=102914295727980545_chunk_1 len=5023516 expected checksum [0, 0, 0, 0, 21,
105, -33, 7] actual checksum [0, 0, 0, 0, -103, -121, 23, -96] for block conID:
2 locID: 102914295727980545 bcsId: 9]
Completed an iteration of container data scrubber in 1 minutes. Number of
iterations (since the data-node restart) : 19, Number of containers scanned in
this iteration : 3, Number of unhealthy containers found in this iteration : 1
```
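As a standalone illustration of the mechanism (plain `java.util.zip.CRC32`
stands in for Ozone's checksum calculator; the class and values here are
illustrative only): when the last read fills only part of the buffer,
checksumming the whole buffer includes leftover bytes and yields a different
value than checksumming just the bytes read.
```
import java.util.Arrays;
import java.util.zip.CRC32;

public class LeftoverChecksumDemo {
  public static void main(String[] args) {
    int bytesPerChecksum = 16;
    byte[] buffer = new byte[bytesPerChecksum];
    Arrays.fill(buffer, (byte) 0x55);   // leftover data from a previous read

    byte[] chunkTail = {1, 2, 3, 4, 5}; // final read returns only 5 bytes
    System.arraycopy(chunkTail, 0, buffer, 0, chunkTail.length);
    int v = chunkTail.length;

    CRC32 whole = new CRC32();
    whole.update(buffer);               // buggy: checksums all 16 bytes

    CRC32 actual = new CRC32();
    actual.update(buffer, 0, v);        // correct: checksums only 5 bytes

    // The two values differ, which is exactly the mismatch reported by the
    // scrubber in the log above.
    System.out.printf("whole buffer: %x, bytes read: %x%n",
        whole.getValue(), actual.getValue());
  }
}
```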
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 324024)
Remaining Estimate: 0h
Time Spent: 10m
> Container Data Scrubber computes wrong checksum
> -----------------------------------------------
>
> Key: HDDS-2259
> URL: https://issues.apache.org/jira/browse/HDDS-2259
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: Ozone Datanode
> Affects Versions: 0.5.0
> Reporter: Attila Doroszlai
> Assignee: Attila Doroszlai
> Priority: Critical
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Chunk checksum verification fails for (almost) any file. This is caused by
> computing the checksum for the entire buffer, regardless of the actual
> size of the chunk.
> {code:title=https://github.com/apache/hadoop/blob/55c5436f39120da0d7dabf43d7e5e6404307123b/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java#L259-L273}
> byte[] buffer = new byte[cData.getBytesPerChecksum()];
> ...
> v = fs.read(buffer);
> ...
> bytesRead += v;
> ...
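> // bug: the checksum covers the entire buffer even when v < buffer.length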
> ByteString actual = cal.computeChecksum(buffer)
> .getChecksums().get(0);
> {code}
> This results in marking all closed containers as unhealthy.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)