[
https://issues.apache.org/jira/browse/RATIS-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tsz-wo Sze resolved RATIS-2147.
-------------------------------
Fix Version/s: 3.1.1
Resolution: Fixed
The pull request was merged. Thanks, [~tohsakarin__]!
> MD5 mismatch when accept snapshot
> ---------------------------------
>
> Key: RATIS-2147
> URL: https://issues.apache.org/jira/browse/RATIS-2147
> Project: Ratis
> Issue Type: Bug
> Components: snapshot
> Affects Versions: 3.1.0, 3.2.0
> Reporter: yuuka
> Assignee: yuuka
> Priority: Major
> Fix For: 3.1.1
>
> Attachments: image-2024-09-03-10-35-08-315.png,
> image-2024-09-03-10-35-28-617.png
>
> Time Spent: 5h 20m
> Remaining Estimate: 0h
>
> We encountered an MD5 mismatch in IoTDB. After several investigations, we
> found that the digester was contaminated.
>
> We have verified that this is not a network or disk problem.
>
> In the implementation, the received snapshot is first written to a temporary
> file. When an MD5 mismatch occurs, we read the data back from this temporary
> file and compute the MD5 with a new digest. The result of this recomputation
> is the same as the MD5 hash value that was sent.
> !image-2024-09-03-10-35-28-617.png!
>
> !image-2024-09-03-10-35-08-315.png!
>
>
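The recomputation check described above can be sketched as follows. This is a minimal illustration using only the JDK, not the Ratis implementation; the class name RecomputeMd5 and both method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Ratis.
class RecomputeMd5 {
  /** Hashes the whole file with a brand-new MessageDigest instance. */
  static byte[] md5OfFile(Path file) throws IOException, NoSuchAlgorithmException {
    // A fresh instance carries no state from any earlier transfer.
    MessageDigest fresh = MessageDigest.getInstance("MD5");
    try (InputStream in = Files.newInputStream(file)) {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        fresh.update(buf, 0, n);
      }
    }
    return fresh.digest();
  }

  /** Returns true if the file's recomputed MD5 equals the digest sent by the leader. */
  static boolean matchesSentDigest(Path file, byte[] sentDigest)
      throws IOException, NoSuchAlgorithmException {
    return MessageDigest.isEqual(md5OfFile(file), sentDigest);
  }
}
```

If matchesSentDigest returns true for the temporary file while the original check reported a mismatch, the bytes on disk are intact and the suspicion falls on the digest object used by the original check.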
> Use the saved corrupt file name to locate the relevant log. Here we take
> tlog.txt.snapshot.snapshot.corrupt20240831-094107_735 as an example.
> !https://timechor.feishu.cn/space/api/box/stream/download/asynccode/?code=MDhjNDQ1OWY5NGVlM2YzYTEwOWE1ZWU5MDlmZjNmMmRfTHE1T3lFSnllTFR6Mm5Pc2oyQUpsWUxJTmM4SEhodVBfVG9rZW46RHJlbmJHQlRkb2daakp4RHZMVWNEOVFPbmhiXzE3MjUzODYwMzQ6MTcyNTM4OTYzNF9WNA!
> Before the corruption was reported, the sender sent several consecutive
> snapshot installation requests to the receiver.
>
> The receiver successfully processed some of the requests, then encountered a
> request reported as corrupt and began printing "recompute again" as it
> started recalculating.
>
> After that, the ERROR log for the rename is printed, and the data is read
> back from the file and compared with the received chunk data.
>
> If any byte did not match, the mismatch would be logged. No such log was
> printed, which means that the content written to the disk is the same as the
> content sent.
> !https://timechor.feishu.cn/space/api/box/stream/download/asynccode/?code=ZDQ3NmJhNWZiYjEyYjU1MWYxOGI3MTFjNjNjMjAyMmJfUnAwMjB5dloxODlGRG52RFdZUTBCSUc0NjBPaWc3VXdfVG9rZW46TUxFeGJxTjBqbzIxNUx4eUZrUGNHMk55bjhkXzE3MjUzODYwNjA6MTcyNTM4OTY2MF9WNA!
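The byte-by-byte comparison between the file on disk and the received chunk could look roughly like this. It is a sketch under assumptions: the class ChunkVerifier, the method firstMismatch, and the convention of returning -1 for "no mismatch" are all hypothetical and not taken from the Ratis code.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

// Hypothetical helper, not part of Ratis.
class ChunkVerifier {
  /**
   * Compares the bytes stored in the file at the given offset with the chunk
   * that was received. Returns the absolute file position of the first
   * differing (or missing) byte, or -1 if every byte matches.
   */
  static long firstMismatch(Path file, long offset, byte[] chunk) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
      // Only read what the file actually contains past the offset.
      long remaining = Math.max(0L, raf.length() - offset);
      int available = (int) Math.min((long) chunk.length, remaining);
      byte[] onDisk = new byte[available];
      raf.seek(offset);
      raf.readFully(onDisk);
      for (int i = 0; i < available; i++) {
        if (onDisk[i] != chunk[i]) {
          return offset + i;
        }
      }
      // A file shorter than the chunk counts as a mismatch at the first missing byte.
      return available < chunk.length ? offset + available : -1L;
    }
  }
}
```

A result of -1 for every chunk corresponds to the situation in the log: nothing is printed because the data on disk equals the data that was sent.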
> This makes the problem clear: the fault lies in the MD5 calculation class
> (the digester), for the following reasons:
>
> If a byte in the middle of the data had been corrupted by the network, the
> recomputed result and the hash that was sent would have to differ.
>
> If the part that stores the hash value had been corrupted, the final
> recomputed result would also differ.
>
> Neither is the case: the MD5 recomputed from the file with a new digest
> matches the hash that was sent, so the only remaining suspect is the state
> of the original digester.
>
> I suggest creating a new digest every time a follower receives a snapshot, to
> avoid this contamination. Under normal network and disk conditions, a
> corrupt snapshot should not occur.
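To illustrate the suggestion, the sketch below contrasts a reused digest that still holds leftover state with a digest created fresh for each snapshot. The leftover-bytes scenario is an assumed way the contamination could arise, not a confirmed root cause, and the class SnapshotDigestDemo and its methods are hypothetical names rather than Ratis APIs.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical demo class, not part of Ratis.
class SnapshotDigestDemo {
  static MessageDigest newMd5() {
    try {
      return MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
      // MD5 is required to be available on every Java platform.
      throw new IllegalStateException(e);
    }
  }

  /** Feeds all chunks into the given digest; any state it already holds is included. */
  static byte[] digestChunks(MessageDigest digest, List<byte[]> chunks) {
    for (byte[] chunk : chunks) {
      digest.update(chunk);
    }
    return digest.digest();
  }

  /** Proposed approach: one new digest per received snapshot. */
  static byte[] digestWithFreshInstance(List<byte[]> chunks) {
    return digestChunks(newMd5(), chunks);
  }
}
```

If a long-lived digest received update() calls from an earlier transfer that never reached digest() or reset(), passing it to digestChunks yields a hash that differs from the sender's even though every chunk is intact. digestWithFreshInstance cannot be affected this way.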
--
This message was sent by Atlassian Jira
(v8.20.10#820010)