[ 
https://issues.apache.org/jira/browse/HDFS-17663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangYuanben updated HDFS-17663:
-------------------------------
    Description: 
When I define and use the RS-14-2-1024k/RS-13-3-1024k/RS-12-4-1024k EC policy, 
I find that files are corrupted when decoded with ISAL. The files can be 
retrieved and have the same size as the original, but their MD5 checksums 
differ.

To verify that the encoding is correct, I perform a cross-validation: I encode 
the file both with and without ISAL, then decode it with and without ISAL. This 
results in four files, and only the two files decoded without ISAL have the same 
MD5 checksum as the original. The picture shows the result:

!12-4-test.png|width=550,height=333!
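
The checksum comparison in the cross-validation above could be scripted roughly as follows. This is only a minimal sketch: the helper names `md5_of` and `cross_validate` and any file paths passed to them are hypothetical, and it assumes the four decoded files and the original have already been copied to local disk.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cross_validate(original, candidates):
    """Compare each candidate file against the original by MD5.

    `candidates` maps a free-form label (for example "enc=ISAL, dec=Java")
    to a local file path. Returns a dict of label -> True if the MD5 matches.
    """
    expected = md5_of(original)
    return {label: md5_of(path) == expected
            for label, path in candidates.items()}
```

Calling `cross_validate` with the original file and a dict of the four encode/decode combinations would give one match/mismatch flag per combination, which is the same information the screenshot presents.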

 

For RS-15-1-1024k, there are issues with both encoding and decoding. Only the 
file encoded and decoded without ISAL has the same MD5 checksum as the 
original. The picture shows the result:
!15-1-test.png|width=550,height=333!

The decoded test files are shown below:
  !files.png|width=550,height=333!

When there are fewer than 12 data blocks, this issue does not occur. 
Additionally, the number of parity blocks does not seem to affect the issue. 
RS-12-4-1024k, RS-12-3-1024k, and RS-12-2-1024k all exhibit the problem.
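
To make the observed pattern concrete, the sketch below parses policy names of the form used in this report and flags those that fall in the range reported as affected. The function names and the `REPORTED_MIN_DATA_UNITS` constant are hypothetical; the threshold of 12 only restates the observation above and is not a confirmed root cause.

```python
import re

# Matches names such as "RS-12-4-1024k": codec, data units, parity units,
# and cell size in KiB.
_POLICY_RE = re.compile(
    r"^(?P<codec>[A-Za-z]+(?:-[A-Za-z]+)*)"
    r"-(?P<data>\d+)-(?P<parity>\d+)-(?P<cell>\d+)k$"
)

# Assumption: taken from the report's observation, not from the ISAL code.
REPORTED_MIN_DATA_UNITS = 12


def parse_policy(name):
    """Split a policy name into (codec, data_units, parity_units, cell_bytes)."""
    m = _POLICY_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized EC policy name: {name!r}")
    return (m.group("codec"), int(m.group("data")),
            int(m.group("parity")), int(m.group("cell")) * 1024)


def reported_as_affected(name):
    """True if the policy has at least as many data units as the reported threshold."""
    _, data_units, _, _ = parse_policy(name)
    return data_units >= REPORTED_MIN_DATA_UNITS
```

Under this reading, RS-12-2-1024k through RS-15-1-1024k are flagged regardless of parity count, while a policy such as RS-6-3-1024k is not.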

  was:
When I define and use the RS-14-2-1024k/RS-13-3-1024k/RS-12-4-1024k EC policy, 
I find that files are corrupted when decoded with ISAL. The files can be 
retrieved and have the same size as the original, but their MD5 checksums 
differ.

To verify that the encoding is correct, I perform a cross-validation: I encode 
the file both with and without ISAL, then decode it with and without ISAL. This 
results in four files, and only the two files decoded with ISAL have different 
MD5 checksums compared to the original. The picture shows the result:

!12-4-test.png|width=550,height=333!

 

For RS-15-1-1024k, there are issues with both encoding and decoding. Only the 
file encoded and decoded without ISAL has the same MD5 checksum as the 
original. The picture shows the result:
!15-1-test.png|width=550,height=333!

The test files decoded are as below:
  !files.png|width=550,height=333!

When the number of data blocks is fewer than 12, this issue does not occur. 
Additionally, the number of parity blocks does not seem to affect the issue. 
RS-12-4-1024k, RS-12-3-1024k, and RS-12-2-1024k all exhibit the problem.


> File may be corrupted when using high-data-block EC policy with ISAL.
> ---------------------------------------------------------------------
>
>                 Key: HDFS-17663
>                 URL: https://issues.apache.org/jira/browse/HDFS-17663
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ec, erasure-coding, native
>            Reporter: WangYuanben
>            Priority: Major
>         Attachments: 12-4-test.png, 15-1-test.png, files.png
>
>
> When I define and use the RS-14-2-1024k/RS-13-3-1024k/RS-12-4-1024k EC 
> policy, I find that files are corrupted when decoded with ISAL. The files can 
> be retrieved and have the same size as the original, but their MD5 checksums 
> differ.
> To verify that the encoding is correct, I perform a cross-validation: I 
> encode the file both with and without ISAL, then decode it with and without 
> ISAL. This results in four files, and only the two files decoded without ISAL 
> have the same MD5 checksum as the original. The picture shows the result:
> !12-4-test.png|width=550,height=333!
>  
> For RS-15-1-1024k, there are issues with both encoding and decoding. Only the 
> file encoded and decoded without ISAL has the same MD5 checksum as the 
> original. The picture shows the result:
> !15-1-test.png|width=550,height=333!
> The decoded test files are shown below:
>   !files.png|width=550,height=333!
> When there are fewer than 12 data blocks, this issue does not occur. 
> Additionally, the number of parity blocks does not seem to affect the issue. 
> RS-12-4-1024k, RS-12-3-1024k, and RS-12-2-1024k all exhibit the problem.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

