[
https://issues.apache.org/jira/browse/HDFS-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439549#comment-13439549
]
Kihwal Lee commented on HDFS-3177:
----------------------------------
bq. For append, it makes a lot of sense to keep using the existing checksum
type. What is the use case for using a different checksum type?
I don't think it makes sense either, but that was the design decision made in
HDFS-2130. There might have been use cases for it, so I tried to support it
while making the default disallow it. If you feel that this should be the
behavior with no configurable option, I will be happy to update the patch
accordingly.
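Just to illustrate the guard I have in mind (a minimal sketch, not the actual
patch): it assumes the dfs.client.append.allow-different-checksum key above
and a hypothetical checksumTypeOf() helper that derives the type from the
getFileChecksum() algorithm name.
{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendChecksumGuard {
  /** Fail an append that would switch checksum types, unless allowed. */
  public static void checkAppendChecksum(FileSystem fs, Path file,
      String configuredType, Configuration conf) throws IOException {
    boolean allowDifferent = conf.getBoolean(
        "dfs.client.append.allow-different-checksum", false);
    FileChecksum existing = fs.getFileChecksum(file);
    if (existing == null) {
      return; // no checksum information available; nothing to enforce
    }
    String existingType = checksumTypeOf(existing);
    if (!allowDifferent && !existingType.equals(configuredType)) {
      throw new IOException("Append would change the checksum type of "
          + file + " from " + existingType + " to " + configuredType);
    }
  }

  // Hypothetical helper: the algorithm name looks like
  // "MD5-of-xMD5-of-yCRC32C", so the CRC type can be read off the end.
  private static String checksumTypeOf(FileChecksum checksum) {
    return checksum.getAlgorithmName().endsWith("CRC32C")
        ? "CRC32C" : "CRC32";
  }
}
{code}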
What do you think we should do for concat()? It is supposed to be a quick,
namenode-only operation, so I don't feel comfortable inserting code to check
the checksum types of the input files.
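For comparison, here is roughly what a client-side pre-check would look like;
each getFileChecksum() call costs extra RPCs to datanodes, which is exactly
the overhead I would rather not add to concat(). Note this rough sketch
compares full algorithm names, which also encode bytes-per-CRC, so a real
check would compare only the CRC type.
{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ConcatPreCheck {
  /** Refuse to concat files whose checksum algorithms disagree. */
  public static void concatIfSameChecksum(FileSystem fs, Path target,
      Path[] srcs) throws IOException {
    FileChecksum expected = fs.getFileChecksum(target);
    for (Path src : srcs) {
      FileChecksum actual = fs.getFileChecksum(src);
      if (expected != null && actual != null
          && !expected.getAlgorithmName().equals(actual.getAlgorithmName())) {
        throw new IOException("Checksum mismatch between " + target
            + " (" + expected.getAlgorithmName() + ") and " + src
            + " (" + actual.getAlgorithmName() + ")");
      }
    }
    fs.concat(target, srcs); // namenode-only from here on
  }
}
{code}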
bq. Suppose the last block is half written with CRC32 in a closed file. Then,
the file is re-opened for append with CRC32C. Would the block have two checksum
types, i.e. the first half CRC32 and the second half CRC32C?
No. The datanode will continue to use the checksum parameters of the existing
partial block for writing, independent of what the client sends with the data.
Input data integrity checking is still done, of course.
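In other words, the selection on the datanode side amounts to something like
the sketch below; diskChecksum stands in for the parameters the datanode reads
from the partial block's metadata file, and the real code path differs in
detail.
{code:java}
import org.apache.hadoop.util.DataChecksum;

public class AppendChecksumSelection {
  /**
   * For an append to a partial block, the checksum parameters already
   * recorded on disk win over whatever the client negotiated. Incoming
   * packets are still verified against the client's checksum before
   * being re-checksummed, so data integrity checking is preserved.
   */
  public static DataChecksum selectForAppend(DataChecksum clientChecksum,
      DataChecksum diskChecksum) {
    if (diskChecksum != null
        && (diskChecksum.getChecksumType() != clientChecksum.getChecksumType()
            || diskChecksum.getBytesPerChecksum()
                != clientChecksum.getBytesPerChecksum())) {
      return diskChecksum; // keep the existing partial block's parameters
    }
    return clientChecksum;
  }
}
{code}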
bq. Suppose a closed file is already using more than one checksum type. Then,
the file is re-opened for append with
dfs.client.append.allow-different-checksum == false. Which checksum should it
use? Or should it fail?
I don't think we can do much for existing files. Users can detect them with
getFileChecksum(), which will report DataChecksum.Type.MIXED as the checksum
type. For these files, checksums will still be used for block-level integrity
checks, and nothing will break until something like distcp tries to compare
FileChecksums after copying.
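For example, a tool like distcp could screen sources up front. A sketch,
assuming getFileChecksum() returns an MD5MD5CRC32FileChecksum whose
getCrcType() reports MIXED for such files:
{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.MD5MD5CRC32FileChecksum;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.DataChecksum;

public class MixedChecksumDetector {
  /** True if the file's blocks use more than one checksum type. */
  public static boolean hasMixedChecksums(FileSystem fs, Path file)
      throws IOException {
    FileChecksum checksum = fs.getFileChecksum(file);
    if (checksum instanceof MD5MD5CRC32FileChecksum) {
      return ((MD5MD5CRC32FileChecksum) checksum).getCrcType()
          == DataChecksum.Type.MIXED;
    }
    return false; // not an HDFS-style checksum; cannot tell
  }
}
{code}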
> Allow DFSClient to find out and use the CRC type being used for a file.
> -----------------------------------------------------------------------
>
> Key: HDFS-3177
> URL: https://issues.apache.org/jira/browse/HDFS-3177
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: data-node, hdfs client
> Affects Versions: 0.23.0
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Fix For: 2.1.0-alpha, 3.0.0
>
> Attachments: hdfs-3177-after-hadoop-8239-8240.patch.txt,
> hdfs-3177-after-hadoop-8239.patch.txt, hdfs-3177-branch2-trunk.patch.txt,
> hdfs-3177.patch, hdfs-3177-with-hadoop-8239-8240.patch.txt,
> hdfs-3177-with-hadoop-8239-8240.patch.txt,
> hdfs-3177-with-hadoop-8239-8240.patch.txt,
> hdfs-3177-with-hadoop-8239.patch.txt
>
>
> To support HADOOP-8060, DFSClient should be able to find out the checksum
> type being used for files in HDFS.
> In my prototype, DataTransferProtocol was extended to include the checksum
> type in the blockChecksum() response. DFSClient uses it in getFileChecksum()
> to determine the checksum type (see the sketch below). Also, append() can be
> configured to use the existing checksum type instead of the configured one.
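> The per-file type then falls out of a simple aggregation over the per-block
> responses, along these lines; perBlockTypes stands in for the checksum types
> pulled out of the individual blockChecksum() replies:
> {code:java}
> import java.util.List;
>
> import org.apache.hadoop.util.DataChecksum;
>
> public class FileChecksumTypeAggregator {
>   /** Collapse per-block checksum types into one file-level type. */
>   public static DataChecksum.Type aggregate(
>       List<DataChecksum.Type> perBlockTypes) {
>     DataChecksum.Type fileType = null;
>     for (DataChecksum.Type blockType : perBlockTypes) {
>       if (fileType == null) {
>         fileType = blockType;           // first block sets the type
>       } else if (fileType != blockType) {
>         return DataChecksum.Type.MIXED; // types differ across blocks
>       }
>     }
>     return fileType; // null for a file with no blocks
>   }
> }
> {code}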