[
https://issues.apache.org/jira/browse/HDFS-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439549#comment-13439549
]
Kihwal Lee commented on HDFS-3177:
----------------------------------
bq. For append, it makes a lot of sense to keep using the existing checksum
type. What is the use case for using a different checksum type?
I don't think it makes sense either, but that was the design decision made in
HDFS-2130. There might have been use cases for it, so I tried to support it
while making the default disallow it. If you feel that this should be the
behavior with no configurable option, I will be happy to update the patch
accordingly.
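Just to illustrate the guard I have in mind (a minimal sketch, not the actual
patch): it assumes the dfs.client.append.allow-different-checksum key above
and a hypothetical checksumTypeOf() helper that derives the type from the
getFileChecksum() algorithm name.
{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendChecksumGuard {
  /** Fail an append that would switch checksum types, unless allowed. */
  public static void checkAppendChecksum(FileSystem fs, Path file,
      String configuredType, Configuration conf) throws IOException {
    boolean allowDifferent = conf.getBoolean(
        "dfs.client.append.allow-different-checksum", false);
    FileChecksum existing = fs.getFileChecksum(file);
    if (existing == null) {
      return; // no checksum information available; nothing to enforce
    }
    String existingType = checksumTypeOf(existing);
    if (!allowDifferent && !existingType.equals(configuredType)) {
      throw new IOException("Append would change the checksum type of "
          + file + " from " + existingType + " to " + configuredType);
    }
  }

  // Hypothetical helper: the algorithm name looks like
  // "MD5-of-xMD5-of-yCRC32C", so the CRC type can be read off the end.
  private static String checksumTypeOf(FileChecksum checksum) {
    return checksum.getAlgorithmName().endsWith("CRC32C")
        ? "CRC32C" : "CRC32";
  }
}
{code}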
What do you think we should do for concat()? It is supposed to be a quick,
namenode-only operation, so I don't feel comfortable inserting code to check
the checksum types of the input files.
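For comparison, here is roughly what a client-side pre-check would look like;
each getFileChecksum() call costs extra RPCs to datanodes, which is exactly
the overhead I would rather not add to concat(). Note this rough sketch
compares full algorithm names, which also encode bytes-per-CRC, so a real
check would compare only the CRC type.
{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ConcatPreCheck {
  /** Refuse to concat files whose checksum algorithms disagree. */
  public static void concatIfSameChecksum(FileSystem fs, Path target,
      Path[] srcs) throws IOException {
    FileChecksum expected = fs.getFileChecksum(target);
    for (Path src : srcs) {
      FileChecksum actual = fs.getFileChecksum(src);
      if (expected != null && actual != null
          && !expected.getAlgorithmName().equals(actual.getAlgorithmName())) {
        throw new IOException("Checksum mismatch between " + target
            + " (" + expected.getAlgorithmName() + ") and " + src
            + " (" + actual.getAlgorithmName() + ")");
      }
    }
    fs.concat(target, srcs); // namenode-only from here on
  }
}
{code}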
bq. Suppose the last block is half written with CRC32 in a closed file. Then,
the file is re-opened for append with CRC32C. Would the block have two checksum
types, i.e. the first half CRC32 and the second half CRC32C?
No. The datanode will continue to use the checksum parameters of the existing
partial block for writing, independent of what the client sends with the data.
Input data integrity checking is still done, of course.
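In other words, the selection on the datanode side amounts to something like
the sketch below; diskChecksum stands in for the parameters the datanode reads
from the partial block's metadata file, and the real code path differs in
detail.
{code:java}
import org.apache.hadoop.util.DataChecksum;

public class AppendChecksumSelection {
  /**
   * For an append to a partial block, the checksum parameters already
   * recorded on disk win over whatever the client negotiated. Incoming
   * packets are still verified against the client's checksum before
   * being re-checksummed, so data integrity checking is preserved.
   */
  public static DataChecksum selectForAppend(DataChecksum clientChecksum,
      DataChecksum diskChecksum) {
    if (diskChecksum != null
        && (diskChecksum.getChecksumType() != clientChecksum.getChecksumType()
            || diskChecksum.getBytesPerChecksum()
                != clientChecksum.getBytesPerChecksum())) {
      return diskChecksum; // keep the existing partial block's parameters
    }
    return clientChecksum;
  }
}
{code}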
bq. Suppose a closed file is already using more than one checksum type. Then,
the file is re-opened for append with
dfs.client.append.allow-different-checksum == false. Which checksum should it
use? Or should it fail?
I don't think we can do much for existing files. Users can detect them with
getFileChecksum(), which will report DataChecksum.Type.MIXED as the checksum
type. For these files, checksums will still be used for block-level integrity
checks, and nothing will break until something like distcp tries to compare
FileChecksums after copying.
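For example, a tool like distcp could screen sources up front. A sketch,
assuming getFileChecksum() returns an MD5MD5CRC32FileChecksum whose
getCrcType() reports MIXED for such files:
{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.MD5MD5CRC32FileChecksum;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.DataChecksum;

public class MixedChecksumDetector {
  /** True if the file's blocks use more than one checksum type. */
  public static boolean hasMixedChecksums(FileSystem fs, Path file)
      throws IOException {
    FileChecksum checksum = fs.getFileChecksum(file);
    if (checksum instanceof MD5MD5CRC32FileChecksum) {
      return ((MD5MD5CRC32FileChecksum) checksum).getCrcType()
          == DataChecksum.Type.MIXED;
    }
    return false; // not an HDFS-style checksum; cannot tell
  }
}
{code}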
> Allow DFSClient to find out and use the CRC type being used for a file.
> -----------------------------------------------------------------------
>
> Key: HDFS-3177
> URL: https://issues.apache.org/jira/browse/HDFS-3177
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: data-node, hdfs client
> Affects Versions: 0.23.0
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Fix For: 2.1.0-alpha, 3.0.0
>
> Attachments: hdfs-3177-after-hadoop-8239-8240.patch.txt,
> hdfs-3177-after-hadoop-8239.patch.txt, hdfs-3177-branch2-trunk.patch.txt,
> hdfs-3177.patch, hdfs-3177-with-hadoop-8239-8240.patch.txt,
> hdfs-3177-with-hadoop-8239-8240.patch.txt,
> hdfs-3177-with-hadoop-8239-8240.patch.txt,
> hdfs-3177-with-hadoop-8239.patch.txt
>
>
> To support HADOOP-8060, DFSClient should be able to find out the checksum
> type being used for files in HDFS.
> In my prototype, DataTransferProtocol was extended to include the checksum
> type in the blockChecksum() response. DFSClient uses it in getFileChecksum()
> to determine the checksum type (see the sketch below). Also, append() can be
> configured to use the existing checksum type instead of the configured one.
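> The per-file type then falls out of a simple aggregation over the per-block
> responses, along these lines; perBlockTypes stands in for the checksum types
> pulled out of the individual blockChecksum() replies:
> {code:java}
> import java.util.List;
>
> import org.apache.hadoop.util.DataChecksum;
>
> public class FileChecksumTypeAggregator {
>   /** Collapse per-block checksum types into one file-level type. */
>   public static DataChecksum.Type aggregate(
>       List<DataChecksum.Type> perBlockTypes) {
>     DataChecksum.Type fileType = null;
>     for (DataChecksum.Type blockType : perBlockTypes) {
>       if (fileType == null) {
>         fileType = blockType;           // first block sets the type
>       } else if (fileType != blockType) {
>         return DataChecksum.Type.MIXED; // types differ across blocks
>       }
>     }
>     return fileType; // null for a file with no blocks
>   }
> }
> {code}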