[ 
https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860058#action_12860058
 ] 

Hairong Kuang commented on HDFS-1057:
-------------------------------------

In the trunk I think there is a more elegant way of solving the problem.

In each ReplicaBeingWritten, we could add two more fields to keep track of the 
last consistent state: the replica length and the last chunk's crc. The last 
consistent state is defined as the replica length right after the most recent 
packet has been flushed to disk. So when a read request comes in, if the last 
byte requested falls into the last chunk that is still being written, the 
datanode returns that chunk only up to the last consistent state, and reads the 
crc from memory instead of from disk. 
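A minimal sketch of the idea (class, field, and method names are illustrative, not the actual HDFS API): the writer atomically publishes the flushed length together with the in-memory crc of the partial last chunk, and a reader serves bytes only up to that length, taking the last chunk's checksum from memory rather than from the checksum file.

```java
// Hypothetical sketch of the proposed ReplicaBeingWritten bookkeeping.
// Names are illustrative; this is not the actual HDFS code.
class ReplicaSketch {
    // Last consistent state: updated only after a packet has been
    // flushed to disk, so readers never see unflushed bytes.
    private long lastConsistentLength;
    private byte[] lastChunkCrc;

    // Writer side: called right after flush() succeeds.
    synchronized void publishFlushedState(long length, byte[] crc) {
        lastConsistentLength = length;
        lastChunkCrc = (crc == null) ? null : crc.clone();
    }

    // Reader side: the length the datanode is allowed to serve.
    synchronized long readableLength() {
        return lastConsistentLength;
    }

    // Reader side: crc of the partial last chunk, taken from memory
    // instead of the (possibly mid-write) on-disk checksum file.
    synchronized byte[] lastChunkCrcFromMemory() {
        return (lastChunkCrc == null) ? null : lastChunkCrc.clone();
    }
}

public class ReplicaSketchDemo {
    public static void main(String[] args) {
        ReplicaSketch r = new ReplicaSketch();
        // Writer flushes a packet covering bytes [0, 1024) with some crc.
        r.publishFlushedState(1024, new byte[] {1, 2, 3, 4});
        System.out.println(r.readableLength());            // 1024
        System.out.println(r.lastChunkCrcFromMemory()[0]); // 1
    }
}
```

Because both fields are published under the same lock, a reader can never pair a new length with a stale crc (or vice versa), which is the inconsistency behind this bug.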

This will also make the replica recovery problem filed in HDFS-1103 easier to 
solve.

> Concurrent readers hit ChecksumExceptions if following a writer to very end 
> of file
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-1057
>                 URL: https://issues.apache.org/jira/browse/HDFS-1057
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: data-node
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>
> BlockReceiver.receivePacket calls replicaInfo.setBytesOnDisk before calling 
> flush(). Therefore, a concurrent reader can race here: the reader sees the 
> new length while those bytes are still in BlockReceiver's buffers, so the 
> client may see checksum errors or EOFs. Additionally, the last checksum chunk 
> of the file is made accessible to readers even though it is not yet stable.
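The ordering bug described above can be sketched as follows (the classes here are illustrative stand-ins, not the actual BlockReceiver code): the on-disk length must be advertised only after flush() has succeeded, otherwise a reader can observe a length whose bytes are still sitting in the stream's buffer.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Illustrative stand-ins for BlockReceiver's replica bookkeeping;
// not the actual HDFS classes.
public class FlushOrderingSketch {
    static class Replica {
        volatile long bytesOnDisk;
        void setBytesOnDisk(long n) { bytesOnDisk = n; }
    }

    public static void main(String[] args) throws IOException {
        // ByteArrayOutputStream stands in for the block file on disk.
        ByteArrayOutputStream disk = new ByteArrayOutputStream();
        BufferedOutputStream out = new BufferedOutputStream(disk, 4096);
        Replica replica = new Replica();

        byte[] packet = new byte[512];
        out.write(packet); // bytes are still buffered, not yet "on disk"

        // Buggy order (as filed): publish the length before flushing,
        // so a reader could see bytesOnDisk > disk.size().
        // replica.setBytesOnDisk(512);
        // out.flush();

        // Fixed order: flush first, then advertise the new length.
        out.flush();
        replica.setBytesOnDisk(512);

        // Invariant a reader relies on: advertised length never
        // exceeds what has actually reached the block file.
        System.out.println(disk.size() >= replica.bytesOnDisk); // true
    }
}
```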

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.