[
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16316936#comment-16316936
]
Wei-Chiu Chuang commented on HDFS-11187:
----------------------------------------
Thanks [~yzhangal] for the thorough review!
bq. that means we don't load checksum when replicaVisibleLength is CHUNK_SIZE,
in which case it's possible checksum is not loaded (thus null), suggest to add
a comment how that case is handled.
That's a good point. Please see the following two code comments in the
BlockSender constructor:
{code}
// end is either last byte on disk or the length for which we have a
// checksum
long end = chunkChecksum != null ? chunkChecksum.getDataLength()
    : replica.getBytesOnDisk();
{code}
So the null case is handled.
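To make the fallback concrete, here is a minimal stand-alone sketch of that ternary. {{ChunkChecksumStub}} and {{computeEnd}} are hypothetical stand-ins for illustration, not the actual HDFS classes:

{code}
// Hypothetical stand-in for ChunkChecksum; only the field we need here.
class ChunkChecksumStub {
  private final long dataLength;
  ChunkChecksumStub(long dataLength) { this.dataLength = dataLength; }
  long getDataLength() { return dataLength; }
}

class EndDemo {
  // Mirrors the BlockSender logic: prefer the in-memory checksum's data
  // length; fall back to bytes on disk when no in-memory checksum exists.
  static long computeEnd(ChunkChecksumStub chunkChecksum, long bytesOnDisk) {
    return chunkChecksum != null ? chunkChecksum.getDataLength() : bytesOnDisk;
  }

  public static void main(String[] args) {
    // Visible length is a chunk multiple: in-memory checksum may be null,
    // so end comes from the bytes on disk.
    System.out.println(computeEnd(null, 1024));                        // 1024
    // In-memory checksum present: its data length wins.
    System.out.println(computeEnd(new ChunkChecksumStub(1536), 1024)); // 1536
  }
}
{code}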
Also, since the end falls on a full chunk, its checksum does not change. A
concurrent writer may append a new chunk and thus add a new checksum for that
chunk, but this reader never reads beyond the current end of the replica, so
it is not affected.
{code}
if (tmpLen < end) {
  // will use on-disk checksum here since the end is a stable chunk
  end = tmpLen;
}
{code}
The reader also reads the metafile while reading data. If the in-memory
checksum is null, it uses the checksum from the metafile; otherwise it
overwrites the on-disk checksum with the in-memory one.
{code:title=BlockSender#sendPacket}
// write in progress that we need to use to get last checksum
if (lastDataPacket && lastChunkChecksum != null) {
  int start = checksumOff + checksumDataLen - checksumSize;
  byte[] updatedChecksum = lastChunkChecksum.getChecksum();
  if (updatedChecksum != null) {
    System.arraycopy(updatedChecksum, 0, buf, start, checksumSize);
  }
}
{code}
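As a stand-alone sketch of that overwrite step (the buffer layout and offsets below are illustrative, not the real packet format), only the last checksum slot in the packet buffer is replaced:

{code}
import java.util.Arrays;

class ChecksumOverwriteDemo {
  // Mirrors the sendPacket snippet: replace the last checksum in the packet
  // buffer with the up-to-date in-memory copy, if one exists.
  static void overwriteLastChecksum(byte[] buf, int checksumOff,
      int checksumDataLen, int checksumSize, byte[] updatedChecksum) {
    if (updatedChecksum != null) {
      int start = checksumOff + checksumDataLen - checksumSize;
      System.arraycopy(updatedChecksum, 0, buf, start, checksumSize);
    }
  }

  public static void main(String[] args) {
    // Packet buffer holding two 4-byte checksums read from the metafile.
    byte[] buf = new byte[8];
    byte[] inMemory = {7, 7, 7, 7}; // fresher checksum for the last chunk
    overwriteLastChecksum(buf, 0, 8, 4, inMemory);
    // Only the last 4 bytes are replaced; the first checksum is untouched.
    System.out.println(Arrays.toString(buf)); // [0, 0, 0, 0, 7, 7, 7, 7]
  }
}
{code}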
bq. There is a possible race condition that a finalized replica is moved to RBW
then modified by another writer, and the reader is trying to access the same
replica. We can discuss this situation separately.
Let's consider two cases where reads and writes overlap on a finalized
replica:
# Case 1: a reader starts reading before a writer, but the writer starts
writing before the read ends. In this case, the writer has to convert the
finalized replica object to an rbw replica object by instantiating a new
ReplicaBeingWritten and copying from the FinalizedReplica object, so updates
to the replica object do not affect the concurrent reader. The writer also
moves the data and metadata files from the finalized directory to the rbw
directory, but since the reader has already opened the data/meta files, it
can keep reading them through its open file handles.
# Case 2: a writer starts writing before a reader. In this case, the writer
has already moved the data and metadata files to the rbw/ directory, so the
reader fails to open the file and aborts; the client then tries the replicas
on other DNs. This is the same behavior as before.
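A minimal sketch of why case 1 is safe ({{FinalizedStub}} and {{RbwStub}} are illustrative stand-ins, not the actual FinalizedReplica/ReplicaBeingWritten classes): the writer builds a *new* rbw object copy-constructed from the finalized one, so the reader's reference is never mutated.

{code}
// Stand-in for the reader's immutable view of a finalized replica.
class FinalizedStub {
  final long bytesOnDisk;
  FinalizedStub(long bytesOnDisk) { this.bytesOnDisk = bytesOnDisk; }
}

// Stand-in for the writer's rbw object, copy-constructed as the writer does.
class RbwStub {
  long bytesOnDisk;
  RbwStub(FinalizedStub finalized) { this.bytesOnDisk = finalized.bytesOnDisk; }
}

class AppendRaceDemo {
  public static void main(String[] args) {
    FinalizedStub readerView = new FinalizedStub(1024); // reader holds this
    RbwStub writerView = new RbwStub(readerView);       // writer converts
    writerView.bytesOnDisk = 2048;                      // writer appends
    // The reader's finalized object still reports the old, stable length.
    System.out.println(readerView.bytesOnDisk); // 1024
    System.out.println(writerView.bytesOnDisk); // 2048
  }
}
{code}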
> Optimize disk access for last partial chunk checksum of Finalized replica
> -------------------------------------------------------------------------
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
> Attachments: HDFS-11187.001.patch, HDFS-11187.002.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of
> metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the
> last partial chunk checksum from disk, while holding the FsDatasetImpl lock,
> for every reader. It is possible to optimize this by keeping an up-to-date
> copy of the last partial chunk checksum in memory, reducing disk access.
> I am separating the optimization into a new jira, because maintaining the
> state of in-memory checksum requires a lot more work.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)