[
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16316936#comment-16316936
]
Wei-Chiu Chuang commented on HDFS-11187:
----------------------------------------
Thanks [~yzhangal] for the thorough review!
bq. that means we don't load checksum when replicaVisibleLength is CHUNK_SIZE,
in which case it's possible checksum is not loaded (thus null), suggest to add
a comment how that case is handled.
That's a good point. Please see the following two code comments in the
BlockSender constructor:
{code}
// end is either last byte on disk or the length for which we have a
// checksum
long end = chunkChecksum != null ? chunkChecksum.getDataLength()
    : replica.getBytesOnDisk();
{code}
So the null case is handled.
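To make the fallback concrete, here is a minimal stand-alone sketch of that ternary. {{ChunkChecksumStub}} and {{computeEnd}} are hypothetical stand-ins for illustration, not the actual HDFS classes:

{code}
// Hypothetical stand-in for ChunkChecksum; only the field we need here.
class ChunkChecksumStub {
  private final long dataLength;
  ChunkChecksumStub(long dataLength) { this.dataLength = dataLength; }
  long getDataLength() { return dataLength; }
}

class EndDemo {
  // Mirrors the BlockSender logic: prefer the in-memory checksum's data
  // length; fall back to bytes on disk when no in-memory checksum exists.
  static long computeEnd(ChunkChecksumStub chunkChecksum, long bytesOnDisk) {
    return chunkChecksum != null ? chunkChecksum.getDataLength() : bytesOnDisk;
  }

  public static void main(String[] args) {
    // Visible length is a chunk multiple: in-memory checksum may be null,
    // so end comes from the bytes on disk.
    System.out.println(computeEnd(null, 1024));                        // 1024
    // In-memory checksum present: its data length wins.
    System.out.println(computeEnd(new ChunkChecksumStub(1536), 1024)); // 1536
  }
}
{code}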
Also, since the end falls on a full chunk, its checksum does not change. A
concurrent writer may append a new chunk and thus add a new checksum for that
chunk, but this reader never reads beyond the current end of the replica, so
it is not affected.
{code}
if (tmpLen < end) {
  // will use on-disk checksum here since the end is a stable chunk
  end = tmpLen;
}
{code}
The reader also reads the metafile while reading data. If the in-memory
checksum is null, it uses the checksum from the metafile; otherwise it
overwrites the on-disk checksum with the in-memory one.
{code:title=BlockSender#sendPacket}
// write in progress that we need to use to get last checksum
if (lastDataPacket && lastChunkChecksum != null) {
  int start = checksumOff + checksumDataLen - checksumSize;
  byte[] updatedChecksum = lastChunkChecksum.getChecksum();
  if (updatedChecksum != null) {
    System.arraycopy(updatedChecksum, 0, buf, start, checksumSize);
  }
}
{code}
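As a stand-alone sketch of that overwrite step (the buffer layout and offsets below are illustrative, not the real packet format), only the last checksum slot in the packet buffer is replaced:

{code}
import java.util.Arrays;

class ChecksumOverwriteDemo {
  // Mirrors the sendPacket snippet: replace the last checksum in the packet
  // buffer with the up-to-date in-memory copy, if one exists.
  static void overwriteLastChecksum(byte[] buf, int checksumOff,
      int checksumDataLen, int checksumSize, byte[] updatedChecksum) {
    if (updatedChecksum != null) {
      int start = checksumOff + checksumDataLen - checksumSize;
      System.arraycopy(updatedChecksum, 0, buf, start, checksumSize);
    }
  }

  public static void main(String[] args) {
    // Packet buffer holding two 4-byte checksums read from the metafile.
    byte[] buf = new byte[8];
    byte[] inMemory = {7, 7, 7, 7}; // fresher checksum for the last chunk
    overwriteLastChecksum(buf, 0, 8, 4, inMemory);
    // Only the last 4 bytes are replaced; the first checksum is untouched.
    System.out.println(Arrays.toString(buf)); // [0, 0, 0, 0, 7, 7, 7, 7]
  }
}
{code}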
bq. There is a possible race condition that a finalized replica is moved to RBW
then modified by another writer, and the reader is trying to access the same
replica. We can discuss this situation separately.
Let's consider two cases where reads and writes overlap on a finalized
replica:
# Case 1: a reader starts reading before a writer, but the writer starts
writing before the read ends. In this case, the writer has to convert the
finalized replica object to an rbw replica object by instantiating a new
ReplicaBeingWritten and copying from the FinalizedReplica object, so updates
to the replica object do not affect the concurrent reader. The writer also
moves the data and metadata files from the finalized directory to the rbw
directory, but since the reader has already opened the data/meta files, it
can keep reading them through its open file handles.
# Case 2: a writer starts writing before a reader. In this case, the writer
has already moved the data and metadata files to the rbw/ directory, so the
reader fails to open the file and aborts; the client then tries the replicas
on other DNs. This is the same behavior as before.
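A minimal sketch of why case 1 is safe ({{FinalizedStub}} and {{RbwStub}} are illustrative stand-ins, not the actual FinalizedReplica/ReplicaBeingWritten classes): the writer builds a *new* rbw object copy-constructed from the finalized one, so the reader's reference is never mutated.

{code}
// Stand-in for the reader's immutable view of a finalized replica.
class FinalizedStub {
  final long bytesOnDisk;
  FinalizedStub(long bytesOnDisk) { this.bytesOnDisk = bytesOnDisk; }
}

// Stand-in for the writer's rbw object, copy-constructed as the writer does.
class RbwStub {
  long bytesOnDisk;
  RbwStub(FinalizedStub finalized) { this.bytesOnDisk = finalized.bytesOnDisk; }
}

class AppendRaceDemo {
  public static void main(String[] args) {
    FinalizedStub readerView = new FinalizedStub(1024); // reader holds this
    RbwStub writerView = new RbwStub(readerView);       // writer converts
    writerView.bytesOnDisk = 2048;                      // writer appends
    // The reader's finalized object still reports the old, stable length.
    System.out.println(readerView.bytesOnDisk); // 1024
    System.out.println(writerView.bytesOnDisk); // 2048
  }
}
{code}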
> Optimize disk access for last partial chunk checksum of Finalized replica
> -------------------------------------------------------------------------
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
> Attachments: HDFS-11187.001.patch, HDFS-11187.002.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of
> metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the
> last partial chunk checksum from disk, while holding the FsDatasetImpl lock,
> for every reader. It is possible to optimize this by keeping an up-to-date
> copy of the last partial chunk checksum in memory, reducing disk access.
> I am separating the optimization into a new jira, because maintaining the
> state of in-memory checksum requires a lot more work.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)