[
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049962#comment-15049962
]
GAO Rui commented on HDFS-7661:
-------------------------------
[~szetszwo], [~jingzhao], thank you very much for the enlightening discussion
in the video meeting. I have walked through the EC file read path source code.
In DFSInputStream#getFileLength():
{code}
public long getFileLength() {
  synchronized(infoLock) {
    return locatedBlocks == null ? 0 :
        locatedBlocks.getFileLength() + lastBlockBeingWrittenLength;
  }
}
{code}
I have three questions.
The first one: for an EC file that is being written, we should make
{{locatedBlocks.getFileLength()}} cover up to the last completed block group,
right?
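Just to make the first question concrete, here is a tiny illustration of the length I expect to be covered (the numbers assume RS-6-3 with the default 128MB block size; the variables are only for illustration, not existing fields):
{code}
// Illustration only: with RS-6-3 and 128MB internal blocks, one completed
// block group holds 6 * 128MB = 768MB of user data. If two block groups
// are complete, locatedBlocks.getFileLength() should cover at least
// 2 * 768MB, regardless of the block group that is still being written.
long dataBlocksPerGroup = 6;
long blockSize = 128L * 1024 * 1024;
long completedBlockGroups = 2;
long coveredLength = completedBlockGroups * dataBlocksPerGroup * blockSize;
{code}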
The second question is about {{lastBlockBeingWrittenLength}}.
I think for EC files, {{lastBlockBeingWrittenLength}} should only be advanced to
the last completely written stripe. By a completely written stripe (in RS-6-3), I
mean a stripe whose internal cells (6 data cells and 3 parity cells) have all
been written. According to the current write path code, StripedDataStreamer
waits for acks once a stripe has all its data cells full and its parity cells
calculated. So it is OK to keep advancing {{lastBlockBeingWrittenLength}} to
the last completely written stripe. Does that make sense to you?
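To show the stripe boundary I have in mind, here is a minimal sketch (RS-6-3 is assumed; the 64KB cell size and the helper method are only illustrative, not existing code):
{code}
// Sketch: round the user bytes written into the last block group down to
// the last completely written stripe, i.e. a stripe whose 6 data cells are
// full and whose 3 parity cells have been calculated and acked.
static long lastCompleteStripeLength(long bytesWritten) {
  final int dataBlocks = 6;                 // RS-6-3
  final int cellSize = 64 * 1024;           // illustrative cell size
  final long stripeSize = (long) dataBlocks * cellSize;
  return (bytesWritten / stripeSize) * stripeSize;
}
{code}
So {{lastBlockBeingWrittenLength}} would only advance to {{lastCompleteStripeLength(bytesWritten)}}.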
The last question is about updating {{lastBlockBeingWrittenLength}} when
hflush/hsync is invoked. I will upload a document and try to cover all
possible scenarios in it.
I have tried to trace {{lastBlockBeingWrittenLength}}, and found that we get
its value from the datanode side via ReplicaBeingWritten#getVisibleLength():
{code}
@Override
public long getVisibleLength() {
  return getBytesAcked(); // all acked bytes are visible
}
{code}
For EC files, it is not appropriate to simply take bytesAcked as the visible
length in scenarios where hflush/hsync is involved. I will also cover how to
override this method in the document.
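As a rough idea, one possible shape of such an override is sketched below (this is only a sketch for illustration; the cell-boundary rounding and the 64KB cell size are my assumptions, not necessarily what the document will propose):
{code}
// Sketch only: for a striped internal block, expose only whole cells of
// this replica instead of every acked byte, so a reader never sees a
// partially written cell.
@Override
public long getVisibleLength() {
  final long cellSize = 64 * 1024;          // illustrative EC cell size
  return (getBytesAcked() / cellSize) * cellSize;
}
{code}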
> Support read when a EC file is being written
> --------------------------------------------
>
> Key: HDFS-7661
> URL: https://issues.apache.org/jira/browse/HDFS-7661
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Tsz Wo Nicholas Sze
> Assignee: GAO Rui
> Attachments: EC-file-flush-and-sync-steps-plan-2015-12-01.png,
> HDFS-7661-unitTest-wip-trunk.patch
>
>
> We also need to support hflush/hsync and visible length.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)