[
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049962#comment-15049962
]
GAO Rui commented on HDFS-7661:
-------------------------------
[~szetszwo], [~jingzhao], thank you very much for the enlightening discussion
in the video meeting. I have walked through the EC file read path source code.
In DFSInputStream#getFileLength():
{code}
public long getFileLength() {
  synchronized(infoLock) {
    return locatedBlocks == null ? 0 :
        locatedBlocks.getFileLength() + lastBlockBeingWrittenLength;
  }
}
{code}
I have three questions.
The first one: for an EC file that is being written, we should make
{{locatedBlocks.getFileLength()}} cover up to the last completed block group,
right?
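Just to make the first question concrete, here is a tiny illustration of the length I expect to be covered (the numbers assume RS-6-3 with the default 128MB block size; the variables are only for illustration, not existing fields):
{code}
// Illustration only: with RS-6-3 and 128MB internal blocks, one completed
// block group holds 6 * 128MB = 768MB of user data. If two block groups
// are complete, locatedBlocks.getFileLength() should cover at least
// 2 * 768MB, regardless of the block group that is still being written.
long dataBlocksPerGroup = 6;
long blockSize = 128L * 1024 * 1024;
long completedBlockGroups = 2;
long coveredLength = completedBlockGroups * dataBlocksPerGroup * blockSize;
{code}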
The second question is about {{lastBlockBeingWrittenLength}}.
I think for EC files, {{lastBlockBeingWrittenLength}} should only be advanced to
the last completely written stripe. By a completely written stripe (in RS-6-3), I
mean a stripe whose internal cells (6 data cells and 3 parity cells) have all
been written. According to the current write path code, StripedDataStreamer
waits for acks once a stripe has all its data cells full and its parity cells
calculated. So it is OK to keep advancing {{lastBlockBeingWrittenLength}} to
the last completely written stripe. Does that make sense to you?
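To show the stripe boundary I have in mind, here is a minimal sketch (RS-6-3 is assumed; the 64KB cell size and the helper method are only illustrative, not existing code):
{code}
// Sketch: round the user bytes written into the last block group down to
// the last completely written stripe, i.e. a stripe whose 6 data cells are
// full and whose 3 parity cells have been calculated and acked.
static long lastCompleteStripeLength(long bytesWritten) {
  final int dataBlocks = 6;                 // RS-6-3
  final int cellSize = 64 * 1024;           // illustrative cell size
  final long stripeSize = (long) dataBlocks * cellSize;
  return (bytesWritten / stripeSize) * stripeSize;
}
{code}
So {{lastBlockBeingWrittenLength}} would only advance to {{lastCompleteStripeLength(bytesWritten)}}.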
The last question is about updating {{lastBlockBeingWrittenLength}} when
hflush/hsync is invoked. I will upload a document and try to cover all
possible scenarios in it.
I have tried to trace {{lastBlockBeingWrittenLength}}, and found that we get
its value from the datanode side via ReplicaBeingWritten#getVisibleLength():
{code}
@Override
public long getVisibleLength() {
  return getBytesAcked(); // all acked bytes are visible
}
{code}
For EC files, it is not appropriate to simply take bytesAcked as the visible
length in scenarios where hflush/hsync is involved. I will also cover how to
override this method in the document.
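As a rough idea, one possible shape of such an override is sketched below (this is only a sketch for illustration; the cell-boundary rounding and the 64KB cell size are my assumptions, not necessarily what the document will propose):
{code}
// Sketch only: for a striped internal block, expose only whole cells of
// this replica instead of every acked byte, so a reader never sees a
// partially written cell.
@Override
public long getVisibleLength() {
  final long cellSize = 64 * 1024;          // illustrative EC cell size
  return (getBytesAcked() / cellSize) * cellSize;
}
{code}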
> Support read when a EC file is being written
> --------------------------------------------
>
> Key: HDFS-7661
> URL: https://issues.apache.org/jira/browse/HDFS-7661
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Tsz Wo Nicholas Sze
> Assignee: GAO Rui
> Attachments: EC-file-flush-and-sync-steps-plan-2015-12-01.png,
> HDFS-7661-unitTest-wip-trunk.patch
>
>
> We also need to support hflush/hsync and visible length.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)