[ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156653#comment-15156653 ]

GAO Rui commented on HDFS-7661:
-------------------------------

[~drankye], I have been doing some investigation into how HBase uses 
hflush/hsync. According to these [slides|http://www.slideshare.net/enissoz/hbase-and-hdfs-understanding-filesystem-usage]
 shared by [~enis] (pages 12 and 15): HBase uses {{hflush()}} when 
writing the WAL (Write Ahead Log), and WAL syncs ({{hflush()}}) happen hundreds 
of times per second. So I think creating a new bg for each flush call is not a 
practical option.
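Just to illustrate the write pattern (the path and loop are made up for illustration; {{hflush()}} itself is the real {{Syncable}} call on {{FSDataOutputStream}}):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// WAL-style writer: one open stream, hflush() after every edit. At
// hundreds of hflush() calls per second, allocating a new block group
// per flush would be far too expensive.
public class WalFlushSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    try (FSDataOutputStream out = fs.create(new Path("/hbase/wal/demo"))) {
      for (int i = 0; i < 1000; i++) {
        out.write(("edit-" + i + "\n").getBytes("UTF-8"));
        out.hflush(); // flush edits to the DNs so readers can see them
      }
    }
  }
}
{code}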

Maybe we could continue to discuss the option from before:
{quote}
I was planning to truncate the overwritten data at the end of both the data 
file and the .meta file on the parity datanodes, and then store the overwritten 
data at the end of the .meta file. One possible way to keep the data before the 
first flush safe even if the second flush fails might be to add an 
{{upgrade/rollback}} mechanism, like the one in {{DataStorage}}, to the 
data/checksum files of the parity datanodes.
{quote}
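To make the quoted idea concrete, here is a rough, hypothetical sketch of the stash step on a parity datanode. The file layout, the trailer format, and all names are assumptions, not existing HDFS code, and for simplicity it stashes only the data-file tail:
{code:java}
import java.io.RandomAccessFile;

// Hypothetical "stash and truncate" helper: before a flush overwrites
// the tail of the parity block, copy the bytes that will be overwritten
// to the end of the .meta file, followed by a small trailer, so a failed
// flush can later be rolled back.
class ParityTailStash {
  static void stashAndTruncate(String dataPath, String metaPath,
                               long overwriteOffset) throws Exception {
    try (RandomAccessFile data = new RandomAccessFile(dataPath, "rw");
         RandomAccessFile meta = new RandomAccessFile(metaPath, "rw")) {
      long tailLen = data.length() - overwriteOffset;
      byte[] tail = new byte[(int) tailLen];
      data.seek(overwriteOffset);
      data.readFully(tail);
      meta.seek(meta.length());        // stash after the existing checksums
      meta.write(tail);
      meta.writeLong(tailLen);         // trailer: stashed length ...
      meta.writeLong(overwriteOffset); // ... and where it came from
      data.setLength(overwriteOffset); // drop the region to be overwritten
    }
  }
}
{code}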

Though, if DN failures cause the writing process to fail, we cannot guarantee 
the safety of the data before the first flush. Even with a replicated file, if 
we flush at some point and then continue writing to the 3 DNs, then flush again 
within the same block and the write fails because of DN failures, we likewise 
cannot guarantee the safety of the data before the first flush, I think. 
[~walter.k.su], does this make sense to you? 

Based on the {{upgrade/rollback}} mechanism for the data/checksum files of the 
parity datanodes, we could recover the data before the first flush only in 
scenarios like the one below:
  1. the first flush succeeds 
  2. parity dn0 dies 
  3. data dn4, dn5 and parity dn1 fail during the second flush, but parity dn2 
succeeds
At this point, if parity dn0 comes back, we could roll dn2 back to its state 
before the second flush. 
This might be the only kind of scenario that uses the {{upgrade/rollback}} 
mechanism for the data/checksum files of the parity datanodes.
Guys, do we need to implement the {{upgrade/rollback}} mechanism for this kind 
of scenario?
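
The rollback step on dn2 would be the counterpart of the stash sketch above; something like this, assuming the same hypothetical trailer layout (again, none of this is existing HDFS code):
{code:java}
import java.io.RandomAccessFile;

// Hypothetical rollback: restore the tail saved before the failed second
// flush. Assumes the trailer layout from the stash sketch above:
// [tail bytes][tailLen:long][overwriteOffset:long] at the end of .meta.
class ParityTailRollback {
  static void rollback(String dataPath, String metaPath) throws Exception {
    try (RandomAccessFile data = new RandomAccessFile(dataPath, "rw");
         RandomAccessFile meta = new RandomAccessFile(metaPath, "rw")) {
      meta.seek(meta.length() - 2L * Long.BYTES);
      long tailLen = meta.readLong();
      long overwriteOffset = meta.readLong();
      long stashOffset = meta.length() - 2L * Long.BYTES - tailLen;
      byte[] tail = new byte[(int) tailLen];
      meta.seek(stashOffset);
      meta.readFully(tail);
      data.setLength(overwriteOffset); // drop bytes from the failed flush
      data.seek(overwriteOffset);
      data.write(tail);                // re-append the pre-flush tail
      meta.setLength(stashOffset);     // remove the stash entry
    }
  }
}
{code}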

[~liuml07], [~jingzhao], regarding the data consistency issue: if we do not 
implement a lock in the NN, maybe we could make the read client check the bg 
data length in the .meta files of the 3 parity DNs to see whether they are at 
the same version, as [~zhz] suggested. But if the read client finds that the bg 
data lengths differ, it could try to read the .meta file again against the 
parity DNs with the smaller bg data length. Since the bg data length could 
change several times, though, the read client might never get a consistent bg 
data length. Am I missing something?
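
A rough sketch of that reader-side check (the per-DN length reads are stubbed out as suppliers; none of these names exist in HDFS):
{code:java}
import java.io.IOException;
import java.util.function.LongSupplier;

// Hypothetical reader-side version check: read the bg data length from
// each of the 3 parity DNs' .meta files, accept it only when all three
// agree, and retry otherwise. Each LongSupplier stands in for a real
// .meta read from one parity DN.
class BgLengthResolver {
  static long resolve(LongSupplier[] parityMetaLen, int maxRetries)
      throws IOException {
    for (int attempt = 0; attempt < maxRetries; attempt++) {
      long l0 = parityMetaLen[0].getAsLong();
      long l1 = parityMetaLen[1].getAsLong();
      long l2 = parityMetaLen[2].getAsLong();
      if (l0 == l1 && l1 == l2) {
        return l0; // all three parity DNs report the same version
      }
      // Lengths differ: a flush landed in between. Only the minimum is
      // safely visible, but a writer flushing hundreds of times per
      // second can keep moving it, so agreement may never be observed.
    }
    throw new IOException("bg data length did not converge after retries");
  }
}
{code}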



> Erasure coding: support hflush and hsync
> ----------------------------------------
>
>                 Key: HDFS-7661
>                 URL: https://issues.apache.org/jira/browse/HDFS-7661
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: GAO Rui
>         Attachments: EC-file-flush-and-sync-steps-plan-2015-12-01.png, 
> HDFS-7661-unitTest-wip-trunk.patch, HDFS-7661-wip.01.patch, 
> HDFS-EC-file-flush-sync-design-version1.1.pdf, 
> HDFS-EC-file-flush-sync-design-version2.0.pdf
>
>
> We also need to support hflush/hsync and visible length. 


