[
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156653#comment-15156653
]
GAO Rui commented on HDFS-7661:
-------------------------------
[~drankye], I have been doing some investigation into how HBase uses
hflush/hsync. According to these [slides|
http://www.slideshare.net/enissoz/hbase-and-hdfs-understanding-filesystem-usage]
shared by [~enis] (pages 12 and 15): for HBase, {{hflush()}} is used when
writing the WAL (Write Ahead Log), and the WAL is synced ( {{hflush()}} )
hundreds of times per second. So I think creating a new block group (bg) for
each flush call is not a practical option.
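Just to make that cost concrete, here is a minimal sketch of the WAL-style
write pattern; {{FSDataOutputStream#hflush()}} is the real HDFS API, but the
path and record loop are only illustrative:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WalFlushPattern {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Illustrative path; HBase manages its own WAL file naming.
    try (FSDataOutputStream out = fs.create(new Path("/tmp/wal-demo"))) {
      for (int i = 0; i < 1000; i++) {
        out.write(("edit-" + i + "\n").getBytes("UTF-8"));
        // One hflush per WAL append. HBase does this hundreds of times per
        // second, so any per-flush cost (such as allocating a new block
        // group) multiplies very quickly.
        out.hflush();
      }
    }
  }
}
{code}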
Maybe we could continue to discuss the previously proposed option:
{quote}
I was planning to truncate the overwritten data at the end of both the data
file and the .meta file on the parity datanodes, and then store the overwritten
data at the end of the .meta file. One possible way to keep the data before the
first flush safe even if the second flush fails might be to add an
{{upgrade/rollback}} mechanism, similar to that of {{DataStorage}}, to the
data/checksum files of the parity datanodes.
{quote}
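To make the truncate-and-preserve step a bit more concrete, here is a rough
sketch of what it could look like on a parity DN. {{truncateAndPreserve}} is a
hypothetical helper, not existing code, and appending the returned tail to the
.meta file is left to the caller:
{code:java}
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/**
 * Hypothetical sketch: before a new flush overwrites the tail of a parity
 * file, cut the tail off and hand it back so it can be preserved (e.g.
 * appended to the end of the .meta file) for a possible rollback.
 */
public class ParityTailBackup {
  static byte[] truncateAndPreserve(File file, long newLength)
      throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
      long oldLength = raf.length();
      byte[] tail = new byte[(int) (oldLength - newLength)];
      raf.seek(newLength);
      raf.readFully(tail);       // copy the soon-to-be-overwritten bytes
      raf.setLength(newLength);  // drop them from the file itself
      return tail;               // caller stores this for rollback
    }
  }
}
{code}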
However, if DN failures cause the writing process to fail, we cannot guarantee
the safety of the data written before the first flush. Even with a replicated
file, if we flush at some point and then continue writing the file to 3 DNs,
flush again within the same block, and the write then fails because of DN
failures, we likewise cannot guarantee the safety of the data written before
the first flush, I think. [~walter.k.su], does this make sense to you?
Based on an {{upgrade/rollback}} mechanism for the data/checksum files of the
parity datanodes, we could recover the data written before the first flush only
in scenarios like the one below:
1. the first flush succeeds
2. parity dn0 dies
3. data dn4, dn5 and parity dn1 fail during the second flush, but parity dn2
succeeds
At this point, if parity dn0 comes back, we could roll back dn2 to its state
before the second flush.
This might be the only kind of scenario where an {{upgrade/rollback}}
mechanism for the data/checksum files of the parity datanodes would help.
Guys, do we need to implement an {{upgrade/rollback}} mechanism for this kind
of scenario? A rough sketch of what it could look like follows.
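Modeled on the previous/current convention that {{DataStorage}} uses during an
upgrade, it could be something along these lines; all of the file and method
names here are assumptions for illustration only:
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/**
 * Hypothetical sketch of a DataStorage-like snapshot/rollback for a parity
 * block's data and .meta files around a flush.
 */
public class ParityFlushRollback {
  // Before the second flush: keep a "previous" copy, analogous to the
  // previous/ directory kept by DataStorage during an upgrade.
  static void snapshot(Path file) throws IOException {
    Files.copy(file, Paths.get(file + ".previous"),
        StandardCopyOption.REPLACE_EXISTING);
  }

  // If the flush fails on some DNs: restore the pre-flush state.
  static void rollback(Path file) throws IOException {
    Files.move(Paths.get(file + ".previous"), file,
        StandardCopyOption.REPLACE_EXISTING);
  }

  // If the flush succeeds everywhere: discard the snapshot (finalize).
  static void finalizeFlush(Path file) throws IOException {
    Files.deleteIfExists(Paths.get(file + ".previous"));
  }
}
{code}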
[~liuml07], [~jingzhao], regarding the data consistency issue: if we do not
implement a lock in the NN, maybe we could make the read client check the bg
data length in the .meta files of the 3 parity DNs to verify that they are at
the same version, as [~zhz] suggested. If the read client finds that the bg
data lengths differ, it could try to read the .meta file again from the parity
DNs with the smaller bg data length. But the bg data length could change
several times, so the read client might never observe a consistent bg data
length. Am I missing something? A simplified sketch of the check is below.
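For illustration only: {{readBgDataLength}} stands in for whatever RPC would
expose the recorded length from a parity DN's .meta file, and this version
simply re-reads all three DNs on a mismatch rather than only the ones with the
smaller length:
{code:java}
/**
 * Hypothetical sketch: a reader compares the bg data length recorded in the
 * .meta files of the three parity DNs and retries a bounded number of times,
 * since concurrent flushes can keep moving the length.
 */
public class BgLengthCheck {
  interface MetaReader {           // stands in for an RPC to a parity DN
    long readBgDataLength(int parityDnIndex);
  }

  /** Returns the agreed length, or -1 if no consistent view was observed. */
  static long consistentLength(MetaReader reader, int maxRetries) {
    for (int attempt = 0; attempt < maxRetries; attempt++) {
      long[] lengths = new long[3];
      for (int i = 0; i < 3; i++) {
        lengths[i] = reader.readBgDataLength(i);
      }
      if (lengths[0] == lengths[1] && lengths[1] == lengths[2]) {
        return lengths[0];         // all three parity DNs agree
      }
      // Otherwise retry; without a bound the reader might never converge,
      // which is exactly the concern raised above.
    }
    return -1;                     // no consistent version observed
  }
}
{code}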
> Erasure coding: support hflush and hsync
> ----------------------------------------
>
> Key: HDFS-7661
> URL: https://issues.apache.org/jira/browse/HDFS-7661
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Tsz Wo Nicholas Sze
> Assignee: GAO Rui
> Attachments: EC-file-flush-and-sync-steps-plan-2015-12-01.png,
> HDFS-7661-unitTest-wip-trunk.patch, HDFS-7661-wip.01.patch,
> HDFS-EC-file-flush-sync-design-version1.1.pdf,
> HDFS-EC-file-flush-sync-design-version2.0.pdf
>
>
> We also need to support hflush/hsync and visible length.