[ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174860#comment-15174860 ]
GAO Rui commented on HDFS-7661:
-------------------------------
Hi [~liuml07],
Regarding the data consistency issue, I worked out a scenario under the RS-6-3
EC policy (6 data blocks, 3 parity blocks):
0. The write client calls flush; all parity DNs (DN0, DN1, DN2) now hold V1 parity.
1. Two DNs holding IDBs (internal data blocks) fail.
2. The read client reads the 4 surviving IDBs and the V1 parity from DN0.
3. The write client calls flush twice.
4. The read client reads the V3 parity from DN1.
5. The write client calls flush twice again.
6. The read client reads the V5 parity from DN2.
7. For each of V1, V3 and V5, the read client now has only five consistent
internal blocks, but RS-6-3 needs six to decode.
8. The read fails.
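To make the counting in steps 7 and 8 concrete, here is a minimal Python sketch under a simplified model. RS-6-3 can decode a stripe from any 6 of its 9 internal blocks. The model assumes (as the steps above implicitly do) that the four surviving IDBs are unchanged by the flushes and so match V1, V3 and V5, while each parity block only matches the version it was read at. The block labels and the helpers `blocks_per_version` and `decodable` are hypothetical and only illustrate the arithmetic; they are not HDFS code.

```python
from collections import Counter

K = 6  # RS-6-3: any 6 of the 9 internal blocks are needed to decode a stripe


def blocks_per_version(reads):
    """For each flush version, count the read blocks consistent with it.

    `reads` maps a block label to the set of flush versions that block's
    content is consistent with (a simplified model, not an HDFS structure).
    """
    counts = Counter()
    for versions in reads.values():
        counts.update(versions)  # +1 for every version this block matches
    return counts


def decodable(reads, k=K):
    """Versions for which the reader holds at least k consistent blocks."""
    return sorted(v for v, n in blocks_per_version(reads).items() if n >= k)


# Steps 0-8: two data DNs are down. Assumption: the four surviving IDBs are
# unchanged by the flushes, so they match V1, V3 and V5; each parity block
# matches only the version it was read at.
reads = {
    "IDB0": {1, 3, 5}, "IDB1": {1, 3, 5}, "IDB2": {1, 3, 5}, "IDB3": {1, 3, 5},
    "DN0 parity": {1},  # step 2
    "DN1 parity": {3},  # step 4
    "DN2 parity": {5},  # step 6
}
print(dict(sorted(blocks_per_version(reads).items())))  # {1: 5, 3: 5, 5: 5}
print(decodable(reads))  # [] -> step 8, the read fails

# If all three parity blocks had been read at V1, V1 would have 7 >= 6 blocks.
locked_reads = {**reads, "DN1 parity": {1}, "DN2 parity": {1}}
print(decodable(locked_reads))  # [1]
```

The contrast between `reads` and `locked_reads` is the whole problem: the same number of blocks is fetched in both cases, but only a read that sees a single parity version can reach six consistent blocks.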
This is quite an extreme scenario, but it could still happen. In the current
design, each parity DN keeps only the latest version of the parity data (older
versions are overwritten), and there is no lock, so a reader can end up with a
mix of parity versions that cannot be decoded together.
If a lock in the NN is too heavyweight, maybe we could maintain the lock in the
write client instead: the read client would get the file info and the write
client's info from the NN, then use the lock held by the write client to keep
its reads consistent.
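The write-client lock proposed above can be sketched as a reader-writer lock: a flush takes it exclusively while updating parity, and a reader holds it in shared mode while fetching all parity blocks of a stripe, so it never mixes versions. This is a minimal in-process Python model under stated assumptions: `WriteClientStripeLock` and `Stripe` are hypothetical names, integer parity versions stand in for parity data, and in the proposal the lock would live in the write client and be reached remotely by readers that learned about the writer from the NN, which this sketch does not model.

```python
import threading


class WriteClientStripeLock:
    """Hypothetical reader-writer lock owned by the write client (not an HDFS API)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writing = False

    def acquire_read(self):
        with self._cond:
            while self._writing:  # readers wait while a flush is in progress
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # wake a waiting flush

    def acquire_write(self):
        with self._cond:
            while self._writing or self._readers > 0:
                self._cond.wait()
            self._writing = True

    def release_write(self):
        with self._cond:
            self._writing = False
            self._cond.notify_all()


class Stripe:
    """Toy model (hypothetical): one parity version per parity DN."""

    def __init__(self, lock):
        self.lock = lock
        self.parity_versions = {"DN0": 1, "DN1": 1, "DN2": 1}

    def flush(self):
        # Write client: update every parity block under the exclusive lock.
        self.lock.acquire_write()
        try:
            for dn in self.parity_versions:
                self.parity_versions[dn] += 1
        finally:
            self.lock.release_write()

    def read_parity(self):
        # Read client: fetch all parity blocks under the shared lock.
        self.lock.acquire_read()
        try:
            return dict(self.parity_versions)
        finally:
            self.lock.release_read()


stripe = Stripe(WriteClientStripeLock())
snapshots = []


def reader():
    for _ in range(1000):
        snapshots.append(stripe.read_parity())


def writer():
    for _ in range(1000):
        stripe.flush()


threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every snapshot sees a single parity version across DN0, DN1 and DN2.
print(all(len(set(s.values())) == 1 for s in snapshots))  # True
print(stripe.parity_versions)  # {'DN0': 1001, 'DN1': 1001, 'DN2': 1001}
```

This simple lock lets readers keep entering while no flush holds it, so a steady stream of reads could delay a flush; a real design would also need to handle a write client that dies while holding the lock.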
Ping [~szetszwo], [~jingzhao],[~zhz], [~drankye], [~walter.k.su] and [~ikki407]
for discussion :D
> Erasure coding: support hflush and hsync
> ----------------------------------------
>
> Key: HDFS-7661
> URL: https://issues.apache.org/jira/browse/HDFS-7661
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Tsz Wo Nicholas Sze
> Assignee: GAO Rui
> Attachments: EC-file-flush-and-sync-steps-plan-2015-12-01.png,
> HDFS-7661-unitTest-wip-trunk.patch, HDFS-7661-wip.01.patch,
> HDFS-EC-file-flush-sync-design-version1.1.pdf,
> HDFS-EC-file-flush-sync-design-version2.0.pdf
>
>
> We also need to support hflush/hsync and visible length.