[
https://issues.apache.org/jira/browse/HDFS-5590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836880#comment-13836880
]
Jing Zhao commented on HDFS-5590:
---------------------------------
Currently, when dfs.persist.blocks is false (its default value) and HA is
not enabled, the getAdditionalBlock call does not call logSync. Even without
the sequential block ID mechanism, failing to persist the new block can still
cause data loss. A quick fix is therefore to always call logSync in
getAdditionalBlock, although this may hurt performance.
Another possible fix is to make sure the next block id and generation stamp
(gs) are always larger than the max block id and gs in the system. To do so, in
BlockManager#processFirstBlockReport, we can change the following code
{code}
// If block does not belong to any file, we are done.
if (storedBlock == null) continue;
{code}
to
{code}
if (storedBlock == null) {
  // TODO: check the block id and generation stamp of the reported block,
  // and increase the local latest block id and generation stamp if
  // necessary to make sure they are larger than the reported values
}
{code}
This ensures that newly allocated block ids and generation stamps do not
overlap with any reported ones. But data loss is still possible, since the new
block may still not be persisted.
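
A minimal sketch of the counter adjustment described in the TODO above. This
is not HDFS code: the class IdWatermark and all of its method names are
hypothetical, and the real NameNode keeps these counters elsewhere. It only
illustrates raising the local block id and generation stamp to at least the
reported values, so that the next allocation is strictly larger.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the NameNode's id counters; not part of HDFS. */
public class IdWatermark {
    private final AtomicLong lastBlockId;
    private final AtomicLong lastGenStamp;

    public IdWatermark(long lastBlockId, long lastGenStamp) {
        this.lastBlockId = new AtomicLong(lastBlockId);
        this.lastGenStamp = new AtomicLong(lastGenStamp);
    }

    /**
     * Called for a reported block that belongs to no file (assumption: the
     * caller has already extracted its id and generation stamp). Counters are
     * only ever raised, never lowered.
     */
    public void observeReported(long reportedBlockId, long reportedGenStamp) {
        lastBlockId.accumulateAndGet(reportedBlockId, Math::max);
        lastGenStamp.accumulateAndGet(reportedGenStamp, Math::max);
    }

    /** Next block id, strictly larger than anything observed so far. */
    public long nextBlockId() {
        return lastBlockId.incrementAndGet();
    }

    /** Next generation stamp, strictly larger than anything observed so far. */
    public long nextGenStamp() {
        return lastGenStamp.incrementAndGet();
    }
}
```

For example, with counters starting at block id 100 and gs 10, observing a
reported block with id 105 and gs 12 makes the next allocation use id 106 and
gs 13; a later report with smaller values changes nothing.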
> Sequential block ID may cause data loss when persistBlocks is set to false
> --------------------------------------------------------------------------
>
> Key: HDFS-5590
> URL: https://issues.apache.org/jira/browse/HDFS-5590
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.2.0
> Reporter: Jing Zhao
> Assignee: Jing Zhao
>
> In a cluster with non-HA setup and dfs.persist.blocks set to false, the
> current sequential block ID mechanism may cause data loss in the following
> case:
> # client creates file1, requests a block from NN, and gets blk_id1_gs1
> # client writes blk_id1_gs1 to DN
> # NN is restarted and, because persistBlocks is false, blk_id1_gs1 may not be
> persisted to disk
> # another client creates file2 and NN allocates a new block using the
> same block id blk_id1_gs1, since block ID and generation stamp are both
> increased sequentially.
> Now we may have two versions (file1 and file2) of blk_id1_gs1 (same id,
> same gs) in the system. This will cause data loss.
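
The sequence in the description above can be reproduced with a toy model.
This is a sketch under simplifying assumptions, not HDFS code: ToyNameNode,
allocateBlock, restart and collides are hypothetical names, a single counter
stands in for both block id and generation stamp, and "persisted" is modeled
as a field that survives restart().

```java
public class SequentialIdCollision {

    /** Toy NameNode: sequential ids; only the persisted counter survives a restart. */
    static class ToyNameNode {
        private long persistedLastId;   // state assumed to be on disk
        private long inMemoryLastId;    // state of the running process
        private final boolean persistBlocks;

        ToyNameNode(long startId, boolean persistBlocks) {
            this.persistedLastId = startId;
            this.inMemoryLastId = startId;
            this.persistBlocks = persistBlocks;
        }

        /** Allocate the next sequential id; persist it only if configured to. */
        long allocateBlock() {
            inMemoryLastId++;
            if (persistBlocks) {
                persistedLastId = inMemoryLastId;
            }
            return inMemoryLastId;
        }

        /** Simulate a restart: in-memory state is rebuilt from what was persisted. */
        void restart() {
            inMemoryLastId = persistedLastId;
        }
    }

    /** Returns true if file1 and file2 end up with the same block id. */
    static boolean collides(boolean persistBlocks) {
        ToyNameNode nn = new ToyNameNode(1000L, persistBlocks);
        long blockForFile1 = nn.allocateBlock(); // client 1 writes this block to a DN
        nn.restart();                            // NN restarts before anything is persisted
        long blockForFile2 = nn.allocateBlock(); // client 2 requests a block for file2
        return blockForFile1 == blockForFile2;
    }

    public static void main(String[] args) {
        System.out.println("persistBlocks=false collides: " + collides(false));
        System.out.println("persistBlocks=true collides: " + collides(true));
    }
}
```

In this model the collision appears only when the allocation is not persisted
before the restart, which is the case the issue describes.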