[
https://issues.apache.org/jira/browse/HDFS-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849701#comment-13849701
]
Eric Sirianni commented on HDFS-5434:
-------------------------------------
Thanks everyone for all the good technical feedback. A few comments...
bq. Server-side enforcement of pipeline size also seems somewhat more
complementary to where HDFS is headed with server-side tiered storage policies
(HDFS-4672).
bq. I am not sure what you mean here. Can you add more details?
My observation was that HDFS-4672 seems to imply/propose moving towards having
the client specify the desired service level of a file in terms of the
_"what"_ (e.g. a class of storage/service) instead of the _"how"_ (replication
count, disk type, caching). With this backdrop, it seems reasonable for the
server to guarantee a certain degree of write resiliency (the "what")
independent of the replication count (the "how"). Perhaps I'm misinterpreting
where things are headed, but moving to a more "policy-based" model seems like a
step in the right direction.
bq. RAID setup without spare disks to rebuild the array could cause more data
loss.
bq. I feel that this looks more like a hack
Let's try to decouple the enhancement suggested in this JIRA from the question
of whether running with replicaCount = 1 is sensible at all. We probably
shouldn't have titled the JIRA the way we did.
At its core, we are proposing decoupling the replication factor from the
pipeline length. This does not seem like a hack to me. There are legitimate
use cases for having them be controllable independently, as each attribute
provides different protection and performance characteristics. For example:
* For MapReduce job files, a short pipeline is wanted in tandem with a high
replication factor (to keep writes fast, but still have many replicas of the
job definition file available). Currently this is implemented by a client-side
workaround which, incidentally, doesn't work quite right for files larger than
one block (though that probably never occurs for JAR files anyway).
* In the repCount=1 use case, the client wants a low replication factor for the
file (for storage efficiency), but does not want ingest to fail due to a host
failure.
With this perspective, another potential fix for this JIRA would be to add an
optional {{pipelineSize}} parameter to {{ClientProtocol.create()}} and default
it to the same value as {{replication}}. I'm not sure there is appetite for
this, as it would involve a protocol change, but I wanted to throw the idea out
there for consideration. Thoughts?
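To make that concrete, here is a rough sketch of what the extended call could
look like (the existing {{create()}} argument list is abbreviated and
reproduced from memory, and the {{pipelineSize}} parameter is purely
hypothetical):
{code}
// Hypothetical extension of ClientProtocol.create() -- not actual HDFS code.
// Existing parameters are abbreviated/from memory; the only addition is the
// trailing pipelineSize argument.
HdfsFileStatus create(String src,
    FsPermission masked,
    String clientName,
    EnumSetWritable<CreateFlag> flag,
    boolean createParent,
    short replication,
    long blockSize,
    short pipelineSize)   // proposed; 0 would mean "same as replication"
    throws IOException;

// On the NameNode side, the effective pipeline length would then resolve as:
//   short effectivePipeline = pipelineSize > 0 ? pipelineSize : replication;
{code}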
bq. Have you considered using BlockPlacementPolicy? We could give additional
state to BlockPlacementPolicy (whether this is a new block allocation or an
existing one). From a quick look, the client and NN will use any extra replicas
returned by BlockPlacementPolicy.
This looks promising and we will investigate this. Thanks Arpit!
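To capture the idea for when we dig in, a rough sketch (the class name and the
exact {{chooseTarget()}} overload below are my guesses, not a patch):
{code}
// Rough sketch of the BlockPlacementPolicy idea -- the overload signature is
// reproduced from memory and may not match the current interface exactly.
public class PipelinePaddingPlacementPolicy extends BlockPlacementPolicyDefault {
  @Override
  public DatanodeDescriptor[] chooseTarget(String srcPath, int numOfReplicas,
      DatanodeDescriptor writer, List<DatanodeDescriptor> chosenNodes,
      boolean returnChosenNodes, Set<Node> excludedNodes, long blocksize) {
    // Pad the pipeline only for brand-new blocks (no replicas chosen yet);
    // re-replication of an existing block keeps the file's replication factor.
    if (chosenNodes.isEmpty()) {
      numOfReplicas = Math.max(numOfReplicas, 2);
    }
    return super.chooseTarget(srcPath, numOfReplicas, writer, chosenNodes,
        returnChosenNodes, excludedNodes, blocksize);
  }
}
{code}
The NameNode would then prune the extra replica once the block is finalized,
as described in the issue below.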
> Write resiliency for replica count 1
> ------------------------------------
>
> Key: HDFS-5434
> URL: https://issues.apache.org/jira/browse/HDFS-5434
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.2.0
> Reporter: Buddy
> Priority: Minor
>
> If a file has a replica count of one, the HDFS client is exposed to write
> failures if the data node fails during a write. With a pipeline of size one,
> no recovery is possible if the sole data node dies.
> A simple fix is to force a minimum pipeline size of 2, while leaving the
> replication count as 1. The implementation for this is fairly non-invasive.
> Although the replica count is one, the block will be written to two data
> nodes instead of one. If one of the data nodes fails during the write, normal
> pipeline recovery will ensure that the write succeeds to the surviving data
> node.
> The existing code in the name node will prune the extra replica when it
> receives the block received reports for the finalized block from both data
> nodes. This results in the intended replica count of one for the block.
> This behavior should be controlled by a configuration option such as
> {{dfs.namenode.minPipelineSize}}.
> This behavior can be implemented in {{FSNamesystem.getAdditionalBlock()}} by
> ensuring that the value passed to {{BlockPlacementPolicy.chooseTarget()}} as
> the replication parameter is:
> {code}
> max(replication, ${dfs.namenode.minPipelineSize})
> {code}
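> A minimal illustration of that clamp (simplified excerpt, not a patch; the
> configuration key is the one proposed above):
> {code}
> // Illustrative only -- simplified; conf is the NameNode's Configuration.
> final int minPipelineSize = conf.getInt("dfs.namenode.minPipelineSize", 1);
> final int pipelineSize = Math.max(replication, minPipelineSize);
> // pass pipelineSize, rather than replication, to
> // BlockPlacementPolicy.chooseTarget() when allocating the new block
> {code}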
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)