[ https://issues.apache.org/jira/browse/HDFS-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849701#comment-13849701 ]

Eric Sirianni commented on HDFS-5434:
-------------------------------------

Thanks everyone for all the good technical feedback.  A few comments...

bq. Server-side enforcement of pipeline size also seems somewhat more 
complementary to where HDFS is headed with server-side tiered storage policies 
(HDFS-4672).
bq. I am not sure what you mean here. Can you add more details?

My observation was that HDFS-4672 seems to imply/propose moving towards having
the client specify the desired service level of a file in terms of the
_"what"_ (e.g. a class of storage/service) instead of the _"how"_ (replication
count, disk type, caching). Against this backdrop, it seems reasonable for the
server to guarantee a certain degree of write resiliency (the "what")
independent of the replication count (the "how"). Perhaps I'm misinterpreting
where things are headed, but moving to a more "policy based" model seems like
a step in the right direction.

bq. RAID setup without spare disks to rebuild the array could cause more data 
loss.
bq. I feel that this looks more like a hack

Let's try to decouple the enhancement suggested in this JIRA from the question
of whether running with replicaCount = 1 is sensible. We probably shouldn't have
titled the JIRA the way we did.

At its core, we are proposing decoupling the replication factor from the
pipeline length. This does not seem like a hack to me. There are legitimate
use cases for controlling them independently, as each attribute provides
different protection and performance characteristics. For example:
* For MapReduce job files, a short pipeline is wanted in tandem with a high
replication factor (to keep writes fast, but have many replicas available for
the job definition file). Currently this is implemented by a client-side
workaround, which, incidentally, doesn't work quite right for files larger than
one block (though that probably never occurs for JAR files anyway).
* In the repCount=1 use case, the client wants a low replication factor for the 
file (for storage efficiency), but does not want ingest to fail due to a host 
failure.

From this perspective, another potential fix for this JIRA would be to add an
optional {{pipelineSize}} parameter to {{ClientProtocol.create()}} and default
it to the value of {{replication}}. Not sure there is appetite for this, as
it would involve a protocol change, but I wanted to throw this idea out there
for consideration. Thoughts?
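
A minimal sketch of the defaulting rule for that optional parameter follows. It
assumes a hypothetical request object in which {{pipelineSize}} is a nullable
Integer (null meaning "not specified by the client"); {{CreateRequest}} and
{{resolvePipelineSize()}} are illustrative names only, not the real
{{ClientProtocol.create()}} signature, and the replication values are made up.

```java
import java.util.Objects;

/** Hypothetical create() arguments; NOT the real ClientProtocol signature. */
final class CreateRequest {
    final String src;
    final short replication;
    final Integer pipelineSize; // null means "not specified by the client" (assumption)

    CreateRequest(String src, short replication, Integer pipelineSize) {
        this.src = Objects.requireNonNull(src);
        this.replication = replication;
        this.pipelineSize = pipelineSize;
    }

    /** Pipeline length to use: the explicit value if given, else the replication factor. */
    int resolvePipelineSize() {
        if (pipelineSize == null) {
            return replication; // default keeps today's behaviour: pipeline == replication
        }
        if (pipelineSize < 1) {
            throw new IllegalArgumentException("pipelineSize must be >= 1");
        }
        return pipelineSize;
    }

    public static void main(String[] args) {
        // Job-file style: many replicas, short pipeline (illustrative values).
        CreateRequest jobJar = new CreateRequest("/job/job.jar", (short) 10, 3);
        // repCount=1 style: one replica, but a 2-node pipeline for write resiliency.
        CreateRequest scratch = new CreateRequest("/tmp/data", (short) 1, 2);
        // Existing caller that does not set pipelineSize.
        CreateRequest legacy = new CreateRequest("/user/a/b", (short) 3, null);
        System.out.println(jobJar.resolvePipelineSize());  // 3
        System.out.println(scratch.resolvePipelineSize()); // 2
        System.out.println(legacy.resolvePipelineSize());  // 3
    }
}
```

Defaulting to {{replication}} when the field is absent means existing clients
would see no change in behaviour.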

bq. Have you considered using BlockPlacementPolicy? We could give additional
state to BlockPlacementPolicy (whether new block allocation or existing). From a
quick look, the client and NN will use any extra replicas returned by
BlockPlacementPolicy.

This looks promising and we will investigate it. Thanks Arpit!
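
To illustrate the BlockPlacementPolicy idea, here is a rough sketch under
simplifying assumptions: {{TargetChooser}}, {{SimpleChooser}},
{{MinPipelineChooser}} and the {{newBlockAllocation}} flag are hypothetical
stand-ins, not the real BlockPlacementPolicy API (whose {{chooseTarget()}}
takes many more arguments). It shows only the shape of the idea: pad the target
list for new block allocations, and leave re-replication requests untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, heavily simplified stand-in for BlockPlacementPolicy.chooseTarget(). */
interface TargetChooser {
    List<String> chooseTarget(int numTargets, boolean newBlockAllocation);
}

/** Returns the first numTargets nodes of a fixed list; placeholder for real placement logic. */
final class SimpleChooser implements TargetChooser {
    private final List<String> nodes;

    SimpleChooser(List<String> nodes) { this.nodes = nodes; }

    @Override
    public List<String> chooseTarget(int numTargets, boolean newBlockAllocation) {
        int n = Math.min(numTargets, nodes.size());
        return new ArrayList<>(nodes.subList(0, n));
    }
}

/**
 * Hypothetical wrapper: pads the pipeline to minPipelineSize only when a new
 * block is being allocated for a write; other requests pass through unchanged.
 */
final class MinPipelineChooser implements TargetChooser {
    private final TargetChooser delegate;
    private final int minPipelineSize;

    MinPipelineChooser(TargetChooser delegate, int minPipelineSize) {
        this.delegate = delegate;
        this.minPipelineSize = minPipelineSize;
    }

    @Override
    public List<String> chooseTarget(int numTargets, boolean newBlockAllocation) {
        int wanted = newBlockAllocation ? Math.max(numTargets, minPipelineSize) : numTargets;
        return delegate.chooseTarget(wanted, newBlockAllocation);
    }
}

final class PlacementDemo {
    public static void main(String[] args) {
        TargetChooser chooser = new MinPipelineChooser(
                new SimpleChooser(Arrays.asList("dn1", "dn2", "dn3")), 2);
        System.out.println(chooser.chooseTarget(1, true));  // [dn1, dn2]: padded write pipeline
        System.out.println(chooser.chooseTarget(1, false)); // [dn1]: re-replication unchanged
    }
}
```

Keeping the extra state inside the placement layer would leave the client
protocol untouched, which is what makes this route attractive compared with a
new {{pipelineSize}} parameter.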

> Write resiliency for replica count 1
> ------------------------------------
>
>                 Key: HDFS-5434
>                 URL: https://issues.apache.org/jira/browse/HDFS-5434
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.2.0
>            Reporter: Buddy
>            Priority: Minor
>
> If a file has a replica count of one, the HDFS client is exposed to write
> failures if the data node fails during a write. With a pipeline of size
> one, no recovery is possible if the sole data node dies.
> A simple fix is to force a minimum pipeline size of 2, while leaving the 
> replication count as 1. The implementation for this is fairly non-invasive.
> Although the replica count is one, the block will be written to two data 
> nodes instead of one. If one of the data nodes fails during the write, normal 
> pipeline recovery will ensure that the write succeeds to the surviving data 
> node.
> The existing code in the name node will prune the extra replica when it 
> receives the block received reports for the finalized block from both data 
> nodes. This results in the intended replica count of one for the block.
> This behavior should be controlled by a configuration option such as 
> {{dfs.namenode.minPipelineSize}}.
> This behavior can be implemented in {{FSNamesystem.getAdditionalBlock()}} by
> ensuring that the pipeline size passed to 
> {{BlockPlacementPolicy.chooseTarget()}} in the replication parameter is:
> {code}
> max(replication, ${dfs.namenode.minPipelineSize})
> {code}
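
The end-to-end behaviour described in the issue (pipeline of
max(replication, minPipelineSize), survival of a single datanode failure, then
pruning back to the target replication) can be modelled with a toy simulation.
This is not HDFS code: {{MinPipelineSketch}}, {{pipelineSize()}} and
{{writeBlock()}} are hypothetical, and {{minPipelineSize}} merely stands in for
the proposed (not existing) {{dfs.namenode.minPipelineSize}} setting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the proposed behaviour; hypothetical, not HDFS code. */
final class MinPipelineSketch {

    /** Pipeline length handed to placement: max(replication, minPipelineSize). */
    static int pipelineSize(int replication, int minPipelineSize) {
        return Math.max(replication, minPipelineSize);
    }

    /**
     * Simulates one block write. Nodes in failedDuringWrite drop out of the
     * pipeline (modelling pipeline recovery with the survivors); the write fails
     * only if no node survives. Returns the replicas kept after excess finalized
     * replicas are pruned down to the target replication.
     */
    static List<String> writeBlock(List<String> pipeline, List<String> failedDuringWrite,
                                   int replication) {
        List<String> finalized = new ArrayList<>();
        for (String dn : pipeline) {
            if (!failedDuringWrite.contains(dn)) {
                finalized.add(dn); // this node reports the finalized block
            }
        }
        if (finalized.isEmpty()) {
            throw new IllegalStateException("write failed: no surviving datanode");
        }
        // Toy pruning: keep the first survivors. The real NameNode applies its
        // own policy when choosing which excess replica to delete.
        return new ArrayList<>(finalized.subList(0, Math.min(replication, finalized.size())));
    }

    public static void main(String[] args) {
        int replication = 1;
        int minPipelineSize = 2; // stand-in for the proposed config value
        List<String> nodes = Arrays.asList("dn1", "dn2", "dn3");
        List<String> pipeline = nodes.subList(0, pipelineSize(replication, minPipelineSize));
        System.out.println(writeBlock(pipeline, Arrays.asList("dn1"), replication));     // [dn2]
        System.out.println(writeBlock(pipeline, new ArrayList<String>(), replication)); // [dn1]
    }
}
```

With {{minPipelineSize}} = 1 the pipeline would contain a single node, and the
same single failure would make {{writeBlock()}} throw, which is exactly the
failure mode this issue describes.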



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
