[ 
https://issues.apache.org/jira/browse/HDFS-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572328#comment-16572328
 ] 

Virajith Jalaparti edited comment on HDFS-13088 at 8/7/18 9:00 PM:
-------------------------------------------------------------------

Thanks for the feedback [~elgoiri] and [~ehiggs].

[^HDFS-13088.002.patch] is an alternative approach to implementing this. It 
adds a new parameter, {{dfs.provided.overreplication.factor}}, which specifies 
how many extra replicas are allowed for blocks that are PROVIDED. This is a 
single value for all blocks/files in the system and is ephemeral: it is 
retained across Namenode restarts only if the config value remains the same. 
However, this approach requires no changes to {{FileSystem}} or {{INodeFile}} 
and is much less intrusive.

The main change to existing code is in how excess replicas are checked in 
{{BlockManager#shouldProcessExtraRedundancy}}: for PROVIDED blocks, the number 
of replicas allowed before any are treated as excess is the block replication 
plus the value specified by {{dfs.provided.overreplication.factor}}. For blocks 
that are not PROVIDED, and for EC blocks, the earlier semantics are retained.
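
To make the check above concrete, here is a minimal, self-contained sketch of the threshold logic. It is not the code from the patch: the class {{ProvidedExcessSketch}}, the methods {{allowedReplicas}} and {{hasExcess}}, and the boolean flags are hypothetical names used for illustration only, and the factor of 2 in {{main}} is an arbitrary example value.

```java
// Hypothetical sketch of the excess-replica threshold described above.
// This is NOT the code in HDFS-13088.002.patch and does not use Hadoop classes.
final class ProvidedExcessSketch {

    // Config key named in the comment; how the patch reads it is not shown here.
    static final String OVERREPLICATION_KEY = "dfs.provided.overreplication.factor";

    /**
     * Number of replicas tolerated before any are treated as excess.
     * Only PROVIDED, non-EC blocks get the extra allowance (assumed from the
     * description); all other blocks keep the plain replication factor.
     */
    static int allowedReplicas(int replication, boolean provided,
                               boolean erasureCoded, int overReplicationFactor) {
        if (provided && !erasureCoded) {
            return replication + overReplicationFactor;
        }
        return replication;
    }

    /** True if the block has more live replicas than the allowed number. */
    static boolean hasExcess(int liveReplicas, int replication, boolean provided,
                             boolean erasureCoded, int overReplicationFactor) {
        return liveReplicas > allowedReplicas(
            replication, provided, erasureCoded, overReplicationFactor);
    }

    public static void main(String[] args) {
        int factor = 2; // example value for dfs.provided.overreplication.factor
        // PROVIDED block, replication 3: up to 5 replicas are tolerated.
        System.out.println(hasExcess(5, 3, true, false, factor));
        System.out.println(hasExcess(6, 3, true, false, factor));
        // Non-PROVIDED block, replication 3: a 4th replica is already excess.
        System.out.println(hasExcess(4, 3, false, false, factor));
    }
}
```

With a factor of 0, the sketch reduces to the existing behavior for every block, which is why a single cluster-wide value can be introduced without touching {{FileSystem}} or {{INodeFile}}.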

I still need to add tests for this, but I am posting the patch now to get it 
out early.


> Allow HDFS files/blocks to be over-replicated.
> ----------------------------------------------
>
>                 Key: HDFS-13088
>                 URL: https://issues.apache.org/jira/browse/HDFS-13088
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Virajith Jalaparti
>            Assignee: Virajith Jalaparti
>            Priority: Major
>         Attachments: HDFS-13088.001.patch, HDFS-13088.002.patch
>
>
> This JIRA is to add a per-file "over-replication" factor to HDFS. As 
> mentioned in HDFS-13069, the over-replication factor is the number of excess 
> replicas that are allowed to exist for a file or block. This is beneficial 
> if the application deems that additional replicas of a file are needed. In 
> the case of HDFS-13069, it would allow copies of data in PROVIDED storage to 
> be cached locally in HDFS in a read-through manner.
> The Namenode will not proactively meet the over-replication, i.e., it does 
> not schedule replications if the number of replicas for a block is less than 
> (replication factor + over-replication factor), as long as that number is 
> not below the replication factor of the file.
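
The behavior described in the issue can be read as three ranges for the replica count. The following is only a sketch under that reading: the class {{OverReplicationPolicySketch}}, the {{Action}} enum, and the {{classify}} method are hypothetical names that do not exist in HDFS, and the removal of replicas above the upper bound is assumed from the definition of "excess" rather than stated explicitly.

```java
// Hypothetical sketch of the Namenode behavior described in the issue.
// None of these names exist in HDFS; they are for illustration only.
final class OverReplicationPolicySketch {

    enum Action {
        SCHEDULE_REPLICATION, // fewer replicas than the replication factor
        NO_ACTION,            // within [replication, replication + overReplication]
        REMOVE_EXCESS         // more replicas than the allowed maximum (assumed)
    }

    static Action classify(int liveReplicas, int replication, int overReplication) {
        if (liveReplicas < replication) {
            // Under-replicated: the usual re-replication applies.
            return Action.SCHEDULE_REPLICATION;
        }
        if (liveReplicas > replication + overReplication) {
            // Beyond the over-replication allowance: extra copies are excess.
            return Action.REMOVE_EXCESS;
        }
        // Extra copies are tolerated but never created proactively.
        return Action.NO_ACTION;
    }

    public static void main(String[] args) {
        // Replication factor 3, over-replication factor 2.
        for (int live = 2; live <= 6; live++) {
            System.out.println(live + " replicas: " + classify(live, 3, 2));
        }
    }
}
```

The middle range is what permits read-through caching: cached copies may accumulate up to the upper bound, but the Namenode never creates them on its own.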



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
