[ https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16470559#comment-16470559 ]

Daryn Sharp commented on HDFS-13448:
------------------------------------

Generally looks good!

I'd suggest renaming {{NO_LOCALITY_WRITE}} to disambiguate its meaning. 
{{AVOID_RACK_WRITE}} conveys the semantics well, since placement may still 
randomly choose the local rack. However, maybe {{NO_RACK_WRITE}} is more 
consistent with the existing {{NO_LOCAL_WRITE}}.

(Aside: I think {{NO_LOCAL_WRITE}} should have been {{AVOID_LOCAL_WRITE}}, 
since it too may randomly fall back to the local node, which violates what 
"no" implies. You might consider adding {{AVOID_LOCAL_WRITE}} with the same 
enum value and deprecating {{NO_LOCAL_WRITE}}, but if you feel that's beyond 
the scope of this patch, I'm not that concerned.)
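
Something like the following sketch, purely illustrative: the javadoc wording 
and the bit values below are placeholders, not what the patch or the existing 
{{CreateFlag}} constants actually assign.
{code:java}
public enum CreateFlag {

  // ... existing flags elided ...

  /**
   * Advise that a block replica NOT be written to the local DataNode.
   * @deprecated use {@link #AVOID_LOCAL_WRITE}, which better reflects that
   *             placement may still fall back to the local node.
   */
  @Deprecated
  NO_LOCAL_WRITE((short) 0x40),     // placeholder value

  /** Same semantics and same bit value as {@link #NO_LOCAL_WRITE}. */
  AVOID_LOCAL_WRITE((short) 0x40),  // placeholder value

  /**
   * Advise that the first block replica be placed anywhere in the cluster,
   * not just on the local node or the local rack.  Placement may still fall
   * back to the local rack if no other location is available.
   */
  AVOID_RACK_WRITE((short) 0x80);   // placeholder value

  private final short mode;

  CreateFlag(short mode) {
    this.mode = mode;
  }

  public short getMode() {
    return mode;
  }
}
{code}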

The test doesn't appear to actually exercise this feature. For example: 5 
racks with 1 node each, add a block 20 times, verify 2 or more distinct racks 
are used. That tests the existing {{NO_LOCAL_WRITE}} behavior, not this patch, 
and it doesn't test the random fallback to the local rack when no other 
location is available. You need more than 1 node per rack to test that.

I'd do something like 3 racks: 1 rack with 4 nodes and 2 racks with 1 node 
each. Use a replication factor of 1. Using a client machine on rack1, verify 
that the new flag returns locations on rack2/rack3. Then mark the nodes on 
rack2/rack3 as decommissioned and verify it falls back to a location on rack1. 
Roughly along the lines of the sketch below.
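
(Untested sketch of that layout. The flag name follows the {{NO_LOCALITY_WRITE}} 
used in the current patch, and the client-rack mapping and decommission steps 
are hand-waved; they'd need whatever the test utilities actually provide.)
{code:java}
// Sketch: 3 racks, rack1 has 4 nodes, rack2/rack3 have 1 node each.
String[] racks = {"/rack1", "/rack1", "/rack1", "/rack1", "/rack2", "/rack3"};
Configuration conf = new HdfsConfiguration();
MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
    .numDataNodes(racks.length)
    .racks(racks)
    .build();
cluster.waitActive();
DistributedFileSystem fs = cluster.getFileSystem();

// Write with replication factor 1 and the new flag.  For the locality part
// of the test the client must resolve to /rack1 (e.g. via a static topology
// mapping).
Path path = new Path("/testAvoidRackWrite");
EnumSet<CreateFlag> flags =
    EnumSet.of(CreateFlag.CREATE, CreateFlag.NO_LOCALITY_WRITE);  // name per current patch
try (FSDataOutputStream out = fs.create(path, FsPermission.getFileDefault(),
    flags, 4096, (short) 1, fs.getDefaultBlockSize(path), null)) {
  out.write(new byte[1024]);
}

// The single replica should land on rack2 or rack3, not the client's rack.
for (BlockLocation loc : fs.getFileBlockLocations(path, 0, Long.MAX_VALUE)) {
  for (String topologyPath : loc.getTopologyPaths()) {
    assertFalse("replica should avoid the client's rack",
        topologyPath.startsWith("/rack1"));
  }
}

// Then decommission the rack2/rack3 nodes (exclude file + refreshNodes, or
// the existing decommission test helpers), write again with the same flag,
// and assert the replica falls back to a /rack1 location.
{code}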

> HDFS Block Placement - Ignore Locality for First Block Replica
> --------------------------------------------------------------
>
>                 Key: HDFS-13448
>                 URL: https://issues.apache.org/jira/browse/HDFS-13448
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: block placement, hdfs-client
>    Affects Versions: 2.9.0, 3.0.1
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Minor
>         Attachments: HDFS-13448.1.patch, HDFS-13448.2.patch, 
> HDFS-13448.3.patch, HDFS-13448.4.patch, HDFS-13448.5.patch, HDFS-13448.6.patch
>
>
> According to the HDFS block placement rules:
> {quote}
> /**
>  * The replica placement strategy is that if the writer is on a datanode,
>  * the 1st replica is placed on the local machine, 
>  * otherwise a random datanode. The 2nd replica is placed on a datanode
>  * that is on a different rack. The 3rd replica is placed on a datanode
>  * which is on a different node of the rack as the second replica.
>  */
> {quote}
> However, there is a hint for the hdfs-client that allows the block placement 
> request to not put a block replica on the local datanode _where 'local' means 
> the same host as the client is being run on._
> {quote}
>   /**
>    * Advise that a block replica NOT be written to the local DataNode where
>    * 'local' means the same host as the client is being run on.
>    *
>    * @see CreateFlag#NO_LOCAL_WRITE
>    */
> {quote}
> I propose that we add a new flag that allows the hdfs-client to request that 
> the first block replica be placed on a random DataNode in the cluster.  The 
> subsequent block replicas should follow the normal block placement rules.
> The issue is that when {{NO_LOCAL_WRITE}} is enabled, the first block 
> replica is not placed on the local node, but it is still placed on the local 
> rack.  Where this comes into play is when you have, for example, a Flume 
> agent that is loading data into HDFS.
> If the Flume agent is running on a DataNode, then by default the DataNode 
> local to the Flume agent will always get the first block replica, and this 
> leads to uneven block placement, with the local node always filling up 
> faster than any other node in the cluster.
> Modifying this example, if the DataNode is removed from the host where the 
> Flume agent is running, or if {{NO_LOCAL_WRITE}} is enabled by Flume, then 
> the default block placement policy will still prefer the local rack.  This 
> remedies the situation only insofar as the first block replica will now 
> always be distributed to a DataNode on the local rack.
> This new flag would allow a single Flume agent to distribute the blocks 
> randomly, evenly, over the entire cluster instead of hot-spotting the local 
> node or the local rack.
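> A rough usage sketch from the writer's side (the flag name here is purely 
> illustrative; it would be whatever name the new {{CreateFlag}} value ends up 
> with):
> {code:java}
> // Hypothetical client-side usage of the proposed flag; AVOID_RACK_WRITE is
> // an illustrative name only.
> Path path = new Path("/flume/events.log");
> EnumSet<CreateFlag> flags =
>     EnumSet.of(CreateFlag.CREATE, CreateFlag.AVOID_RACK_WRITE);
> try (FSDataOutputStream out = fs.create(path, FsPermission.getFileDefault(),
>     flags, 4096, fs.getDefaultReplication(path),
>     fs.getDefaultBlockSize(path), null)) {
>   out.write(payload);
> }
> {code}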


