[
https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16518314#comment-16518314
]
Daniel Templeton commented on HDFS-13448:
-----------------------------------------
The build failures are unrelated:
{quote}[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-antrun-plugin:1.7:run (common-test-bats-driver)
on project hadoop-common: An Ant BuildException has occured: exec returned: 1
[ERROR] around Ant part ...<exec failonerror="true" dir="src/test/scripts"
executable="bash">... @ 4:69 in
/testptch/hadoop/hadoop-common-project/hadoop-common/target/antrun/build-main.xml
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please
read the following articles:
[ERROR] [Help 1]
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <goals> -rf :hadoop-common{quote}
It's also really hard to find that output. It took some reverse engineering.
Someone should look into that...
The unit test failures are worth looking at. I recognize at least one of them
as flaky, but I can't assert they all are flaky. (Maybe someone else could.)
Aside from the unit tests, the patch looks good to me. I haven't done a formal
final pass, but I think you got it. [~daryn], wanna take another look?
Thanks for sticking through it, [~belugabehr].
> HDFS Block Placement - Ignore Locality for First Block Replica
> --------------------------------------------------------------
>
> Key: HDFS-13448
> URL: https://issues.apache.org/jira/browse/HDFS-13448
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: block placement, hdfs-client
> Affects Versions: 2.9.0, 3.0.1
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Minor
> Attachments: HDFS-13448.10.patch, HDFS-13448.11.patch,
> HDFS-13448.12.patch, HDFS-13448.13.patch, HDFS-13448.6.patch,
> HDFS-13448.7.patch, HDFS-13448.8.patch
>
>
> According to the HDFS Block Placement Rules:
> {quote}
> /**
> * The replica placement strategy is that if the writer is on a datanode,
> * the 1st replica is placed on the local machine,
> * otherwise a random datanode. The 2nd replica is placed on a datanode
> * that is on a different rack. The 3rd replica is placed on a datanode
> * which is on a different node of the rack as the second replica.
> */
> {quote}
> However, there is a hint for the hdfs-client that allows the block placement
> request to not put a block replica on the local datanode _where 'local' means
> the same host as the client is being run on._
> {quote}
> /**
> * Advise that a block replica NOT be written to the local DataNode where
> * 'local' means the same host as the client is being run on.
> *
> * @see CreateFlag#NO_LOCAL_WRITE
> */
> {quote}
> I propose that we add a new flag that allows the hdfs-client to request that
> the first block replica be placed on a random DataNode in the cluster. The
> subsequent block replicas should follow the normal block placement rules.
> The issue is that when {{NO_LOCAL_WRITE}} is enabled, the first block
> replica is not placed on the local node, but it is still placed on the local
> rack. This comes into play when you have, for example, a Flume agent that is
> loading data into HDFS.
> If the Flume agent is running on a DataNode, then by default, the DataNode
> local to the Flume agent will always get the first block replica, and this
> leads to uneven block placement, with the local node always filling up
> faster than any other node in the cluster.
> Modifying this example: if the DataNode is removed from the host where the
> Flume agent is running, or if {{NO_LOCAL_WRITE}} is enabled by Flume, then
> the default block placement policy will still prefer the local rack. This
> remedies the situation only insofar as the first block replica will now
> always go to a DataNode on the local rack.
> This new flag would allow a single Flume agent to distribute the blocks
> randomly, evenly, over the entire cluster instead of hot-spotting the local
> node or the local rack.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]