[ https://issues.apache.org/jira/browse/HADOOP-15548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16526850#comment-16526850 ]
Jim Brennan commented on HADOOP-15548:
--------------------------------------
[~eepayne] thanks for the review! I've uploaded a new patch that adds a check
to ensure we no longer always select the next dir, which was the old behavior.
> Randomize local dirs
> --------------------
>
> Key: HADOOP-15548
> URL: https://issues.apache.org/jira/browse/HADOOP-15548
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
> Attachments: HADOOP-15548.001.patch, HADOOP-15548.002.patch
>
>
> Shuffle LOCAL_DIRS, LOG_DIRS and LOCAL_USER_DIRS when launching a container.
> Some applications process these in exactly the same way in every container
> (e.g. round-robin), which can unnecessarily overload some disks (e.g. one
> output file always written to the first entry listed in the environment
> variable).
> There are two paths for local dir allocation, depending on whether the
> requested size is known or unknown. The unknown-size path already uses a
> random algorithm. The known-size path starts at a random dir and then goes
> round-robin after that: when selecting a dir, it increments the last-used
> index by one and then checks sequentially until it finds a dir that
> satisfies the request. The proposal is to increment by a random value
> between 1 and num_dirs - 1 instead, and then check sequentially from there.
> This should result in a more random selection in all cases.
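The known-size selection proposed in the description can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the patch itself: the class `RandomizedDirSelector`, its `select` method, and modeling "satisfies the request" as an `IntPredicate` over dir indices (e.g. a free-space check) are all hypothetical, and the sketch does not reproduce Hadoop's actual `LocalDirAllocator` code.

```java
import java.util.Random;
import java.util.function.IntPredicate;

/**
 * Hypothetical sketch of the proposed known-size selection: start from a
 * random dir, then advance by a random step in [1, numDirs - 1] instead of 1.
 * Not Hadoop's actual LocalDirAllocator code.
 */
class RandomizedDirSelector {
    private final int numDirs;
    private final Random random;
    private int lastUsed;

    RandomizedDirSelector(int numDirs, Random random) {
        if (numDirs < 1) {
            throw new IllegalArgumentException("numDirs must be >= 1");
        }
        this.numDirs = numDirs;
        this.random = random;
        // Random starting point, as the known-size path already does.
        this.lastUsed = random.nextInt(numDirs);
    }

    /**
     * Returns the index of the first dir (scanning sequentially from a
     * randomly advanced start) that satisfies the request, or -1 if none does.
     */
    int select(IntPredicate satisfiesRequest) {
        // Old behavior: the step is always 1. Proposed: uniform in [1, numDirs - 1].
        // With a single dir there is nothing to skip, so the step is 0.
        int step = numDirs > 1 ? 1 + random.nextInt(numDirs - 1) : 0;
        int start = (lastUsed + step) % numDirs;
        // Check dirs sequentially from the new start, wrapping around once.
        for (int i = 0; i < numDirs; i++) {
            int candidate = (start + i) % numDirs;
            if (satisfiesRequest.test(candidate)) {
                lastUsed = candidate;
                return candidate;
            }
        }
        return -1;
    }
}
```

Because the step is drawn from 1 to num_dirs - 1, the sequential scan never begins at the last-used dir when there are at least two dirs, yet consecutive picks no longer follow a fixed order. For example, with every dir able to satisfy the request, `select(i -> true)` returns an index in 0..3 for four dirs that always differs from the previous pick but is not necessarily the next one.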
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)