[
https://issues.apache.org/jira/browse/TEZ-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228365#comment-17228365
]
Rajesh Balamohan commented on TEZ-4246:
---------------------------------------
Can you share more details on this, [~okumin]? The allocator should flip between
disks 0 and 1, and it is quite possible that all data files end up on one disk
because of that flipping. But why would it create issues w.r.t. the soft limit?
(There should be the same number of file.out and file.out.index files.)
> Avoid uneven local disk usage for spills
> ----------------------------------------
>
> Key: TEZ-4246
> URL: https://issues.apache.org/jira/browse/TEZ-4246
> Project: Apache Tez
> Issue Type: Improvement
> Affects Versions: 0.9.2, 0.10.0
> Reporter: okumin
> Priority: Major
>
> This ticket would help a task attempt avoid overusing a specific disk.
>
> I have observed PipelinedSorter repeatedly spilling a large amount of data to
> one of the two disks.
> When a NodeManager has just two disks, they are selected in a strictly
> round-robin fashion.
> [https://github.com/apache/hadoop/blob/rel/release-3.1.3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/LocalDirAllocator.java#L422-L439]
> Each spill iteration creates both a data file and an index file, meaning that
> Tez is likely to put all data files on the same disk in such cases.
>
> This uneven usage is especially inconvenient when we use features with a soft
> limit, such as:
> * https://issues.apache.org/jira/browse/TEZ-4112
>
> Index files are relatively small, so I'd suggest putting a data file and its
> index file in the same directory, so that the round-robin does not skip a
> disk for such small usage.
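To make the reported skew concrete, here is a minimal sketch of the behavior described above. The `RoundRobinDirs` class is my own simplified stand-in for Hadoop's LocalDirAllocator, not the actual implementation: with exactly two directories and two allocations per spill (data file, then index file), the parity never changes, so every file.out lands on the same disk.

```java
import java.util.ArrayList;
import java.util.List;

public class SpillSkewDemo {
    // Hypothetical round-robin allocator over N local dirs,
    // mimicking (not reproducing) LocalDirAllocator's rotation.
    static class RoundRobinDirs {
        private final int numDirs;
        private int next = 0;

        RoundRobinDirs(int numDirs) {
            this.numDirs = numDirs;
        }

        // Return the next dir index and advance the pointer.
        int allocate() {
            int dir = next;
            next = (next + 1) % numDirs;
            return dir;
        }
    }

    public static void main(String[] args) {
        RoundRobinDirs dirs = new RoundRobinDirs(2);
        List<Integer> dataDirs = new ArrayList<>();
        List<Integer> indexDirs = new ArrayList<>();

        // Each spill allocates a data file (file.out),
        // then its index file (file.out.index).
        for (int spill = 0; spill < 4; spill++) {
            dataDirs.add(dirs.allocate());
            indexDirs.add(dirs.allocate());
        }

        // With two dirs and two allocations per spill, the large data
        // files all land on dir 0 and the tiny index files on dir 1.
        System.out.println("data dirs:  " + dataDirs);
        System.out.println("index dirs: " + indexDirs);
    }
}
```

Running this prints `data dirs:  [0, 0, 0, 0]` and `index dirs: [1, 1, 1, 1]`, which is the uneven split the ticket describes: one disk absorbs all the heavy data files while the other only receives small index files.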
--
This message was sent by Atlassian Jira
(v8.3.4#803005)