[
https://issues.apache.org/jira/browse/TEZ-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228999#comment-17228999
]
okumin commented on TEZ-4246:
-----------------------------
[~rajesh.balamohan] Thanks for your response. Let me explain the background in
more detail.
While testing massive jobs, I saw disk usage metrics spike and fail-fast kick
in. I also found that all spill files were written to the same disk.
{code:java}
[INFO] [TezChild] |impl.PipelinedSorter|: Reducer 4: Spilling to /data/1/..../attempt_1601427140350_2834458_1_01_000283_0_10014_0/file.out
[INFO] [TezChild] |impl.PipelinedSorter|: Reducer 4: Spilling to /data/1/..../attempt_1601427140350_2834458_1_01_000283_0_10014_1/file.out
...
[INFO] [TezChild] |impl.PipelinedSorter|: Reducer 4: Spilling to /data/1/..../attempt_1601427140350_2834458_1_01_000283_0_10014_48/file.out
[INFO] [TezChild] |impl.PipelinedSorter|: Reducer 4: Spilling to /data/1/..../attempt_1601427140350_2834458_1_01_000283_0_10014_49/file.out{code}
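A minimal simulation of how this can happen, under a simplified model taken from the ticket description: the allocator picks local directories in strict round-robin (as it does with two disks), and each spill asks it for a data file and then an index file. The class and method names and the 50 x 1 GiB workload are hypothetical; this is not Tez or Hadoop code.

```java
import java.util.Arrays;

/**
 * Simplified model (not Tez or Hadoop code): a strict round-robin directory
 * picker, as the ticket describes for a NodeManager with two disks. Each
 * spill requests a data file and then an index file from the picker.
 */
public class RoundRobinSpillSketch {

    /** Hypothetical stand-in for the allocator: strict round-robin. */
    static final class RoundRobinPicker {
        private final int numDisks;
        private int next = 0;

        RoundRobinPicker(int numDisks) {
            this.numDisks = numDisks;
        }

        /** Returns the next disk index and advances by exactly one. */
        int pick() {
            int disk = next;
            next = (next + 1) % numDisks;
            return disk;
        }
    }

    /** Simulates numSpills spills and returns the data bytes written per disk. */
    static long[] dataBytesPerDisk(int numDisks, int numSpills, long bytesPerSpill) {
        RoundRobinPicker picker = new RoundRobinPicker(numDisks);
        long[] perDisk = new long[numDisks];
        for (int spill = 0; spill < numSpills; spill++) {
            int dataDisk = picker.pick(); // data file for this spill
            picker.pick();                // index file: consumes the next disk
            perDisk[dataDisk] += bytesPerSpill;
        }
        return perDisk;
    }

    public static void main(String[] args) {
        // Hypothetical workload: 50 spills of 1 GiB each on two disks.
        long[] usage = dataBytesPerDisk(2, 50, 1L << 30);
        System.out.println(Arrays.toString(usage)); // prints [53687091200, 0]
    }
}
```

Because every spill makes two requests against a two-step cycle, the data file always lands on the same disk, which is consistent with every `file.out` in the log going to `/data/1`.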
As for the soft limit, TEZ-4112, which I'm testing, provides a feature to kill
a job based on its disk usage. The limit is configured per job and per disk.
Skew makes the limit easy to hit, because LocalDirAllocator is unaware of the
configured limit and keeps using a given disk until that disk is full.
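To illustrate why skew matters for a per-disk limit, here is a hypothetical check in the spirit of TEZ-4112. The class, the method, and the 60 GiB / 100 GiB figures are assumptions for illustration, not the TEZ-4112 implementation: the same total spill volume passes when it is split evenly across two disks and fails when it all lands on one.

```java
/**
 * Hypothetical sketch of a per-job, per-disk soft limit check. Not the
 * TEZ-4112 code; names and numbers are illustrative assumptions.
 */
public class PerDiskLimitSketch {

    /** Returns true if any single disk's usage exceeds the per-disk limit. */
    static boolean exceedsLimit(long[] bytesPerDisk, long perDiskLimit) {
        for (long used : bytesPerDisk) {
            if (used > perDiskLimit) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        long perDiskLimit = 60 * gib;          // hypothetical limit per job per disk
        long[] even = {50 * gib, 50 * gib};    // 100 GiB spread over two disks
        long[] skewed = {100 * gib, 0L};       // the same 100 GiB on a single disk

        System.out.println("even exceeds:   " + exceedsLimit(even, perDiskLimit));   // prints false
        System.out.println("skewed exceeds: " + exceedsLimit(skewed, perDiskLimit)); // prints true
    }
}
```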
I opened a WIP PR to show what I have in mind.
[https://github.com/apache/tez/pull/79]
Thanks.
> Avoid uneven local disk usage for spills
> ----------------------------------------
>
> Key: TEZ-4246
> URL: https://issues.apache.org/jira/browse/TEZ-4246
> Project: Apache Tez
> Issue Type: Improvement
> Affects Versions: 0.9.2, 0.10.0
> Reporter: okumin
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> This ticket would help a task attempt avoid overusing a specific disk.
>
> I have observed PipelinedSorter repeatedly spill a large amount of data to
> only one of two disks.
> When a NodeManager has just two disks, they are selected in a strictly
> round-robin fashion.
> [https://github.com/apache/hadoop/blob/rel/release-3.1.3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/LocalDirAllocator.java#L422-L439]
> Each spill iteration requests its data file and then its index file, so in
> such cases Tez is likely to put all data files on the same disk (and all index
> files on the other).
>
> This uneven usage is especially inconvenient when we use features with a soft
> limit, such as:
> * https://issues.apache.org/jira/browse/TEZ-4112
>
> Index files are relatively small, so I'd suggest putting a data file and its
> index file in the same directory. That way, the round-robin doesn't skip a
> disk just to store a small index file.
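A minimal sketch of the proposal, under a simplified model in which the allocator picks directories in strict round-robin (as the description says it does with two disks): only the data file goes through the picker, and the index file is placed in the same directory, so consecutive spills alternate disks. The names and the 50 x 1 GiB workload are hypothetical; this is not the code in the linked PR.

```java
import java.util.Arrays;

/**
 * Simplified model of the proposed behavior (not Tez code): the index file
 * reuses the data file's directory instead of requesting a new one.
 */
public class SameDirIndexSketch {

    /** Hypothetical stand-in for the allocator: strict round-robin. */
    static final class RoundRobinPicker {
        private final int numDisks;
        private int next = 0;

        RoundRobinPicker(int numDisks) {
            this.numDisks = numDisks;
        }

        /** Returns the next disk index and advances by exactly one. */
        int pick() {
            int disk = next;
            next = (next + 1) % numDisks;
            return disk;
        }
    }

    /** Simulates numSpills spills and returns the data bytes written per disk. */
    static long[] dataBytesPerDisk(int numDisks, int numSpills, long bytesPerSpill) {
        RoundRobinPicker picker = new RoundRobinPicker(numDisks);
        long[] perDisk = new long[numDisks];
        for (int spill = 0; spill < numSpills; spill++) {
            int dataDisk = picker.pick(); // only the data file consults the picker
            // The index file goes into the same directory as the data file,
            // so the picker is not advanced a second time.
            perDisk[dataDisk] += bytesPerSpill;
        }
        return perDisk;
    }

    public static void main(String[] args) {
        // Hypothetical workload: 50 spills of 1 GiB each on two disks.
        long[] usage = dataBytesPerDisk(2, 50, 1L << 30);
        System.out.println(Arrays.toString(usage)); // prints [26843545600, 26843545600]
    }
}
```

The trade-off is that each disk also holds the small index files of its own spills, which the description argues is negligible compared to the data files.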
--
This message was sent by Atlassian Jira
(v8.3.4#803005)