[ https://issues.apache.org/jira/browse/TEZ-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated TEZ-1277:
-------------------------
    Description: 
Occasionally, tasks fail on full disks: a disk had free space when the task
chose it via LocalDirAllocator, but that space had actually been promised to
many tasks instead of just one.

This race condition shows up when a 1GB spill can be done in ~10s or so.

There is no way to reserve the space up front via the hadoop-fs abstraction,
but an SSD-based spill wastes most of its IOPS on journal updates about the
file length changing.

  was:
Occasionally, tasks fail on full disks: a disk had free space when the task
chose it via LocalDirAllocator, but that space had actually been promised to
many tasks instead of just one.

This race condition shows up when a 1GB spill can be done in ~10s or so.


> Tez Spill handler should truncate files to reserve space on disk
> ----------------------------------------------------------------
>
>                 Key: TEZ-1277
>                 URL: https://issues.apache.org/jira/browse/TEZ-1277
>             Project: Apache Tez
>          Issue Type: Improvement
>    Affects Versions: 0.5.0
>            Reporter: Gopal V
>            Assignee: Gopal V
>
> Occasionally, tasks fail on full disks: a disk had free space when the task
> chose it via LocalDirAllocator, but that space had actually been promised
> to many tasks instead of just one.
> This race condition shows up when a 1GB spill can be done in ~10s or so.
> There is no way to reserve the space up front via the hadoop-fs abstraction,
> but an SSD-based spill wastes most of its IOPS on journal updates about the
> file length changing.
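A minimal sketch of the idea in the issue title, written in Python for brevity (Tez itself is Java). `spill_with_reservation` is a hypothetical helper, not a Tez or Hadoop API, and the sketch assumes a POSIX system. `os.posix_fallocate` allocates real blocks and extends the file, so a full disk fails with ENOSPC at reservation time instead of mid-spill. The `os.ftruncate` fallback only sets the file length, which on filesystems with sparse-file support does not reserve any blocks. In both cases the spill writes land inside an already-sized file, so the file length does not change on every append; the unused tail is trimmed once the spill is done.

```python
import os
import tempfile


def spill_with_reservation(path, chunks, expected_size):
    """Hypothetical helper (not a Tez or Hadoop API): write the byte strings
    in `chunks` to `path` after reserving `expected_size` bytes up front.
    Returns the number of bytes actually written."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        if expected_size > 0:
            if hasattr(os, "posix_fallocate"):
                # Allocates real blocks and extends the file; a full disk
                # raises OSError (ENOSPC) here, before any spill data is written.
                os.posix_fallocate(fd, 0, expected_size)
            else:
                # Fallback: only sets the file length. On filesystems that
                # support sparse files this does NOT reserve any blocks.
                os.ftruncate(fd, expected_size)
        written = 0
        for chunk in chunks:
            view = memoryview(chunk)
            # os.write may write fewer bytes than requested, so loop.
            while len(view) > 0:
                n = os.write(fd, view)
                view = view[n:]
                written += n
        # Trim the unused reserved tail so readers see only real spill data.
        os.ftruncate(fd, written)
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "spill.out")
    n = spill_with_reservation(out, [b"key1\tval1\n", b"key2\tval2\n"], 1 << 20)
    print(n, os.path.getsize(out))
```

Reserving at open time turns the race described above into an early, explicit failure on the chosen disk; how a spill handler would estimate `expected_size` is not specified in the issue and is left as an assumption here.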



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
