[jira] [Updated] (HADOOP-9577) Actual data loss using s3n (against US Standard region)

david duncan (JIRA) Thu, 03 Oct 2013 20:45:50 -0700

     [ 
https://issues.apache.org/jira/browse/HADOOP-9577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


david duncan updated HADOOP-9577:
---------------------------------

    Description: 
The implementation of needsTaskCommit() assumes that the FileSystem used for 
writing temporary outputs is consistent.  That is not the case when using the 
S3 native filesystem (s3n) in the US Standard region.  In larger jobs it is 
quite common for the exists() call to return false even though the task 
attempt wrote output minutes earlier, which essentially cancels the commit 
operation with no error.  That's real-life data loss right there, folks.
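
To make the failure mode concrete, here is a minimal sketch in plain Java. It is not the Hadoop code: LaggingStore, CommitCheckSketch and the path used in the usage note are hypothetical stand-ins, and the only assumption carried over from the description is that the commit decision rests on a single exists() call against a store whose view can lag behind writes.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a store whose exists() view lags behind writes. */
final class LaggingStore {
    private final Set<String> written = new HashSet<>(); // what was actually stored
    private final Set<String> visible = new HashSet<>(); // what exists() can see so far

    /** Records the write, but does not make it visible yet. */
    void write(String path) {
        written.add(path);
    }

    /** Simulates the store eventually catching up with earlier writes. */
    void propagate() {
        visible.addAll(written);
    }

    boolean exists(String path) {
        return visible.contains(path);
    }

    boolean wasWritten(String path) {
        return written.contains(path);
    }
}

final class CommitCheckSketch {
    private CommitCheckSketch() {}

    /**
     * Same shape as the check described above: commit only if the
     * temporary output is visible at the moment of asking.
     */
    static boolean needsTaskCommit(LaggingStore store, String tempOutputPath) {
        return store.exists(tempOutputPath);
    }
}
```

If a task writes "_temporary/attempt_0" (a made-up path) and needsTaskCommit() is asked before propagate() runs, the answer is false even though wasWritten() is true, so the commit is skipped and nothing reports an error.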

The saddest part is that the Hadoop APIs do not seem to provide any legitimate 
means for the various RecordWriters to communicate with the OutputCommitter.  
In my projects I have created a static map of semaphores keyed by 
TaskAttemptID, which all my custom RecordWriters have to be aware of.  That's 
pretty lame.
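
For illustration only, one possible reading of that workaround is sketched below. The description does not show the real code, so AttemptOutputRegistry and its method names are hypothetical, a plain String stands in for TaskAttemptID, and the sketch assumes the writers and the committer run in the same JVM, as a static map implies.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Semaphore;

/** Hypothetical in-JVM registry shared by record writers and the committer. */
final class AttemptOutputRegistry {
    // One semaphore per task attempt; a permit means "output was written".
    private static final ConcurrentMap<String, Semaphore> WRITES =
            new ConcurrentHashMap<>();

    private AttemptOutputRegistry() {}

    /** Called by a record writer once it has produced output for the attempt. */
    static void markOutputWritten(String attemptId) {
        WRITES.computeIfAbsent(attemptId, id -> new Semaphore(0)).release();
    }

    /** Called by the committer instead of asking the file system. */
    static boolean needsTaskCommit(String attemptId) {
        Semaphore signal = WRITES.get(attemptId);
        return signal != null && signal.availablePermits() > 0;
    }

    /** Drops the entry after commit or abort so the map does not grow. */
    static void clear(String attemptId) {
        WRITES.remove(attemptId);
    }
}
```

The commit decision then depends on what the writers actually did rather than on what the store reports, at the cost of every custom RecordWriter having to call markOutputWritten().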


> Actual data loss using s3n (against US Standard region)
> -------------------------------------------------------
>
>                 Key: HADOOP-9577
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9577
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 1.0.3
>            Reporter: Joshua Caplan
>            Priority: Critical



