[
https://issues.apache.org/jira/browse/HADOOP-10037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steve Loughran resolved HADOOP-10037.
-------------------------------------
Resolution: Cannot Reproduce
Fix Version/s: 2.6.0
Closing as Cannot Reproduce, as the problem appears to have gone away for you.
# Hadoop 2.6 is using a much later version of jets3t
# Hadoop 2.6 also offers a (compatible) s3a filesystem which uses the AWS SDK
instead.
If you do see this problem, try using s3a to see if it occurs there.
> s3n read truncated, but doesn't throw exception
> ------------------------------------------------
>
> Key: HADOOP-10037
> URL: https://issues.apache.org/jira/browse/HADOOP-10037
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 2.0.0-alpha
> Environment: Ubuntu Linux 13.04 running on Amazon EC2 (cc2.8xlarge)
> Reporter: David Rosenstrauch
> Fix For: 2.6.0
>
> Attachments: S3ReadFailedOnTruncation.html, S3ReadSucceeded.html
>
>
> For months now we've been experiencing frequent data truncation issues
> when reading from S3 using the s3n:// protocol. I was finally able to
> gather some debugging output on the issue in a job I ran last night, and
> so can now file a bug report.
> The job I ran last night was on a 16-node cluster (all of them AWS EC2
> cc2.8xlarge machines, running Ubuntu 13.04 and Cloudera CDH4.3.0). The job
> was a Hadoop streaming job, which reads through a large number (i.e.,
> ~55,000) of files on S3, each of them approximately 300K bytes in size.
> All of the files contain 46 columns of data in each record. I added
> an extra check to my mapper code to count and verify the number of columns in
> every record - throwing an error and crashing the map task if the column
> count is wrong.
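
The check described above could be sketched roughly as follows. This is not the reporter's actual mapper: the tab delimiter, the pass-through output, and the names `check_record`, `run_mapper`, and `ColumnCountError` are all assumptions made for illustration. Only the 46-column figure comes from the report.

```python
import sys

EXPECTED_COLUMNS = 46  # column count stated in the report
DELIMITER = "\t"       # assumption: the report does not name the delimiter


class ColumnCountError(ValueError):
    """Raised when a record does not have the expected number of columns."""


def check_record(line, line_number, expected=EXPECTED_COLUMNS, delimiter=DELIMITER):
    """Split one record and raise if its column count is wrong."""
    fields = line.rstrip("\n").split(delimiter)
    if len(fields) != expected:
        raise ColumnCountError(
            "record %d has %d columns, expected %d"
            % (line_number, len(fields), expected)
        )
    return fields


def run_mapper(lines, out=sys.stdout):
    """Validate every record, pass it through unchanged, return the record count."""
    count = 0
    for count, line in enumerate(lines, start=1):
        fields = check_record(line, count)
        out.write(DELIMITER.join(fields) + "\n")
    return count
```

In a streaming job, calling `run_mapper(sys.stdin)` and letting `ColumnCountError` propagate makes the process exit with a non-zero status, which Hadoop streaming treats as a failed task attempt by default.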
> If you look in the attached task logs, you'll see 2 attempts on the same
> task. The first one fails due to truncated data (i.e., my job intentionally
> fails the map task because the current record fails the column count check).
> The task then gets retried on a different machine and runs to successful
> completion.
> You can see more evidence of the truncation further down in the task logs,
> where it displays the count of the records read: the failed task says 32953
> records read, while the successful task says 63133.
> Any idea what the problem might be here and/or how to work around it? This
> issue is a very common occurrence on our clusters. E.g., in the job I ran
> last night, I had already encountered 8 such failures before going to bed,
> when the job was only 10% complete. (~25,000 out of ~250,000 tasks.)
> I realize that it's common for I/O errors to occur - possibly even frequently
> - in a large Hadoop job. But I would think that if an I/O failure (like a
> truncated read) did occur, something in the underlying infrastructure
> code (i.e., either in NativeS3FileSystem or in jets3t) should detect the
> error and throw an IOException accordingly. It shouldn't be up to the
> calling code to detect such failures, IMO.
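
The kind of detection asked for above can be illustrated with a length-checked reader: compare the bytes actually delivered against the length the store reported, and raise at end of stream if they differ. The sketch below is generic Python, not how NativeS3FileSystem, jets3t, or s3a are implemented. `LengthCheckedReader` and `TruncatedReadError` are hypothetical names, and the expected length is assumed to come from something like the object's Content-Length.

```python
import io


class TruncatedReadError(IOError):
    """Raised when a stream ends before the expected number of bytes."""


class LengthCheckedReader:
    """Wrap a binary stream and verify the total byte count at end of stream."""

    def __init__(self, raw, expected_length):
        self._raw = raw
        self._expected = expected_length  # assumed known up front
        self._seen = 0

    def read(self, size=-1):
        chunk = self._raw.read(size)
        self._seen += len(chunk)
        # End of stream: either a read-to-end call, or an empty chunk
        # returned for a positive-size request.
        read_all = size is None or size < 0
        at_eof = read_all or (size > 0 and not chunk)
        if at_eof and self._seen < self._expected:
            raise TruncatedReadError(
                "stream ended after %d of %d bytes" % (self._seen, self._expected)
            )
        return chunk
```

For example, wrapping `io.BytesIO(b"x" * 60)` with an expected length of 100 returns the 60 bytes on the first `read(60)` and raises `TruncatedReadError` on the next, so the caller sees an I/O error instead of a silent early end of file.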